Nested Aggregation (1 Viewer)

jumnhy

Member
Local time
Yesterday, 19:35
Joined
Feb 25, 2021
Messages
68
Hi all,

Sorry for the lack of a descriptive title.

I'm trying to associate revision levels of SOPs with historical training records (ie, someone trained on SOP-001 on 1/1/2020 was trained to Revision 3, because Revision 3 is dated 12/31/2020, and Revision 4 is dated 1/2/2020).

So I need to use an aggregate query of some sort to select all the revisions of each SOP listed in our "Master List of Controlled Documents" table, (eg, GROUP BY SOP) but with a compound criteria: maximum revision date less than or equal to the training date in the training table.

How would one accomplish this?

Thanks in advance for any insight you kind folks can offer.
 

plog

Banishment Pending
Local time
Yesterday, 18:35
Joined
May 11, 2011
Messages
11,646
The best way to communicate data needs is with sample data. Could you provide 2 sets of data--

A. Starting data from your table(s). Include table and field names and enough data to cover all cases.

B. Expected results of A. Show what data you expect the final query to produce when you feed it the data from A.

Again, 2 sets of data--starting and ending.
 

jumnhy

Member
Local time
Yesterday, 19:35
Joined
Feb 25, 2021
Messages
68
tblMasterListControlledDocs

ctrldocIDDocument NumberRevisionDocument/Revision DateDocument Description
5311​
SOP-01107
11/12/2020​
SOP Packaging and Assembly
5307​
SOP-01106
11/2/2020​
SOP Packaging and Assembly
5305​
SOP-01105
8/28/2020​
SOP Packaging and Assembly
5206​
SOP-01104
7/15/2020​
SOP Packaging and Assembly
5180​
SOP-01103
6/19/2020​
SOP Packaging and Assembly
5157​
SOP-01102
5/26/2020​
SOP Packaging and Assembly
4981​
SOP-01101
10/25/2019​
SOP Packaging and Assembly

tblSOP tblSOP

bpidbpnumberbpname
111​
SOP-011Packaging and Assembly

tblTrainingRecords tblTrainingRecords

AutoIDEmployeeBP/Job NumbertrbpDatetrbpNotestrbpTrainer
8530​
Joe SchmoSOP-011
11/20/2020​
EXAMPLEBob
8529​
Joe SchmoSOP-011
11/5/2020​
EXAMPLEJim
8528​
Joe SchmoSOP-011
9/1/2020​
EXAMPLEJose
8527​
Joe SchmoSOP-011
8/1/2020​
EXAMPLEJedediah
8531​
Joe SchmoSOP-011
7/15/2020​
EXAMPLEJeff
8526​
Joe SchmoSOP-011
7/1/2020​
EXAMPLEJeff
8525​
Joe SchmoSOP-011
6/1/2020​
EXAMPLEJimbo
8524​
Joe SchmoSOP-011
10/27/2019​
EXAMPLEJake

The BP/JobNumber field in TrainingRecords is actually a lookup--it's querying table SOP to pull the right document number in.

Similarly, the Employee and Trainer fields are actually lookups to our Employees table.

Unfortunately, part of the headache is that the database isn't set up properly normalized; eg, the SOP information is stored twice. The DocumentID in tblMasterListControlledDocs is not the same as the bpID in tblSOP, which is what is used for the SOP identifier in tblTrainingRecords.

Instead, any joins to pull in columns from tblMasterListControlledDocs are done on the Document Number field. Every revision gets its own record, so that gives a one:many relationship.


The idea is to take those three tables, and join in the crtlDocRev that's appropriate for each training entry:

Desired Output

AutoIDEmployeeBP/Job NumberRevisiontrbpDatetrbpNotestrbpTrainer
8530​
Joe SchmoSOP-0117
11/20/2020​
EXAMPLEBob
8529​
Joe SchmoSOP-0116
11/5/2020​
EXAMPLEJim
8528​
Joe SchmoSOP-0115
9/1/2020​
EXAMPLEJose
8527​
Joe SchmoSOP-0114
8/1/2020​
EXAMPLEJedediah
8531​
Joe SchmoSOP-0114
7/15/2020​
EXAMPLEJeff
8526​
Joe SchmoSOP-0113
7/1/2020​
EXAMPLEJeff
8525​
Joe SchmoSOP-0112
6/1/2020​
EXAMPLEJImbo
8524​
Joe SchmoSOP-0111
10/27/2019​
EXAMPLEJake

Is that relatively clear?
 

jumnhy

Member
Local time
Yesterday, 19:35
Joined
Feb 25, 2021
Messages
68
I should add that I'm intentionally limiting this to one employee and one SOP; but we have a couple hundred employees and several dozen SOPs.

I figure I should be able to start with that and generalize to all employees and all SOPs afterward. Though I'm happy to take other suggestions...
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 19:35
Joined
May 21, 2018
Messages
8,527
You can do with a sub query. Something like

tblDocs tblDocs

IDDocumentNumber_FKRevisionNoRevisionDate
1​
SOP-111
1/1/2017​
2​
SOP-112
1/1/2018​
3​
SOP-113
1/1/2019​
4​
SOP-114
1/1/2020​
5​
SOP-115
1/1/2021​
tblTraining tblTraining

IDemployeeID_FKDocumentNumber_FKTrainingDate
1​
1​
SOP-11
3/1/2017​
2​
1​
SOP-11
3/1/2018​
3​
1​
SOP-11
3/1/2019​
4​
1​
SOP-11
3/1/2020​
5​
1​
SOP-11
3/1/2021​
Code:
SELECT tbltraining.documentnumber_fk,
       tbltraining.trainingdate,
       tbltraining.documentnumber_fk,
       (SELECT TOP 1 revisionno
        FROM   tbldocs
        WHERE  tbltraining.trainingdate > revisiondate
               AND tbltraining.documentnumber_fk = tbldocs.documentnumber_fk
        ORDER  BY revisiondate DESC) AS TrainedOn
FROM   tbltraining;

Query1 Query1

Expr1000TrainingDateDocumentNumber_FKTrainedOn
SOP-11
3/1/2017​
SOP-111
SOP-11
3/1/2018​
SOP-112
SOP-11
3/1/2019​
SOP-113
SOP-11
3/1/2020​
SOP-114
SOP-11
3/1/2021​
SOP-115
 

jumnhy

Member
Local time
Yesterday, 19:35
Joined
Feb 25, 2021
Messages
68
Oh, that's much smarter! I was looking at a bunch of joins (ie, every training date with every revision date) and then trying to select only the correct revision dates, but this is a MUCH better approach.

I'll leave this as open for the moment while I put this into implementation, but I think we have a solution. Much appreciate it, I don't know why I was struggling in the first place. Every time I think I'm getting mildly proficient at SQL and DBA stuff generally I fall on my face. Y'all pick me up every time though... :)
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 19:35
Joined
May 21, 2018
Messages
8,527
I'll leave this as open for the moment while I put this into implementation, but I think we have a solution.
I would leave it open. Subqueries are inefficient and someone usually comes by with a query without the subquery. Now that I say that hold on.
 

jumnhy

Member
Local time
Yesterday, 19:35
Joined
Feb 25, 2021
Messages
68
Fair enough... One thing I love about SQL is that there's always an alternative approach if you want to get clever enough...
 

jumnhy

Member
Local time
Yesterday, 19:35
Joined
Feb 25, 2021
Messages
68
Following up here to bump and request further assistance--

My final version of this is slightly more convoluted than @MajP (due to the lookups) but appears as follows:
SQL:
SELECT Tr.trbpnumber,
       Tr.trbpEmp,
       Tr.trbpdate,
       Tr.trbpTrainer,
       Tr.trbpNotes,
       (SELECT TOP 1 CD.crtlDocRev
        FROM tblMasterListControlledDocs AS CD
        LEFT JOIN tblSOP AS S ON CD.[ctrlDocDoc#] = S.[bpnumber]
        WHERE S.bpid = Tr.trbpnumber
                 AND Tr.trbpdate >= CD.ctrlDocDate
        ORDER BY ctrlDocDate DESC) AS Revision
FROM tblTrainingRecords AS Tr;


This *works* in the strictest sense, but has some obvious limitations. You can't filter on the revision date--Access throws the understandable (sort of) error "This query can return at most 1 results" because, yeah, obviously, the subquery is returning 1 result... Per row.

Additionally, it's just slow as hell. Again, understandably.

Any thoughts on how to tune this for performance?

At some point, I'll need to be able to select only the most recent training for each employee on every SOP, so I'm looking to tweak this to make sure it'll work in that context too.


Edit:

I should add for posterity--MS ACCESS will actually throw a hissy fit if the subquery (what's pulling our "TOP 1" record or most recent revision) has a potential "tie" in the order by clause of the select statement. In my case, there was a duplicate entry for an SOP revision--someone entered one revision twice.

You can fix this by adding a unique id to the order by clause of your subquery, something that will break the tie. I just added the DocumentID field from our table of controlled documents. Much better now.

See the venerable Allen Brown on this here:

Still eager to see if there's a more efficient way of accomplishing any of this, though...
 
Last edited:

Users who are viewing this thread

Top Bottom