Linking and Data from a single Table (1 Viewer)

Chris Hinds

New member
Local time
Today, 14:00
Joined
Sep 21, 2012
Messages
4
Hi,

First post but reasonably familiar with Access, nest queries and relational data including on SQL Server.

I have a number of data extracts from an SQL server that export to CSV for use in a tactical reporting solution being used to define requirements for the strategic solution. Due to firewall and other restrictions I cannot query the SQL Server directly for data hence the use of CSV files to interchange into Access. For 95% of the data I have it all relating nicely and working, however one step is giving me heartache. I need to calculate the elapsed time spent doing a particular task by a particular person. Below is a key extract of data from the input CSV Data. Apologies for posting as Text but I can't post image with a 0 post count. Attachment Data.jpg contains a more detailed extract of what the log looks like including the missing entries from the text summary

Log_pk Job_pk Person_pk Action Time
1 a x Start 21-09-2012 19:30:07
4 a x Pause 21-09-2012 19:33:14
7 a x Resume 21-09-2012 19:35:00
9 a x End 21-09-2012 19:35:02

I need to work out the time spent by each person on each job.


Basic system rules:
  1. Log_pk is always assigned off the clock
  2. More than one person can be on a task
  3. A Start record can be followed by Pause or End
  4. A Resume record can be followed by Pause or End
  5. A person can Pause/Resume an unlimited number of times on a job
  6. If a person is Paused the next time they touch the job they will Resume but if a person has Ended the next time they touch the job it will record Start
  7. A person does not have to have an End record on a job but any Start or Resume records without completion of some kind (Pause or End) should be discarded from calculations
  8. Any other records in the log should also be discarded (other actions are also tracked in the log but these four are the only actions of concern)
By linking the data table back to itself I have gotten a query that:
  1. Groups by Job_pk
  2. Then groups by Person_pk
  3. Returns a Start time (entries that are Like Start or Like Resume)
  4. Returns an End time (entries that are Like Pause or Like End)
  5. Calculates the time for that particular row
This works absolutely correctly when there is a single Start/End period for a person on a job (approx. 130,000 of the 135,000 occurrences in my sample data are like this)

There are a number of occurrences where more than one instance is present, for example Person A on Job X in the image has a Start, Pause, Resume and End.

Here my query returns four rows of results:

J_pk P_pk Start End Time
a x 21-09-2012 19:30:07 21-09-2012 19:30:07 187
a x 21-09-2012 19:30:07 21-09-2012 19:35:02 295
a x 21-09-2012 19:35:00 21-09-2012 19:33:14 -106
a x 21-09-2012 19:35:00 21-09-2012 19:35:02 2

Here, rows 1 and 4 are "good" data i.e. Start to Pause and Resume to End whilst rows 2 and 3 are "bad" results. I know that I can find the row numbers of the good results in my output by:

Code:
(n*i)-n+i
Where n is the total number of times someone has worked on a job and i is the number of entries I expect to find from 1 to n e.g for a job with 3 periods of work by someone I will get returned 9 rows of data with good data in rows 1, 5 and 9. I realise I could eliminate negative results in the query but that would also invalidate the equation above - a problem for how I'm currently trying to do things, but maybe not with a different way.

I've tried writing other queries off the data to derive how many pairs of data I have and can successfully derive this answer. How can I use that information to write a True/False field against each row for inclusion in the final calculation?

Alternatively am I barking totally up the wrong tree and the answer is so simple that I'm over thinking it.

Any help will be rewarded with warm thanks, virtual beer and general appreciation.

Many thanks

Chris

P.S. Data and Field names have been changed to protect the guilty.
 

Attachments

  • Data.jpg
    Data.jpg
    83.9 KB · Views: 130
  • Output.jpg
    Output.jpg
    30.6 KB · Views: 150

plog

Banishment Pending
Local time
Today, 08:00
Joined
May 11, 2011
Messages
11,665
I can do it in 2 queries. This is a sub-query that limits your data set to the appropriate actions (Start, Pause, Resume, End):

Code:
SELECT YourTableNameHere.Log_pk, YourTableNameHere.Job_pk, YourTableNameHere.Person_pk, YourTableNameHere.Action, YourTableNameHere.TimeField
FROM YourTableNameHere
WHERE (((YourTableNameHere.Action)="Start" Or (YourTableNameHere.Action)="End" Or (YourTableNameHere.Action)="Pause" Or (YourTableNameHere.Action)="Resume"));

Replace all instances of 'YourTableNameHere' with the name of your actual table. Save this query as sub_TimeQuery. Next, use this SQL for the actual query to produce your results:

Code:
SELECT sub_TimeQuery.Job_pk, sub_TimeQuery.Person_pk, Sum(DateDiff("s",IIf([Action]="Start",[TimeField],DMax("[TimeField]","sub_TimeQuery","[Job_pk]='" & [Job_pk] & "' AND [Person_pk]='" & [Person_pk] & "' AND [TimeField]<#" & [TimeField] & "#")),[TimeField])) AS ElapsedSeconds
FROM sub_TimeQuery
WHERE (((sub_TimeQuery.Action)="Start" Or (sub_TimeQuery.Action)="End" Or (sub_TimeQuery.Action)="Pause"))
GROUP BY sub_TimeQuery.Job_pk, sub_TimeQuery.Person_pk
HAVING (((Sum(IIf([Action]="End",1,0)))>0));

This determines the elapsed time between events in the sub_query and adds all that elapsed time up for all Start, Pause and End actions. Resume action is excluded because the time between a Pause and a Resume shouldn't be included. The last field of it determines if there is an 'End' action in the sub-query, if so it shows result, if not it is excluded. I think I incorporated all your criteria, let me know if I didn't.


Using this method on the sample data you provided I got 189 seconds for Person x on Job a.
 
Last edited:

Chris Hinds

New member
Local time
Today, 14:00
Joined
Sep 21, 2012
Messages
4
Hi Plog,

Thanks for your help - that's an amazing improvement on how complex I was trying to make it. My sub-query is running just fine and giving me the results I'd expect but when I run the main query I am getting a "Data type mismatch in criteria expression" error...

The only significant different between my sample data and the real data is that the Job_pk in the real data is not necessarily a numeric, for some jobs it's a numeric, for others it's a string of form Entry_{(<big numeric>)} - would this cause any issues in the DMax statement?

Thanks again

Chris
 

Chris Hinds

New member
Local time
Today, 14:00
Joined
Sep 21, 2012
Messages
4
Just realised the other thing that I screwed up in the original explanation... it might be the case that I have a number of records like this:

Job_pk Person_pk Action Time
a 29 Start 11/11/2011 12:25:11
a 29 Pause 11/11/2011 12:44:42
a 29 Resume 11/11/2011 12:48:12
a 29 Pause 11/11/2011 12:54:56

Even though I have no "End" time for the job, a contribution of 26 mins 15 secs has been made and I would need to include such numbers in the calculation of work carried out. Apologies for not being clearer in the first place.

Kind regards

Chris
 

plog

Banishment Pending
Local time
Today, 08:00
Joined
May 11, 2011
Messages
11,665
For Job_pk field I used text, so as long its some sort of text field, that is not it. In the table it comes from what is its data type exactly?

Also, I don't understand about those jobs without an 'End' time. Do you mean to say as long as their last action was either a 'Pause' or an 'End' they should be included?
 

Chris Hinds

New member
Local time
Today, 14:00
Joined
Sep 21, 2012
Messages
4
Hi plog,

Thanks for replying again. The fields are as follows:

Log_pk = Number
Person_pk = Number
Job_pk = Text
Time = Date/Time
Action = Text

The data is in a linked table sourced from the CSV file. I checked with other queries and all the dates are valid and in the correct format for Access to interpret.

In answer to the second question, yes, anything with the last action being Pause or End should be included.

Thanks again

Chris
 

plog

Banishment Pending
Local time
Today, 08:00
Joined
May 11, 2011
Messages
11,665
These queries should fix both those issues:

sub_TimeQuery

Code:
SELECT YourTableNameHere.Log_pk, YourTableNameHere.Job_pk, YourTableNameHere.Person_pk, YourTableNameHere.Action, YourTableNameHere.TimeField
FROM YourTableNameHere
WHERE (((YourTableNameHere.Action)="Start" Or (YourTableNameHere.Action)="End" Or (YourTableNameHere.Action)="Pause" Or (YourTableNameHere.Action)="Resume"));

main query:

Code:
SELECT sub_TimeQuery.Job_pk, sub_TimeQuery.Person_pk, Sum(DateDiff("s",IIf([Action]="Start",[TimeField],DMax("[TimeField]","sub_TimeQuery","[Job_pk]='" & [Job_pk] & "' AND [Person_pk]=" & [Person_pk] & " AND [TimeField]<#" & [TimeField] & "#")),[TimeField])) AS ElapsedSeconds
FROM sub_TimeQuery
WHERE (((sub_TimeQuery.Action)="Start" Or (sub_TimeQuery.Action)="End" Or (sub_TimeQuery.Action)="Pause") AND ((DLookUp("[Action]","sub_TimeQuery","[TimeField]=#" & DMax("[TimeField]","sub_TimeQuery","[Job_pk]='" & [Job_pk] & "' AND [Person_pk]=" & [Person_pk] & "") & "#"))="Pause" Or (DLookUp("[Action]","sub_TimeQuery","[TimeField]=#" & DMax("[TimeField]","sub_TimeQuery","[Job_pk]='" & [Job_pk] & "' AND [Person_pk]=" & [Person_pk] & "") & "#"))="End"))
GROUP BY sub_TimeQuery.Job_pk, sub_TimeQuery.Person_pk;
 

Users who are viewing this thread

Top Bottom