Wow! Thanks for taking the time to help me, I really appreciate it. Ooohhhh! This might be it!! Thanks so much. I'll give it a go on the real data set in the morning, but I think you have nailed it for me and saved me days of work. Thank you!!!
I did not complete your cases, but it is a really simple cut and paste to add more categories and then update the union query. This will be instantaneous for the most part. You also may need to add a calculated field so that the user can sort properly; see the sketch below.
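For illustration, a minimal sketch of that pattern (the two category query names and the Total field are hypothetical stand-ins, not names from this thread; the SortKey column is the calculated field that keeps the categories in a fixed order):
Code:
SELECT userName, "KP -" AS Category, 1 AS SortKey, Total
FROM qryKPminusTotals
UNION ALL
SELECT userName, "KP +" AS Category, 2 AS SortKey, Total
FROM qryKPplusTotals
ORDER BY userName, SortKey;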
Big thank you to everyone for all the help, time and willingness to share your expertise with a mere mortal. Greatly appreciated. Virtual beer all round. Thanks again. S
I am sorry... I am sure you thought you had got rid of me. Can I use your expertise again, please?
I now have everything in one query thanks to you guys.
Could I ask you to show me, if it's possible, how to create the "Average Time Between Negative Presses" for each section and user? I can copy and paste for the positive presses. As I already have the total KP -*, I now need to use the Timestamp field from the individual query (USER 1 with SECTION 1), taking the last timestamp minus the first timestamp and dividing it by the KP -* count, again for all 700 users and 3 sections, to create a new total average time between negative presses per section.
This was the easier option for the time analysis, as originally "Average Key Press" duration was to be analysed, but that had to be done by counting KP - and KP + pairs, and the data is not even, i.e. the numbers of -'s and +'s do not match, and there are some weird random keypresses.
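For what it is worth, a minimal sketch of that last-minus-first calculation (qryKPminus standing in for a query holding only the KP -* rows and questionNumber for the section, both assumed names; note that N presses give N - 1 gaps, so Count(*) - 1 may be the divisor wanted rather than the press count itself):
Code:
SELECT userName, questionNumber,
       (Max(timeStamp) - Min(timeStamp)) / (Count(*) - 1) AS AvgTimeBetweenNegPress
FROM qryKPminus
GROUP BY userName, questionNumber;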
One thing: all datetime fields have both a date and a time, even if one or the other is zero. This can present a big problem for you. Does all the data happen on the same day, never crossing 12:00 AM? If not, this is not doable with your current data. Here is your data, unformatted.
qryKPminusTime
Time
2/15/1900 7:12:04 PM
3/2/1900 9:35:52 AM
3/2/1900 9:36:13 AM
3/2/1900 9:36:34 AM
3/18/1900
3/18/1900
3/18/1900
3/18/1900 12:00:21 AM
3/18/1900 12:00:21 AM
4/2/1900 2:23:47 PM
4/2/1900 2:24:08 PM
So in the above case, 9:35 AM is greater than 7:12 PM. You can strip all the date information away if it makes sense, and that is only if all the data comes from the same day.
At first look you have significant issues with the data, and until they are fixed I am not sure it is usable. The data cannot be sorted by time as shown, so the hope is that it can be sorted in the order it was entered; that could be done by sorting on the autonumber.
I sorted by user, then by ID, so the events should be in the order they occurred, but they are not. This is probably because of how the data was imported. I assume you stripped the date part away.
qryKPminusTime

userName   Time
User 155   1/15/1900 2:24:08 PM
User 155   12:00:00 AM
User 155   2/15/1900 7:12:04 PM
User 155   7/20/1900 7:12:25 PM
User 155   2/15/1900 7:12:25 PM
User 155   1/15/1900 2:23:47 PM
User 155   1/15/1900 2:24:30 PM
User 155   3/18/1900
User 155   3/18/1900
User 155   4/18/1900 4:48:17 AM
User 155   4/18/1900 4:48:17 AM
So as far as I can tell there is no way to determine the time order, unless you say all of these events happened on the same day; in that case you can strip the date part and sort by time. If these can roll past midnight, then there is no possible solution without updated data.
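If the same-day assumption does hold, a minimal sketch of stripping the date part with the built-in TimeValue function, using the qryKPminusTime query and Time field shown above:
Code:
SELECT userName, TimeValue([Time]) AS TimeOnly
FROM qryKPminusTime
ORDER BY userName, TimeValue([Time]);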
Thanks again @MajP for all the help. I might have messed up the time a little creating the sample data set. The log is from a few years back and from several organisations, but time is represented using a UNIX timestamp. I have attached a single instance, i.e. USER 1050 for Question 1, for all keypresses. Is this date/time format usable?
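Those values are UNIX epoch milliseconds, which can be converted to a readable Access date/time when needed for display; a minimal sketch, assuming the attached table is named [Sample Q1 KP] as in the sample below (86400000 is milliseconds per day, the epoch is 1 Jan 1970, and time zones are ignored):
Code:
SELECT ID, userName, timeStamp,
       CDate(#1/1/1970# + [timeStamp] / 86400000) AS ClockTime
FROM [Sample Q1 KP];
The elapsed-time arithmetic later in the thread works directly on the raw millisecond values, so this conversion is only for readability.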
So you would need the elapsed time between each KP -. Then the average of the elapsed time?
Sample Q1 KP

ID       userName  questionNumber  timeStamp         eventDetail
1104911  1050      1               1450252356377.83  K: -73
1104912  1050      1               1450252356409.03  K: +73
1104913  1050      1               1450252356767.83  K: -8
So for record 1104913 the elapsed time is 1450252356767.83 - 1450252356377.83 (= 390 ms)?
Do you need the avg elapsed negative key press duration for each user or per section? It does not really matter; if you can do one, you can do both.
Thanks again. I am looking for each user for each section: their average time on each section. I am trying to see if time is a factor that will help improve the accuracy of a machine learning algorithm. I think this is the only way I can use or analyse time within the database.
See if this is what you mean. I added two more UserNames.
Here is the elapsed query between K- events
Code:
SELECT [Current].ID,
       [Current].userName,
       [Current].timeStamp,
       [Current].eventDetail,
       (SELECT Max(B.timeStamp)
        FROM qryKminus AS B
        WHERE [Current].userName = B.userName
          AND B.timeStamp < [Current].timeStamp) AS Previous,
       [Current].[timeStamp] - [Previous] AS Elapsed
FROM qryKminus AS [Current]
ORDER BY [Current].userName,
         [Current].timeStamp;
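(For each row, the correlated subquery finds the latest earlier timeStamp for the same user, and Elapsed is the gap to it. The first K- event per user has no Previous, so its Elapsed is Null, and aggregate functions such as Avg simply ignore that row.)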
qryElapsed

ID       userName  timeStamp         eventDetail  Previous          Elapsed
1104896  1050      1450252354069.02  K: -80
1104898  1050      1450252354459.02  K: -82       1450252354069.02  390.000732421875
1104899  1050      1450252354552.62  K: -73       1450252354459.02  93.60009765625
1104905  1050      1450252355332.63  K: -78       1450252354552.62  780.00146484375
1104907  1050      1450252355722.63  K: -84       1450252355332.63  390.000732421875
1104909  1050      1450252356159.43  K: -76       1450252355722.63  436.80078125
.....
Then just take the Avg by userName
Code:
SELECT qryElapsed.userName, Avg(qryElapsed.Elapsed) AS AvgOfElapsed
FROM qryElapsed
GROUP BY qryElapsed.userName;
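Since the goal is each user for each section, the same aggregate can be grouped on both columns; a sketch assuming questionNumber identifies the section and has been added to qryElapsed's SELECT list (and to the subquery's WHERE clause, so gaps are not computed across sections):
Code:
SELECT qryElapsed.userName, qryElapsed.questionNumber,
       Avg(qryElapsed.Elapsed) AS AvgOfElapsed
FROM qryElapsed
GROUP BY qryElapsed.userName, qryElapsed.questionNumber;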
Thanks again, I would be pulling my hair out without your help. I think you may have given me an additional factor (Elapsed Time per KP), which is super cool, but just to double-check my logic on what you have done for me:
"Elapsed Time" is the time elapsed between each keypress for USER 1050 for Section 1.
Then the code uses an average function to sum the elapsed times and divide by 51 (the number of KP -* records),
giving USER 1050 an average time between presses for question 1 of 1727.23503417969.
My thought was taking LASTTIMESTAMP (1450252440430.77) - FIRSTTIMESTAMP (1450252354069.02) and dividing by 51, which gives
1693.37,
a variation of 33.87.
(Writing it out, I think I see it: 51 KP -* records give only 50 elapsed gaps, since the first record has no previous, and 86361.75 / 50 is exactly 1727.235, which accounts for the difference.)
As I write this, your calculation gives a more comprehensive answer. Thanks again @MajP, I seriously owe you more than virtual pints
Can these queries be added to a union query like before?
My query only includes K- keystrokes, not all keystrokes. I believe that is what you said. If it is all keystrokes, then modify the original query to include all of them.
To create the "Average Time Between Negative Presses" for each section and user.
In my opinion taking the aggregate is meaningless information. If I take elapsed times I can find things like the median, mode, variance, and distribution of the elapsed times. These are usually far from normally distributed: likely you have lots of short events and a few long-term outliers. A distribution of the data is far more telling about what is happening.
It sure does, and I really appreciate the time, the patience and the ability to break it into understandable chunks for me. Virtual pints
K- is perfect; the way you have shown it is great, as I understand it and can replicate it... it's perfect!!! I probably will add K+ to the algorithm.
I'll get cracking on the full data set today @MajP. I really cannot thank you enough, you have been a "life saver", as I was looking into a black hole trying to get this done. The more I think of it, you have saved me weeks, not days, of "hair pulling". If you ever get to my side of the pond, drop me a message so I can turn virtual pints into real ones.
www.w3schools.com
If you step through this, your query knowledge will greatly improve. Very well structured and understandable.
If I take elapsed times I can find things like the median, mode, variance, and distribution of the elapsed times. These are usually far from normally distributed: likely you have lots of short events and a few long-term outliers. A distribution of the data is far more telling about what is happening.
Hi @MajP, I am sorry to bother you again; I have been busy analysing away, thanks to all your help. I have even managed to adjust the average elapsed by tweaking the queries slightly, which I would not have been able to do without your guidance. Thanks.
I have checked the Access functions and can see I can change your SQL code from Avg to Max, Min, StDev and Var. However, could you help with getting some of the other statistical functions you suggested: median, mode, variance, and the distribution of the elapsed times, please?
I am heading out for the weekend; I can look at this later. Use the PM feature to remind me if I forget. However, the Min, Max, and StDev should be very easy to do. If you look at qryAvg, just replace Avg with the other keywords:
Code:
SELECT qryElapsed.userName, Avg(qryElapsed.Elapsed) AS AvgOfElapsed
FROM qryElapsed
GROUP BY qryElapsed.userName;

SELECT qryElapsed.userName, Min(qryElapsed.Elapsed) AS MinOfElapsed
FROM qryElapsed
GROUP BY qryElapsed.userName;

SELECT qryElapsed.userName, Max(qryElapsed.Elapsed) AS MaxOfElapsed
FROM qryElapsed
GROUP BY qryElapsed.userName;

SELECT qryElapsed.userName, StDev(qryElapsed.Elapsed) AS StandardDeviationOfElapsed
FROM qryElapsed
GROUP BY qryElapsed.userName;
I have another function I can post later that does percentiles. That will be very beneficial for distributions. Or you can drop this into Excel and build your histograms there; then you can show the graphical distribution of elapsed times.
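In the meantime, a minimal sketch of one generic way to get a per-user median in Access SQL; this is not necessarily the percentile function @MajP had in mind, it assumes userName is numeric as in the sample data, and the DCount calls make it slow on large sets:
Code:
SELECT A.userName, Avg(A.Elapsed) AS MedianOfElapsed
FROM qryElapsed AS A
WHERE A.Elapsed Is Not Null
AND 2 * DCount("*", "qryElapsed", "userName=" & A.userName & " AND Elapsed<=" & A.Elapsed)
    >= DCount("*", "qryElapsed", "userName=" & A.userName & " AND Elapsed Is Not Null")
AND 2 * DCount("*", "qryElapsed", "userName=" & A.userName & " AND Elapsed>=" & A.Elapsed)
    >= DCount("*", "qryElapsed", "userName=" & A.userName & " AND Elapsed Is Not Null")
GROUP BY A.userName;
A row survives the WHERE clause only if at least half of that user's elapsed values sit at or below it and at least half sit at or above it, which leaves the middle one or two values; Avg then collapses those to the median.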
But here is my point that AVG is meaningless. This is an ugly histogram, because it is very hard to do a histogram when you have huge outliers. But this is the distribution for 1050: you have a few events below 100, a lot between 100 and 400, and a huge outlier of 69K (see the attached histogram). This means, IMO, that talking about the average is meaningless. Most users will never see anything near the average.
So throwing out the one outlier, I get something that is reasonable (see the second attached histogram).
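A minimal sketch of how such a binned count could be produced as a query for one user (the 100 ms bin width, the 10000 outlier cutoff, and the 1050 filter are illustrative choices, not values from the thread):
Code:
SELECT Int([Elapsed] / 100) * 100 AS BinStart, Count(*) AS Events
FROM qryElapsed
WHERE userName = 1050
AND Elapsed Is Not Null
AND Elapsed < 10000
GROUP BY Int([Elapsed] / 100) * 100
ORDER BY Int([Elapsed] / 100) * 100;
Each BinStart row counts the events whose elapsed time falls in [BinStart, BinStart + 100), which is the raw material for the histogram.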
So I can tell the boss the average is around 4K, but that is not what users would expect to see.
"Boss, this is because we had one bad case taking 69K. But as you can see the biggest group is at 100 with a cluster between 100 and 250. There is another cluster of events between 400 and 500 with only one extreme outlier over 900."
This is why I never believe any stats quoted in the news. It is very correct to say the avg is 4K, but it is also completely meaningless without further discussion. Stats may be correct; it is the interpretation that is not.
Thanks a million @MajP, you are right: I'll have to deal with the outliers for the ML side of the analysis (more fun), and I am also going to run it through a DLN. I appreciate the time, the patience, and the great explanations. I am transferring most of the query analysis results to an Excel sheet at the mo, but the more worthwhile factors I can generate, the better.
Have a great weekend, and thanks again for the help. I'll pester you during the week if posting the percentiles slips your mind.