Finding gaps in a dataset based on timestamps and possibly infilling (1 Viewer)

Berlinomatt

New member
Local time
Today, 17:29
Joined
Aug 10, 2020
Messages
14
Hi,

I'm new to access but working with a relatively large dataset of approx. 1.5 million entries from weather station data. There are however gaps that I need to identify and in an ideal world run an update query to insert rows with the missing timestamps but with all data points set to blank or -9999. This is an example of what the data looks like currently:
1597065784678.png


The weather stations record every five minutes but like I said there were times when they failed and so I would need to identify and infill those timestamps.

Many thanks for the help!
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 09:29
Joined
Feb 28, 2001
Messages
27,150
Before doing that, my question is whether you need to do that at all.

Mathematically, when generating trend lines, you are typically dealing with something similar to a linear regression equation for which don't need to "fill in the blanks." In fact you would have to add a test to NOT use the infilled values. Most other statistical tests I know about will not care that there are gaps either. I'm not saying this can't be done, but it isn't necessarily a simple undertaking and will add records to your DB that have no major computational value.

Can you tell us WHY you need the infills? What useful thing does it tell you? What useful thing does it do for you?

Before you think I'm "blowing you off" here, look in the the "similar threads" list below this thread for articles on how others have handled this same (or similar) problem. You won't even have to do a search because the forum software can identify threads like yours. By the way, most of those other threads are looking at date gaps, not time gaps, but the principle is the same because date and time are stored in a date/time variable. Comparing only dates and comparing dates & times together will be practically identical. Your only problem will be roundoff issues since time involves an implied fraction (of a day) and binary fractions can be fractious.
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 10:29
Joined
May 21, 2018
Messages
8,525
You could do this without code. Go to Excel and build a table of times. Import. Do a select distinct query for each day. Do a select distinct query for each station. Do a Cartesian query query with these (no joins) and make a calculated field adding the day and time. Now you have a query with every possible date and time for every location. Now left join this query to your table by station and the datetime field. You can work with this query or use it in a make table query.
 
Last edited:

Berlinomatt

New member
Local time
Today, 17:29
Joined
Aug 10, 2020
Messages
14
You could do this without code. Go to Excel and build a table of times. Import. Do a select distinct query for each day. Do a select distinct query for each station. Do a Cartesian query query with these (no joins) and make a calculated field adding the day and time. Now you have a query with every possible date and time for every location. Now left join this query to your table by station and the datetime field. You can work with this query or use it in a make table query.
Many thanks for the replies and apologies for crossthreading. My supervisor who will be analysing the data insists this is the format he requires for his analysis and didn't go much further into detail thus my request on how to do this still stands. I have subsequently tried using a MATCH funtion in excel after creating a complete timeseries next to the gappy one. However what I have discovered is that the weather stations did not record the time 100% precisely meaning excel cannot match the timestamps as there are miniscule differences when both converted back to numbers. I tried all sorts of roundup and rounddown functions in order to make them uniform but it still left errors. Would the method you described above still work or would this small difference also cause it not to match? If it would work, would you be so kind to break it down into simpler steps as I didn't understand many of the terms you used.
Hope this illustrates what I mean:
1597327012411.png
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 10:29
Joined
May 21, 2018
Messages
8,525
you can do a format the fields first and then link them. A format returns a string so it would change 43165..... to a string and both strings would match.
where Format([TimeStamp], "MM/DD/YYYY hh:mm") = format([some other field],"MM/DD/YYYY hh:mm")
 

CJ_London

Super Moderator
Staff member
Local time
Today, 15:29
Joined
Feb 19, 2013
Messages
16,610
but that could still mean an end time of 14:04 and the next start time as 14:05

it can be done in a query - on the crossposted thread there is a link to a solution using a non standard join. Assuming you are looking for the next record you would find the minimum time that is greater than the 'current' time and decide whether that is within a predefined limit
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 10:29
Joined
May 21, 2018
Messages
8,525
How is your supervisor going to analyze? Are you exporting to excel or doing it in the db. Either way you would not have to write to the table, because that could make a big table even bigger. You can work with the query or export the query. Adding a dummy record is simple, but may not be very efficient.
but that could still mean an end time of 14:04 and the next start time as 14:05
From what I am hearing, I do not think that can happen. Your values from the station are reported on the 5's but the seconds are off or the conversion is adding some seconds to it. That would make a standard join tough but doable

Can you post a sample of the data in Excel? Maybe a few thousands records and zip it up.
 

Berlinomatt

New member
Local time
Today, 17:29
Joined
Aug 10, 2020
Messages
14
The analysis will probably be done in R and we are just using Access as a repository for all the data as its too much for Excel to handle. The data will eventually be converted into .csv format so it can analysed on different platforms and climate models. I tried rounding up but it did indeed lead to some of the timestamps no longer having 5 minute intervals. I did manage to find a work around, albeit a very unelegant one. I converted both the complete timestamp series and the gappy timestamp series from the weather station into text format and then used power query to merge the tables as described here which gives me the empty rows I want: excel-university.com/vlookup-return-multiple-matching-rows-and-columns/
It works but its rather time consuming converting to text and then deleting and reinserting everything.
 

Berlinomatt

New member
Local time
Today, 17:29
Joined
Aug 10, 2020
Messages
14
Sorry I've tried ten times to upload either a zip folder, a database or an excel sheet and it just times out. Will try again tomorrow
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 09:29
Joined
Feb 28, 2001
Messages
27,150
You are concerned that you cannot get an affirmative matchup because of the headaches you have with those long decimal fractions. The problem with that comparison you showed us is simply that you are falling prey to the major problem in working with binary fractions. In your first "FALSE" comparison in post #5, 43165.6006944444 and 43165.6006944445 are both representations of the same approximate time, though why they differed in that last digit is anyone's guess.

Code:
debug.Print cdate(43165.6006944444)
3/6/2018 2:25:00 PM 
debug.Print cdate(43165.6006944445)
3/6/2018 2:25:00 PM

To compare numbers like this, you might need to ROUND them (see ROUND function) rather than converting to a date & time.


I'll try to not lose you here. Your problem is that dates and times are represented as a number of days since the Access "Epoch" date of 30-Dec-1899, which is day 0. Times are fractions of a day since midnight. Days are integers since the epoch. Simple enough, right?

That means that those two dates were 43,165 days since the Epoch date and just over 60% of the way through the day. But here is where life gets tricky. That fraction involves converting times to fractions of a day. But we use sexagesimal times, meaning base 60, for minutes. Factors of the 24-hour day are 2 and 3. Factors of the minutes are 2, 3, and 5. The mathematical problem that jumps up here is that a BINARY fraction only comes out even for factors of 2. Factors of 3 and 5 do NOT come out even. And that means that the repeating fraction you see (the string of 4s) is just a sign that something doesn't come out even. It is a dead giveway that you are looking at a repeating fraction caused by a mismatch between the intended number and the presented number.

Your solution is to recognize that you have too many digits there. Cut off a few using the ROUND function. BUT... how many digits should you have? There are 87,400 seconds in a day, or less than 100,000. That means that in DECIMAL you only need about 5 digits to specify a given second. I would try to round off those numbers (in a query, of course) to no more than 5 or 6 digits. You are showing 10 digits of fraction, five of which have no practical meaning.
 

Berlinomatt

New member
Local time
Today, 17:29
Joined
Aug 10, 2020
Messages
14
Sadly I already tried this and if I round only to 6 decimal places I don't get complete matches between the two time series. If I round to 5 decimal places then I get complete matches, however it moves the timestamp out of a five minute sync :-( This is rounded to five:
1597391597306.png

If I only round to 6 places then you can see I get false matches when I need true matches:
1597391752265.png

What are the requirements to upload a document as it cancels everything I try to upload. Is there a size restriction? I couldn't find the guidelines, thanks
 

Gasman

Enthusiastic Amateur
Local time
Today, 15:29
Joined
Sep 21, 2011
Messages
14,238
You
What are the requirements to upload a document as it cancels everything I try to upload. Is there a size restriction? I couldn't find the guidelines, thanks
You need 10 posts to upload/link. This is to prevent spamming. Unfortunately it affects genuine members as well, but a small price to pay I believe.?
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 22:29
Joined
May 7, 2009
Messages
19,230
you need to use the Date+Time column for your comparison.
the computer uses binary (by 2) so there is no exact decimal
for its fractional part.

some example of what you are doing.
see qryFinal.
 

Attachments

  • timeGap.zip
    45 KB · Views: 521

Users who are viewing this thread

Top Bottom