I'm not sure what you are asking. Are you asking me if this can be sped up?
I'll poke at it and see if I can figure something out to speed it up, but the existence of the subquery is a problem: in effect, a separate query is run for every record that is processed. At the moment I can't see how you can get rid of it, at least not if you do this with a query.
When I get time I'll try to do this in code. If you loop through the sorted records, I think this can be done in a single pass.
My operational comments become more significant in light of this wrinkle in which the values occur out of order.
If you have two fields in a sort key, one is primary and the other is secondary. When you get anomalous data because the sequence number is wrong for the time (or the time is wrong for the sequence number, depending on which field is the primary sort), that might be a clue for your tech people about how your bad records are coming about.
In the simple-minded case I used in an earlier post, if the ON OFF ON ON OFF sequence occurred because the time was wrong for the 4th event (the apparently extraneous ON), then that ON might not be extraneous after all - instead it might be misplaced.
You must determine the sources of these numbers and validate how they come into your central database. If, for example, you get these numbers from two different systems, check the time synchronization between them. If they are computers, see if they can both run an NTP (Network Time Protocol) client, using one of your domain servers as the NTP server (the time reference). That's just one thought to consider. I'm sure there are others. For example, is there significant network latency between your data source and the central database system?
After running some tests with about 3000 records I could see the problem. I first rewrote the query to get rid of the calculation. That looks like:
Code:
DELETE *
FROM OnOff
WHERE (SELECT TOP 1 Dupe.State
       FROM OnOff AS Dupe
       WHERE Dupe.Machine = OnOff.Machine
         AND (Dupe.Date < OnOff.Date
              OR (Dupe.Date = OnOff.Date AND Dupe.ms < OnOff.ms))
       ORDER BY Dupe.Date DESC, Dupe.ms DESC, Dupe.ID) = [OnOff].[State];
and I post it in case anyone would like to suggest a way to improve it. It would need further improvement, as it still took 2 minutes and 31 seconds to run against about 3000 records. This is better than the 4 minute 34 second time I was getting with the calculation, but the run time grows much faster than linearly with the number of records, and I can see that it would take longer than 8 hours to process around 50,000 records. Obviously this is not a solution.
However, as I suspected, doing this in code results in a much faster process. The attached database includes code that can process about 58,000 records in just over 5 seconds.
In this database you will find:
frmRunPrograms: A form with buttons that run two programs. The first just updates an added Yes/No field in the table to flag which records would be deleted. You can use this to verify that the other program, which actually deletes the records, works the way you want.
qryOnOff: The query used as the record source in the programs. It sorts the OnOff table first by Machine, then Date/Time, and then ms. The data needs to be sorted in this manner for the programs to work properly. The program itself is really simple-minded: it just scans through this sorted list and deletes the bad records the same way you would by hand with a pen (a rough sketch of that loop follows at the end of this list).
The OnOff table has been left as the small (115 record) version with the bad records in place. Larger copies of the table are included for testing.
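
For anyone who just wants to see the idea without opening the attachment, here is a rough sketch of that single-pass loop. This is not the exact code in the attached database; the field names (Machine, State), the qryOnOff record source, and the assumption that Machine and State are never Null are taken from the description above, so adjust it to suit your own table.

Code:
' Rough sketch only - not the exact code from the attached database.
' Assumes qryOnOff sorts the records by Machine, then Date/Time, then ms,
' and that Machine and State are never Null.
Public Sub DeleteConsecutiveDupes()
    Dim rs As DAO.Recordset
    Dim strLastMachine As String
    Dim varLastState As Variant
    Dim blnHavePrevious As Boolean

    Set rs = CurrentDb.OpenRecordset("qryOnOff", dbOpenDynaset)

    Do Until rs.EOF
        If blnHavePrevious _
           And rs!Machine = strLastMachine _
           And rs!State = varLastState Then
            ' Same machine reporting the same state as the previous
            ' record in the sorted list, so this record is redundant.
            rs.Delete
        Else
            ' This record becomes the new "previous" record.
            strLastMachine = rs!Machine
            varLastState = rs!State
            blnHavePrevious = True
        End If
        rs.MoveNext
    Loop

    rs.Close
    Set rs = Nothing
End Sub

The flag-only program would do the same scan, but instead of rs.Delete it would use rs.Edit, set the added Yes/No field, and rs.Update, so you can review what would be removed before running the delete version.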