Denormalized database: update adjacent records

Noruen

Member
Local time
Today, 23:16
Joined
Jul 7, 2020
Messages
46
Hi gentlemen,

I suppose you will kill me for a question regarding a denormalized DB, but only you can help me.

I have a table where I store a car log book:


id | Date       | car_id | Odometer (absolute value)
1  | 12.10.2022 | 1      | 10 000

etc

Now, as is probably evident, I store odometer values as absolute values. I decided to do so because it is more error-resistant and insertions are very simple (add the absolute value and the difference is calculated automatically in a query, see below).

I generate a report from this table (LogBook) where I use a subquery to calculate the difference between adjacent records (SELECT current.Odometer - Max(previous.Odometer) FROM … WHERE previous.Car_id = current.Car_id AND previous.Odometer < current.Odometer). This works fine as a saved query. However, when I try to perform aggregations on this saved query, it is very slow and erroneous.

So I decided that maybe one solution would be to add a redundant field where I would simply calculate and save the difference between adjacent records, as it would increase performance significantly.

My question: do you have experience with this issue, so that I can save a lot of time thinking about it?

Many thanks! :)
 

plog

Banishment Pending
Local time
Today, 16:16
Joined
May 11, 2011
Messages
11,646
Does car_id have an index on it in the table? That could speed things up.

You can also try a correlated subquery to get the previous Odometer, which might be better performance-wise. The 3rd SQL box on this page demonstrates that:

 

Noruen

Member
Local time
Today, 23:16
Joined
Jul 7, 2020
Messages
46
Does car_id have an index on it in the table? That could speed things up.

You can also try a correlated subquery to get the previous Odometer, which might be better performance-wise. The 3rd SQL box on this page demonstrates that:

Yes, I also indexed Odometer and it helped a little.
The mentioned query is correlated:


Code:
SELECT current.Odometer - (SELECT Max(previous.Odometer)
                           FROM tblLog As previous
                           WHERE previous.Car_id = current.Car_id AND previous.Odometer < current.Odometer) As Difference
FROM tblLog As current
 

plog

Banishment Pending
Local time
Today, 16:16
Joined
May 11, 2011
Messages
11,646
If you are going to store the difference then you need to figure out how it's going to get screwed up and mitigate those possibilities:

1. How does data get into your database, and when will you be running this calculation? If it's a mass import, you simply run an UPDATE query on each batch of data. If people are using forms to enter data, you need to write a procedure to do it after each record is entered.

2. Will your data ever be changed? Can people go into the data and change a past Odometer value? If so, that throws off its calculation and the next record's calculation.

3. Could data be added in between existing records? If a record was missed and added later, that means the next record's calculation will be off.
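The failure modes above all have the same cure: re-derive every stored difference from the absolute readings. A minimal sketch in plain Python (hypothetical field names; the real fix would be a batch UPDATE in Access) shows why a late-inserted record silently changes the next record's difference:

```python
def recompute_differences(readings):
    """readings: list of dicts with 'car_id' and 'odometer'.
    Returns a new list, sorted per car by odometer, with a 'difference'
    field (None for the first record of each car)."""
    out = []
    last_seen = {}  # car_id -> previous odometer value
    for r in sorted(readings, key=lambda r: (r["car_id"], r["odometer"])):
        prev = last_seen.get(r["car_id"])
        out.append({**r, "difference": None if prev is None else r["odometer"] - prev})
        last_seen[r["car_id"]] = r["odometer"]
    return out
```

Running this after every edit or late insert (point 3 above) keeps the stored column honest: inserting a missed reading between two existing ones changes the difference of the record that follows it.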
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 17:16
Joined
May 21, 2018
Messages
8,533
I suppose you will kill me for question regarding denormalized DB, but only you can help me.
So how many records are you talking: 10k, 100k, 1M? This could make a difference in the strategy. If it is around 10k, then cycling through all the records and updating a static field can be done very frequently. If it is 1M, then cycling and updating has to be done judiciously, and as @plog points out you will have to be more methodical in how you update.
Denormalizing data for performance purposes can make good sense if there is no dynamic way to do it. For example, I see a lot of people concatenating child records in a query. That works OK for trivially sized databases, but I will often persist that field instead. The trade-off is that every time a child record is added or deleted I have to ensure the code runs. Occasionally I cycle through all the records to ensure nothing was missed.

Out of curiosity, you state:
Now, as is probably evident, I store odometer values as absolute values
What else would you store? That is the logical thing to do.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 17:16
Joined
Feb 19, 2002
Messages
43,301
1. The value of a field in one record should never depend on the value of a different record. PERIOD.
2. Do NOT calculate the difference between records in the query, use the report events.

Queries work on sets. They are not linear, and that is why trying to do a linear process in a query is so slow. Reports are linear. As long as you apply a sort property to the report, record 2 follows record 1, and record 3 follows record 2 - ALWAYS. That means in the On Format event, you can use the value saved the previous time the event ran in order to calculate a value for the current row. There will be no delay caused by this calculation :)

I don't have code handy, but you need two report-wide variables, so define them at the top of the report's code module. Access initializes numeric variables to zero. Name them PrevValue and PrevPrevValue.

In the format event
Code:
        Me.MilesTravelled = Me.OdometerReading - PrevValue
        PrevPrevValue = PrevValue
        PrevValue = Me.OdometerReading
In the Retreat Event
Code:
    PrevValue = PrevPrevValue

To explain - Access formats a line before it knows whether or not it can be printed on the page. If there is not enough room, it runs the Retreat event, then runs the code that causes a page break, prints the page footer for the current page, then the page header and any group header for the next page. Then it runs the Format event a second (or more) time.

So, since you already altered PrevValue, you can't run the Format event code again because the saved values are not correct; that's why we keep two versions of the previous value. In the Retreat event we move PrevPrevValue back into PrevValue so it holds the same value it had when the event ran the first time. There is no need to worry about leaving PrevPrevValue unchanged, since we only ever go back ONE record.
 

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 07:16
Joined
Jan 20, 2009
Messages
12,852
I used to do this kind of thing in Access with a self join and find the maximum of the Odometer reading that was less than the current one. It is quite a complex query.

These days I use SQL Server and ROW_NUMBER() so it is very simple.
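The window-function approach Galaxiom describes can be sketched outside SQL Server too: SQLite (3.25+) supports the same LAG()/ROW_NUMBER() functions, so Python's sqlite3 module can demonstrate it. The table and column names below mirror the thread's tblLog but are illustrative:

```python
import sqlite3

# In-memory database with a schema shaped like the thread's tblLog.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblLog (id INTEGER PRIMARY KEY, car_id INTEGER, odometer INTEGER)")
con.executemany("INSERT INTO tblLog (car_id, odometer) VALUES (?, ?)",
                [(1, 110000), (1, 110250), (2, 50000), (1, 110400), (2, 50120)])

# LAG() fetches the previous reading per car in one pass - no self join,
# no correlated subquery.
rows = con.execute("""
    SELECT car_id,
           odometer,
           odometer - LAG(odometer) OVER (PARTITION BY car_id ORDER BY odometer) AS difference
    FROM tblLog
    ORDER BY car_id, odometer
""").fetchall()
# The first reading of each car has no predecessor, so its difference is NULL (None).
```

Access SQL lacks window functions, which is why the thread falls back to correlated subqueries or report events; on any backend that has them, this is the simple route.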
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 16:16
Joined
Feb 28, 2001
Messages
27,192
The question about the number of records is relevant. The more important question related to that is how often would you edit any of the records once you have stored them and computed your difference? If the answer to "how often" is anything other than "NEVER, once they are stored" then you will also face serious bloat issues and THAT is where the database gets so ugly that you CAN'T use the denormalized setup safely.
 

Noruen

Member
Local time
Today, 23:16
Joined
Jul 7, 2020
Messages
46
Ah, I will try to answer all of your questions :)

1) @MajP I currently have up to 20k records, and it increases by approx. 100 every month.
2) @Pat Hartman I know it is bad, so that is the reason I'm trying to find another solution, but...
2b) @Pat Hartman each record also has another column specifying whether the "Journey" is Business / Private / Commuting. So I used a basic query to calculate the differences and then TRANSFORMed that query into Date / Business / Private / Commuting, and that's where things got messy: it is very slow and unstable. I see your point and I have to give it some thought - whether it would be feasible to do it in a report (for me and my abilities).

3) @plog and @The_Doc_Man Edits happen minimally, but insertions quite frequently. So I suppose I've got my answer... connect with Excel and do it there like in the old times :)

Anyway, thanks a lot guys!
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 17:16
Joined
Feb 19, 2002
Messages
43,301
I GAVE YOU THE MOST EFFICIENT SOLUTION with the least amount of code. Your request is not uncommon at all. It allows the data to be properly normalized and does not require a query that can potentially be very slow if there are a lot of records involved. If you have intermediate breaks, such as for a journey, then you need to add a grouping to the report. It also complicates the calculation a bit, since you need to reset PrevValue in the Journey group's Format event. I'm assuming that the recordings for a "journey" will be together.

If the "journeys" overlap, that's a problem I don't know how you will solve without using a VBA loop to read the table and calculate the journeys separately and write out a new table with the summarized data. Then, you can make separate reports for each type of journey for a time period. Divide and conquer.

Reports process data SEQUENTIALLY and so there is only ONE pass through the recordset. The only requirement is that the Report have the correct SORT property.
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 17:16
Joined
May 21, 2018
Messages
8,533
@Noruen,
The solution provided by @Pat Hartman is not really a solution to the real problem. As you pointed out, calculating the distance by itself is pretty trivial and can be done in a reasonable amount of time. There is no real need to do that in a report, because doing it in a report is probably even slower than the query itself.

There are probably faster solutions, but to get the previous reading I did it with a subquery, and it takes about a second with 100k records:
Code:
SELECT tblodometerreadings.odometerreadingid,
       tblodometerreadings.vehicleid_fk,
       tblodometerreadings.triptype_fk,
       tblodometerreadings.odometerreading,
       (SELECT Max(B.odometerreading)
        FROM   tblodometerreadings AS B
        WHERE  B.vehicleid_fk = tblodometerreadings.vehicleid_fk
               AND B.odometerreading < tblodometerreadings.odometerreading
        GROUP  BY B.vehicleid_fk) AS PreviousOdometer
FROM   tblodometerreadings
ORDER  BY tblodometerreadings.odometerreadingid,
          tblodometerreadings.vehicleid_fk,
          tblodometerreadings.odometerreading;

But the issue is no one is ever going to ask for that information. So assume you have a real-world problem: the boss wants the max, min, avg, and number of trips by vehicle by month. I tried every modification of the above query and Access completely chokes. Since the above query cannot be used in a reasonable time, I would build a temp table (semi-temp in that you keep it populated) so that you do not have to mess with your data table. You only need two fields.
tblPreviousReadings
-odometerID_FK ' key to the data table
-PreviousOdometerReading ' placeholder to store the previous reading

You need to populate this table with all the keys from your data table and the previous odometer reading for that record.
1. First run an append query to append all keys not already in the tblPreviousReadings.
Code:
INSERT INTO tblpreviousreadings
            (odometerreadingid_fk)
SELECT tblodometerreadings.odometerreadingid
FROM   tblodometerreadings
       LEFT JOIN tblpreviousreadings
              ON tblodometerreadings.[odometerreadingid] =
                 tblpreviousreadings.[odometerreadingid_fk]
WHERE  (( ( tblpreviousreadings.odometerreadingid_fk ) IS NULL ));
2. Delete any keys that are no longer in the data table
Code:
DELETE tblpreviousreadings.*
FROM      tblpreviousreadings
LEFT JOIN tblodometerreadings
ON        tblpreviousreadings.[OdometerReadingID_FK]=tblodometerreadings.[OdometerReadingID]
WHERE     tblodometerreadings.[OdometerReadingID] IS NULL;

3. Now through code update tblPreviousReadings
Code:
Public Sub UpdatePreviousOdometer()
  Dim rs As DAO.Recordset
  Dim strSql As String
  Dim vehID As Long
  Dim PrevOdom As Long
  DAO.DBEngine.SetOption dbMaxLocksPerFile, 15000
  strSql = "SELECT tblOdometerReadings.OdometerReadingID, tblOdometerReadings.VehicleID_FK, tblOdometerReadings.OdometerReading, tblPreviousReadings.PreviousOdometerReading " & _
           "FROM tblOdometerReadings INNER JOIN tblPreviousReadings ON tblOdometerReadings.OdometerReadingID = tblPreviousReadings.OdometerReadingID_FK ORDER BY VehicleID_FK, OdometerReadingID"
  Set rs = CurrentDb.OpenRecordset(strSql)
  Do While Not rs.EOF
    'Reset the running value whenever we reach a new vehicle
    If vehID <> rs!VehicleID_FK Then
      PrevOdom = 0
      vehID = rs!VehicleID_FK
    End If
    strSql = "UPDATE tblPreviousReadings SET PreviousOdometerReading = " & PrevOdom & " WHERE OdometerReadingID_FK = " & rs!OdometerReadingID
    CurrentDb.Execute strSql
    PrevOdom = rs!OdometerReading
    rs.MoveNext
  Loop
  rs.Close
End Sub

On 100k records the above code takes 62 seconds. But after that is done, you can do complex queries in no time. The report below runs instantaneously against the 100k records.

MinMaxAvg.jpg

Doing it this way you do not have to delete and re-add all the keys, which will keep the db from bloating. You can just add new keys and remove the deleted ones. Then, before running your report, you update the entire list, which ensures all data is up to date if any additions, deletions, or edits to the data occurred. For demonstration purposes there are some queries using the subquery to show how slow it is.
 

Attachments

  • Odometer (2).zip
    3.7 MB

Noruen

Member
Local time
Today, 23:16
Joined
Jul 7, 2020
Messages
46
Thanks a lot @MajP! I see this is probably the only real solution. I was thinking a lot about @Pat Hartman's idea to do it in a report, but that is only convenient for the monthly report.

I will download your example and analyze it. Anyway thanks again for your time! :)
 
