Question about best practice on overwriting records vs deleting records

ebs17

Well-known member
Local time
Today, 13:27
Joined
Feb 7, 2020
Messages
1,946
Both an UPDATE and a combo of DELETE/INSERT have essentially the same effect in terms of bloat.
Are you talking about individual field content or records?

Is there verifiable evidence of bloat in updates?
I have only heard this once in a long period of practice, in a case where short strings were replaced by long strings on a large scale. This is theoretically understandable because the memory used for texts is allocated dynamically.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 07:27
Joined
Feb 19, 2002
Messages
43,275
Is there verifiable evidence of bloat in updates?
Only when new records are larger than old records and there is not enough free space in the physical record to write it back where it was. But there is ALWAYS bloat with the delete/append approach.

For this particular application, we are not talking about excessive bloat such as we see when developers use make-table queries instead of select queries as record sources for reports and forms.
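To make the comparison concrete, here is a minimal DAO sketch of the two patterns being discussed; the table and field names (tblCables, CableID, Status) are assumptions for illustration, not anyone's actual schema.

Code:
Public Sub OverwriteVsDeleteAppend()
    ' Sketch only: tblCables, CableID and Status are hypothetical names.
    Dim db As DAO.Database
    Set db = CurrentDb

    ' In-place UPDATE: the existing physical record is reused when the new
    ' value fits in the space already allocated.
    db.Execute "UPDATE tblCables SET Status = 'Decommissioned' " & _
               "WHERE CableID = 9001;", dbFailOnError

    ' Delete/append: the deleted record's space is dead until the next
    ' Compact & Repair, so this pattern always adds some bloat.
    db.Execute "DELETE FROM tblCables WHERE CableID = 9001;", dbFailOnError
    db.Execute "INSERT INTO tblCables (CableID, Status) " & _
               "VALUES (9001, 'Decommissioned');", dbFailOnError
End Sub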
 

June7

AWF VIP
Local time
Today, 03:27
Joined
Mar 9, 2014
Messages
5,472
On reflection, I guess technically I am not "reusing", since the user never completed the data entry and no documentation was generated. However, the way it's programmed, the ID (not an autonumber) is generated as soon as the user initiates data entry, so if they abort, I have an ID that needs to be accounted for. In this case I chose to reserve the ID for the next data entry instead of VOIDING it. VOIDING can still occur later if paperwork is lost or the request is withdrawn.
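A rough sketch of that "reserve the ID for the next data entry" idea, assuming a numeric ID and hypothetical names (tblRequests, RequestID, IsAbandoned):

Code:
Public Sub GetNextRequestID()
    ' Sketch only: tblRequests, RequestID and IsAbandoned are assumed names,
    ' and the ID is assumed to be numeric.
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim nextID As Long

    Set db = CurrentDb
    ' First look for an ID that was generated but never completed.
    Set rs = db.OpenRecordset( _
        "SELECT TOP 1 RequestID FROM tblRequests " & _
        "WHERE IsAbandoned = True ORDER BY RequestID;", dbOpenSnapshot)

    If Not rs.EOF Then
        nextID = rs!RequestID                                 ' reuse the reserved ID
    Else
        nextID = Nz(DMax("RequestID", "tblRequests"), 0) + 1  ' otherwise mint a new one
    End If
    rs.Close

    Debug.Print "Next ID to use: " & nextID
End Sub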
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 06:27
Joined
Feb 28, 2001
Messages
27,186
You are assuming the delete/insert are adjacent.

Actually, no I'm not. "Slower" could easily be "days slower" as opposed to "seconds slower." But the net result is the same. And LOGICALLY you can replace a record, but PHYSICALLY with the Access method of garbage collection, you cannot. That record space is useless until the next Compact & Repair shuffles things down to the lowest possible addresses in the new copy of the database.
Is there verifiable evidence of bloat in updates?

It is a matter of degree, but yes. In the databases I've used, both updates AND delete/insert combos lead to bloat. The questions are (a) how big are the records and (b) what is the "churn" rate of record updating/replacement?
 

ebs17

Well-known member
Local time
Today, 13:27
Joined
Feb 7, 2020
Messages
1,946
That's too vague for me. When I ask about UPDATE, I mean UPDATE. So it's about records that already exist and only have individual field contents changed.
A Long field always occupies 4 bytes, regardless of whether it has been filled or not, and 577907 occupies the same 4 bytes as 2 does.
How does a record get longer?

The only variable memory usage is for Text and Memo. I avoid Memo as much as possible because it tends to store novels rather than atomic information.
It is similar with the Text data type: if I design for atomic information, then 3-character texts will not be replaced by 150-character texts to any noticeable extent.

So I repeat the question: where is the significant bloat in an UPDATE?

By the way, I only run mass appends and deletes against temporary tables in a temporary backend, never against the tables of the backend and frontend of the actual working environment.
Compacting the backend is less about freeing up temporarily occupied storage space and more about maintaining indexes, updating table statistics, and so on.
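For reference, a sketch of that kind of backend maintenance, assuming the backend is a closed file nobody has open; the paths are placeholders.

Code:
Public Sub CompactBackend()
    ' Sketch only: compacting a closed backend from the frontend, which is
    ' the operation that rebuilds indexes and refreshes table statistics.
    Const BACKEND As String = "C:\Data\Backend.accdb"           ' placeholder path
    Const TEMPFILE As String = "C:\Data\Backend_compact.accdb"  ' placeholder path

    ' The backend must not be open by any user while this runs.
    DBEngine.CompactDatabase BACKEND, TEMPFILE

    ' Swap the compacted copy into place.
    Kill BACKEND
    Name TEMPFILE As BACKEND
End Sub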
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 12:27
Joined
Sep 12, 2006
Messages
15,656
I would hate to delete old data. I would archive old records to free up space. Maybe put them in another table so they are still there if you wanted to see the details. You may end up with multiple versions of some numbers, but that's a different issue.
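A minimal sketch of that archive-then-delete approach; the table and field names (tblCables, tblCablesArchive, DecommissionDate) are assumptions.

Code:
Public Sub ArchiveOldCables()
    ' Sketch only: tblCables, tblCablesArchive and DecommissionDate are
    ' assumed names. If the live table's PK is an autonumber, the matching
    ' column in the archive table should be a plain Long so IDs are preserved.
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Copy the old records across (archive table has the same structure)...
    db.Execute "INSERT INTO tblCablesArchive " & _
               "SELECT * FROM tblCables WHERE DecommissionDate IS NOT NULL;", dbFailOnError

    ' ...then delete them from the live table.
    db.Execute "DELETE FROM tblCables WHERE DecommissionDate IS NOT NULL;", dbFailOnError
End Sub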
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 07:27
Joined
Feb 19, 2002
Messages
43,275
@ebs17 I'm not sure who you are talking to.
Only when new records are larger than old records and there is not enough free space in the physical record to write it back where it was. But there is ALWAYS bloat with the delete/append approach.
This was my response. The record is overwritten in place if the replacement value is equal in length to or shorter than the original. If the replacement is longer, it will stay in the same physical record if there is room. Otherwise, the old record is replaced with a pointer and a new physical record is created. Doc disagrees with this description and says that even updates cause displacement, which equals bloat.
 

Isaac

Lifelong Learner
Local time
Today, 04:27
Joined
Mar 14, 2017
Messages
8,777
I have a database tracking various audio, video and network connections. Each cable is identified with an alphanumeric code that consists of 3 characters and a four-digit number, the source and destination of the cable, and the equipment that is connected by the cable.

Currently, our practice is to delete the record for a given cable number from the database. So far that hasn't been a problem, but I recently finished a project where my cable numbers were in the 9000 range. Due to limitations caused by other ways that we document how equipment is wired and connected, the numeric portion of the cable number can't exceed 9999. To avoid having cable numbers that exceed 9999, I was thinking it would be best to overwrite records with new data instead of completely deleting the old records from the database, and then give the user an option to view a report that lists all of the decommissioned cable numbers present in the database so that they could effectively re-use a cable number.

I know it's a basic question but I am just curious if overwriting records could eventually cause the record or database to become corrupt?
I almost always err on the side of deactivating records, marking them as deleted rather than actually deleting them.

Reason: it's really, really hard to troubleshoot things that don't exist.
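A minimal sketch of that soft-delete approach, assuming a Yes/No flag field (IsActive) on a hypothetical tblCables:

Code:
Public Sub DeactivateCable(ByVal lngCableID As Long)
    ' Sketch only: tblCables, IsActive and CableID are assumed names.
    ' "Deleting" a cable just flips the flag; the row stays put.
    Dim db As DAO.Database
    Set db = CurrentDb

    db.Execute "UPDATE tblCables SET IsActive = False " & _
               "WHERE CableID = " & lngCableID & ";", dbFailOnError
End Sub

Forms, reports and combo box row sources would then filter on IsActive = True, so the "deleted" rows stay available for troubleshooting.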
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 06:27
Joined
Feb 28, 2001
Messages
27,186
A long field always occupies 4 bytes, regardless of whether it has been filled or not, and 577907 also occupies the same 4 bytes as long as 2.
How does a record get longer?

It DOESN'T get longer. But you miss the point. When a transaction is underway and doesn't get committed (specifically including DB.Execute sql, dbFailOnError), the WHOLE TRANSACTION rolls back. But you and I both know that the instructions are executed one at a time. Since the records ALL get processed before you regain control after an UPDATE query, it LOOKS monolithic to us - but it is not. (It really can't be!)

Where this becomes an issue is that you make an updated copy of record A, but since there is no COMMIT just yet, the new record A* has to just sit there until the COMMIT arrives - and therefore, so does the OLD record, because it hasn't been obviated just yet either. Now go on to update record B and do the same thing. Within the scope of that COMMIT operation, you have updated records A*, B*, C*, ... and original records A, B, C. The old versions of the records are just waiting for the new records to become current. This means that the virtual memory boundary was at least temporarily "stretched" by the combined lengths of A*, B*, and C*. In the meantime, A, B, and C are still in their correct places.

So...two cases apply.

First, imagine that the transaction fails and has to be rolled back. OK, just delete A*, B*, C*, etc. from where they were waiting, which would be in the next free space after the currently occupied table space in the DB's virtual memory. It might be possible to re-shrink virtual memory boundaries, though I tend to doubt it. Windows DOES have a way to adjust virtual memory size of your process - an API call, ... but it is an expensive operation because I think it requires you to be swapped out to virtual memory in order to adjust some of the virtual memory registers that map your process to physical memory.

Second, let's say the transaction succeeds and the new records have to become current. So if you want Access to be efficient, are you going to COPY the set of A*, B*, C* over the places occupied by A, B, C..., potentially having to copy some number of bytes into the middle of a buffer area, taking into account that if you updated a string field to be longer, you CAN'T do an overlay? Or do you just change the linked list that identifies/locates the table's records so that the pointers now point to A*, B*, C* and then go back to mark A, B, and C as deleted? Which one will take longer? Which one moves less total data? Access works on pointers a LOT. An open query has a bunch of pointers to identify the records being selected, updated, or deleted.

Bloat occurs because those old records A, B, and C are now deleted data, which doesn't get reclaimed until the next C&R. Access CAN'T reclaim that space on-the-fly because it requires you to step through the deleted data records to find one of the right size. HINT: This scan causes virtual memory thrashing, which totally screws your memory performance to the floor. Or into the basement.

So it isn't a size change in a record that causes bloat. It is the fact that you have rollback rules, including the dbFailOnError case.
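For illustration, a sketch of an explicit DAO transaction of the kind described above; the table and field names are placeholders.

Code:
Public Sub UpdateInTransaction()
    ' Sketch only: until CommitTrans runs, the new copies of the records and
    ' the originals coexist, which is the temporary growth described above.
    ' tblCables, tblCableLog, CableID, Status and Action are assumed names.
    Dim ws As DAO.Workspace
    Dim db As DAO.Database

    Set ws = DBEngine.Workspaces(0)
    Set db = CurrentDb

    On Error GoTo RollItBack
    ws.BeginTrans
    db.Execute "UPDATE tblCables SET Status = 'Decommissioned' " & _
               "WHERE CableID = 9001;", dbFailOnError
    db.Execute "INSERT INTO tblCableLog (CableID, Action) " & _
               "VALUES (9001, 'Decommissioned');", dbFailOnError
    ws.CommitTrans        ' both statements become current together
    Exit Sub

RollItBack:
    ws.Rollback           ' neither change survives; the old records stay current
    MsgBox "Update failed and was rolled back: " & Err.Description
End Sub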
 

Isaac

Lifelong Learner
Local time
Today, 04:27
Joined
Mar 14, 2017
Messages
8,777
Long story short, I've never experienced bloat from updates. Period. Not even perceptible, anyway
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 06:27
Joined
Feb 28, 2001
Messages
27,186
Long story short, I've never experienced bloat from updates. Period. Not even perceptible, anyway

For very small numbers of updated records, given that Windows rounds databases to the nearest Megabyte, a large database with limited churn of records won't show a lot of bloat right away.
 

Isaac

Lifelong Learner
Local time
Today, 04:27
Joined
Mar 14, 2017
Messages
8,777
I just created a 2007 Access database with Table1 containing one million records with one column of Text datatype all containing the value "dog".
Database was 26 MB.

I then updated all million records to "cat" ... db still 26 MB.
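For anyone who wants to reproduce the test, a rough sketch; the table and field names, and the recordset loop used for the load, are assumptions about how it was done.

Code:
Public Sub DogCatTest()
    ' Sketch only: Table1, Field1 and ID are assumed names.
    ' Loading a million rows this way takes a while.
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim i As Long

    Set db = CurrentDb
    db.Execute "CREATE TABLE Table1 " & _
               "(ID AUTOINCREMENT CONSTRAINT PK_Table1 PRIMARY KEY, " & _
               "Field1 TEXT(10));", dbFailOnError

    Set rs = db.OpenRecordset("Table1", dbOpenTable)
    For i = 1 To 1000000
        rs.AddNew
        rs!Field1 = "dog"
        rs.Update
    Next i
    rs.Close

    ' Size on disk may lag slightly until Access flushes its cache.
    Debug.Print "After load:   " & FileLen(db.Name) \ (1024& * 1024) & " MB"

    ' Same-length replacement: no growth expected.
    db.Execute "UPDATE Table1 SET Field1 = 'cat';", dbFailOnError
    Debug.Print "After update: " & FileLen(db.Name) \ (1024& * 1024) & " MB"
End Sub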
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 07:27
Joined
Feb 19, 2002
Messages
43,275
1. Try running the update again but change the value from "cat" to "Penguin". Do you see a size change?
2. Try running the update again but change "Penguin" to "Elephant", and only do it for the rows with autonumbers 20-500. Open the table without sorting. Did rows 20-500 disappear from where they were? Are they at the end of the table?
3. Try deleting all the rows and then re-appending them. Do you see a size change?
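Those three tests expressed as queries, carrying over the same assumed Table1/Field1/ID names from the test database above:

Code:
Public Sub PatsThreeTests()
    ' Sketch only: Table1, Field1 and ID are the assumed names from the test.
    Dim db As DAO.Database
    Set db = CurrentDb

    ' 1. Replace with a longer value: does the file grow?
    db.Execute "UPDATE Table1 SET Field1 = 'Penguin';", dbFailOnError

    ' 2. Longer again, but only for a slice of rows: do those rows move to
    '    the end of the table in an unsorted datasheet view?
    db.Execute "UPDATE Table1 SET Field1 = 'Elephant' " & _
               "WHERE ID BETWEEN 20 AND 500;", dbFailOnError

    ' 3. Delete everything and re-append it: the classic bloat case.
    db.Execute "SELECT Field1 INTO Table1_Backup FROM Table1;", dbFailOnError
    db.Execute "DELETE FROM Table1;", dbFailOnError
    db.Execute "INSERT INTO Table1 (Field1) " & _
               "SELECT Field1 FROM Table1_Backup;", dbFailOnError
    db.Execute "DROP TABLE Table1_Backup;", dbFailOnError

    Debug.Print "Size now: " & FileLen(db.Name) \ (1024& * 1024) & " MB"
End Sub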
 

Isaac

Lifelong Learner
Local time
Today, 04:27
Joined
Mar 14, 2017
Messages
8,777
Well, I know I will see a size change if I lengthen the text, and also with delete and re-append, and I know the sequential nature of the identity keys will be lost - all of that I knew. I was just pointing out that a simple update, all else being equal, does not cause a size change.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 07:27
Joined
Feb 19, 2002
Messages
43,275
The questions were for me, not you ;) since you already had a database with a million rows. I just wanted to clear up some points once and for all, especially #2. Did the changed records move to the end of the file? You won't see this effect if you have ever sorted the table. Access seems to remember that setting and sorts every time you open the table in datasheet view, so the records would stay in PK sequence.
 

Benginner2212

Member
Local time
Today, 05:27
Joined
Apr 6, 2023
Messages
52
Try reading the directions one more time. Let me repeat it for you.


June likes the reuse idea. I disagree. I think it is worse than deleting, but that is just an opinion, especially since I believe that both are wrong. The real issue is do you have paperwork that references the generated ids? You have not actually addressed this question.
I am still not following you on how to create the unique index based on the two fields of the form. This is the process that I followed based on your directions.

I opened the Indexes dialog box, and the PrimaryKey index is already listed.

I create a new index on the first blank line and call it fullCableNumber, and the field name that I select is my field called cableCategory_PK. The data type of cableCategory_PK is a number because I have a parent table that contains all of the cable categories that we use. I then add the second field to the index, which is the cable number field. Those are the two fields that are used to generate the cable number HDV2022. When I try to save that index, I get an error message that Access is unable to save because it would result in duplicate values in the indexes, primary keys, or relationships.


And yes, we have paperwork that references the generated IDs.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 12:27
Joined
Sep 12, 2006
Messages
15,656
What Access is saying is that your data has at least one pair of rows where those key values are duplicated, so it can't save the new index as unique. If it is supposed to be unique, you need to find the duplicates and resolve the issue.
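A quick way to list the offending pairs before trying to save the unique index again; tblCables and cableNumber are assumed names, while cableCategory_PK comes from the post above.

Code:
Public Sub ListDuplicatePairs()
    ' Sketch only: tblCables and cableNumber are assumed names.
    Dim db As DAO.Database
    Dim rs As DAO.Recordset

    Set db = CurrentDb
    Set rs = db.OpenRecordset( _
        "SELECT cableCategory_PK, cableNumber, Count(*) AS Dupes " & _
        "FROM tblCables " & _
        "GROUP BY cableCategory_PK, cableNumber " & _
        "HAVING Count(*) > 1;", dbOpenSnapshot)

    Do While Not rs.EOF
        Debug.Print rs!cableCategory_PK, rs!cableNumber, rs!Dupes
        rs.MoveNext
    Loop
    rs.Close
End Sub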
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 07:27
Joined
Feb 19, 2002
Messages
43,275
When I try to save that index, I get an error message that access is unable to save because it would result in duplicate values in the indexes, primary keys or relationships.
You need to clean up the existing data first. OR add a third field to the PK. NEVER include the autonumber PK in a multi-field index.

Since you have paperwork, I would NOT delete records and I would NOT overwrite records. You have to try to implement my suggestion where you keep the two underlying fields as the compound unique index and the concatenated field as non-unique. When you need to archive a record, you set the archive date and set the two underlying data fields to null
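A sketch of that arrangement in DAO and Access SQL; tblCables, cableNumber, ArchiveDate and CableID are assumed names, and cableCategory_PK comes from the earlier post.

Code:
Public Sub SetUpCableIndexAndArchive()
    ' Sketch only: tblCables, cableNumber, ArchiveDate and CableID are
    ' assumed names.
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Compound unique index on the two underlying fields - NOT on the
    ' autonumber PK. WITH IGNORE NULL keeps archived (nulled) rows out of it.
    db.Execute "CREATE UNIQUE INDEX fullCableNumber " & _
               "ON tblCables (cableCategory_PK, cableNumber) " & _
               "WITH IGNORE NULL;", dbFailOnError

    ' Archiving a cable: stamp the date and null the two source fields so the
    ' combination becomes free for reuse; the concatenated (non-unique) field
    ' keeps the historical cable number for the paperwork.
    db.Execute "UPDATE tblCables " & _
               "SET ArchiveDate = Date(), cableCategory_PK = Null, cableNumber = Null " & _
               "WHERE CableID = 9001;", dbFailOnError
End Sub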
 

Benginner2212

Member
Local time
Today, 05:27
Joined
Apr 6, 2023
Messages
52
You need to clean up the existing data first. OR add a third field to the PK. NEVER include the autonumber PK in a multi-field index.

Since you have paperwork, I would NOT delete records and I would NOT overwrite records. You have to try to implement my suggestion where you keep the two underlying fields as the compound unique index and the concatenated field as non-unique. When you need to archive a record, you set the archive date and set the two underlying data fields to null
When you say to add a third field to the PK, do you mean add a third field to the index that I am trying to create, or add a third field to the parent table for cableCategory_PK?

I can't think of a way to clean up the data to remove the duplicate entries error, because there are multiple cables in each category, so there will always be duplicates if I try to create an index based on cableCategory_PK.
 
