Question about best practice on overwriting records vs deleting records

Benginner2212
I have a database tracking various audio, video, and network connections. Each cable is identified by an alphanumeric code consisting of three characters and a four-digit number, along with the source and destination of the cable and the equipment that the cable connects.

Currently, our practice is to delete the record for a given cable number from the database, which so far hasn't been a problem. However, I recently finished a project where my cable numbers were in the 9000 range. Due to limitations in other ways that we document how equipment is wired, the numeric portion of the cable number can't exceed 9999. To avoid cable numbers that exceed 9999, I was thinking it would be best to overwrite records with new data instead of completely deleting the old records from the database, and then give the user an option to view a report that lists all of the decommissioned cable numbers present in the database so that they could effectively re-use a cable number.

I know it's a basic question but I am just curious if overwriting records could eventually cause the record or database to become corrupt?
 
If you don't care about history, then I see no reason not to reuse. Instead of deleting, set every field except the identifier to Null. The next time the user needs to enter a new record, present the 'blank' record for completion if one exists. I have done this.
 
I have error checking on all of the fields of the record to try and ensure that good data has been entered by the user before a record is saved. I am concerned that if I were to set the other fields to Null, I would get Null errors when I tried to pull up the record for a decommissioned cable number. Could I just overwrite the fields with other data like "N/C" or "DECOM"?
 
Sure. So then pull up a record that has "N/C" in a particular field, if any. If none exist, present the user with a completely new record row.
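In VBA/DAO terms, the lookup might look like this sketch, run from a bound form. It assumes a table tblCables with a Status field that gets "DECOM" when a cable is retired; all names here are illustrative, not your actual schema:

    Dim rs As DAO.Recordset
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT TOP 1 ID FROM tblCables " & _
        "WHERE Status = 'DECOM' ORDER BY CableSeq;", dbOpenSnapshot)

    If rs.EOF Then
        DoCmd.GoToRecord , , acNewRec            ' nothing to reuse: start a new row
    Else
        Me.Recordset.FindFirst "ID = " & rs!ID   ' jump the form to the reusable row
    End If
    rs.Close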
 
Two points to consider. First, your question about database problems if you "reuse" a slot? It depends, but it's usually not an issue UNLESS you somehow were using that letter/digit combination as a single-field primary key. (Primary keys should not be modified, as a matter of theory.)

Second, from a mechanical standpoint, there is no practical difference between doing an UPDATE vs. doing a DELETE/APPEND combination. Both will have the same effect internally to the database.
 
True enough, Pat. Like I said, an UPDATE cycle and a DELETE/APPEND cycle have about the same effect. However, the OP's database doesn't seem very large and the "churn" rate doesn't sound that high, so this might not be AS bad as some we've seen.

The more critical question is, regarding the old data, do they NEED to remember what WAS, or do they only care about what IS? From the early posts, it doesn't seem so.

@Benginner2212 - the question is, will you need to know anything about prior uses of your special ID codes? Is history of any value in what you do? Because, given the question you asked, re-using a slot has one negative effect of importance... it complicates your ability to look at history. But this is not a mechanical issue - it is a theoretical/practical issue.

If you have a LOT of updating then you will now and then run into something called "bloat" - and it can be a problem if you let it run away with your DB. When you update records (because you are reclaiming them), Access effectively does the equivalent of a simultaneous DELETE (of the old record) and an INSERT (of the new record). It has to do it that way because of something called a "rollback" - a way to undo a failed transaction. So what happens is that the space occupied by the deleted record becomes temporarily lost until you do a Compact & Repair (C&R). For small records, this bloat takes a long time to build up to an important amount. However, the faster and the more often you update, the sooner you will find that your DB has doubled in size and really needs the C&R.
 
@Pat Hartman
@The_Doc_Man

I'm a broadcast engineer for my state's local PBS station, and the purpose of the database is to track which cables connect various pieces of equipment together and the function of each cable that is connected to a piece of equipment.

All of the cables are given a seven-character alphanumeric code: a three-character prefix that represents a category assigned to the cable based on its function, and a four-digit number that gives the cable a unique identifier within that category. So if a cable is going to be used for sending HD video from point A to point B, that cable falls into our category of HD video cables and gets the three-character prefix of HDV. Typically, when we run a new cable we just add 1 to the four-digit number of the last cable that was run in a given category. So because the last HD cable that was run has the cable number HDV2204, the new cable that is run will be entered into the database as HDV2205.

The record in the database will indicate that HDV2205 connects the video output of Satellite Receiver A, which is in Rack 24, to Port 221 of our video router.

The reason that we can't exceed a four-digit number in any given cable category is that we also use AutoCAD to create drawings of our various systems. So we have a drawing that shows all the equipment that is connected to the inputs of our video router. My boss is very particular about how those AutoCAD drawings are done, and because all of our current drawings use the seven-character alphanumeric code, he doesn't want to get away from it so that the drawings remain uniform. I know that is a silly reason to be unwilling to change, but it is a hill that my boss will die on, so I've given up on trying to get him to change.

We consider any cable that is still in the racks as an active cable, so if we take out a piece of equipment but leave the cable in the rack, we'll note in the database that one end of the cable is not connected. And we consider any cable that is completely removed from the racks as a decommissioned cable and delete the cable from the database.

Going back to my example of the Satellite Receiver: PBS decides that it is going to stop sending us content via Satellite Receiver A. Since we no longer need Satellite Receiver A, we'll remove the Satellite Receiver from Rack 24 and remove the cable from both the video output of the Satellite Receiver and Port 221 of the video router. Then we'll pull the cable HDV2205 out of the racks completely, coil up the cable, cut off the labels, throw it in our bin of cables to be used at a later date, and delete HDV2205 from the database.

We do keep a history in a roundabout way. When the main navigation page of the database is closed, a function is called that creates a backup of the database for that day and saves the backup file to a shared network folder. My IT person has that shared network folder set up to automatically delete any file that is older than six months, so we effectively have a six-month history of changes to the database. That is more than enough for us because we aren't constantly swapping out equipment or adding and removing cables. Beyond that, I don't know if keeping a history would benefit us, since we remove cables when we remove equipment. If we swap out equipment or move equipment, we just update the current record in the database with the changes.
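For what it's worth, the backup function boils down to a dated file copy along these lines; this is a sketch rather than our exact code, the paths are made up, and copying a file that someone still has open can fail or produce an inconsistent copy:

    Public Sub BackupOnClose()
        Dim fso As Object, dst As String
        Set fso = CreateObject("Scripting.FileSystemObject")
        ' Placeholder destination: one dated copy per day on the shared folder.
        dst = "\\Server\DBBackups\CableDB_" & Format(Date, "yyyy-mm-dd") & ".accdb"
        fso.CopyFile CurrentProject.FullName, dst, True   ' True = overwrite a same-day copy
    End Sub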

Pat, your suggestion that I either delete or archive old data and then create a procedure to find gaps in the cable numbers is ironic, because I was doing something similar before and was cautioned against it by another Access user: if something didn't get deleted properly, I could potentially create a duplicate record and corrupt the database. But I am thinking I might not have been doing it in the way you're suggesting. I am not using the cable numbers as a primary key, but rather letting Access assign the primary key to the records. The way I was doing it was: first I'd delete the old data from the database, then a query would run that identified gaps in the cable numbers, and then a new record would be created when a cable number was reused.

To me, it sounds like you are suggesting that I write the new data to the old record, which I would think wouldn't be possible if the record is deleted.

I am confused about the_doc_man's comments about bloat and rollback. Does that simultaneous delete and insert only happen when I change information on a record in the database or does it also happen when I delete one record and then create a new record?

And if I set up something to run a compact and repair of the database, would I run that every time the database is closed or would it be a periodic thing?
 
"HDV" and "9999"
Can't you open a new number range with a new alphanumeric code ("H2V"?)?
 
"HDV" and "9999"
Can't you open a new number range with a new alphanumeric code ("H2V"?)?
While that would be the easiest solution, my boss is against anything that would affect the uniformity of the drawings or the physical layout of the printed copy of the database, and he would veto the change. I am lucky that I was able to convince him that switching to an Access database was a good idea.
 
No, I would delete the old record. The numeric suffix of 2205 is added at the time you put the pieces together and end up with HDV2205. There should be a unique index on the two fields "HDV" and "2205". To do this, you need to use the Indexes dialog; you cannot create a two-column index using the table design grid alone. In the Indexes dialog, find the first totally blank row and add a name for the index. Add the first column and mark the index as unique. Then move to the next row, leave the index name blank, and add the second column. You can create indexes of up to 10 fields using this method.
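If you'd rather do it in code than in the dialog, the equivalent DDL works too; the table and field names below are placeholders for whatever yours are:

    CurrentDb.Execute _
        "CREATE UNIQUE INDEX uxCategorySeq " & _
        "ON tblCables (CategoryCode, CableSeq);", dbFailOnError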

So you just increment the numeric sequence number as normal. Then check to see if it is > 9999. If it is, then you have to run your gap query to find the first gap. Use that number instead of the generated number as the new sequence.
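A sketch of such a gap query from VBA/DAO, using the same placeholder names: it returns the smallest in-use number within the category whose successor is free.

    Dim rs As DAO.Recordset
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT TOP 1 t1.CableSeq + 1 AS NextFree " & _
        "FROM tblCables AS t1 " & _
        "WHERE t1.CategoryCode = 'HDV' " & _
        "AND NOT EXISTS (SELECT 1 FROM tblCables AS t2 " & _
        "WHERE t2.CategoryCode = t1.CategoryCode " & _
        "AND t2.CableSeq = t1.CableSeq + 1) " & _
        "ORDER BY t1.CableSeq;", dbOpenSnapshot)
    If Not rs.EOF Then Debug.Print "First free number: " & rs!NextFree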

Normally, databases grow gradually. If you are frequently deleting records or updating them and potentially making them longer, then you run into the bloat issue, because Access does not reuse the space from the deleted record or from the original, smaller record. That means that instead of a C&R monthly, you might need to do it weekly. Just watch how your db grows to see what frequency you need.
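If you ever want to script the C&R instead of relying on the Compact on Close option, Access does expose Application.CompactRepair, which writes a compacted copy to a new file. A sketch, where the backend path is a placeholder and the backend must not be in use while this runs:

    Public Sub CompactBackend()
        Const BE_PATH As String = "\\Server\Share\CableDB_be.accdb"   ' placeholder path
        Dim tmpPath As String
        tmpPath = BE_PATH & ".compacting"
        If Application.CompactRepair(BE_PATH, tmpPath) Then
            Kill BE_PATH              ' discard the bloated original
            Name tmpPath As BE_PATH   ' swap in the compacted copy
        End If
    End Sub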

What I don't like about the reuse is that if you keep printed records of ANYTHING, they become suspect because you might have reused the number. I don't like deleting history, so I have another thought. Keep the concatenated sequence number in a separate field with a non-unique index. Then, when you archive the record rather than delete it, remove the contents of the two working fields. You should add an ArchivedDT field so that the missing values make sense. This lets you keep all the old data and it doesn't interfere with the new data.
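The archive step itself is then a single UPDATE, sketched here with the same placeholder names: keep the concatenated code, blank the two working fields, and stamp the date.

    CurrentDb.Execute _
        "UPDATE tblCables " & _
        "SET CategoryCode = Null, CableSeq = Null, ArchivedDT = Now() " & _
        "WHERE FullCableNo = 'HDV2205' AND ArchivedDT Is Null;", dbFailOnError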

Using the second method, if you look for HDV2205 using the concatenated field, you might get two records whenever the field had been reused. So, you need to carefully consider which situations call for using the two working fields rather than the concatenated field for searching.
I think that I will have to go with the second solution that you've suggested. Right now, the field for the "HDV" is a combo box drawn from a table that contains all of the categories of cables, which is separate from the table that contains the field for "2205". I don't think that I could create a unique index for the field that contains "2205", since it would contain duplicate entries.

I think I could add a field that contains the concatenated HDV2205 from the two working fields and either archive the records or search/print the records based on the concatenated field.

What happens when I archive a record? Does all the data for a record get archived, or just what I tell Access to archive? For example, if the record for HDV2205 has a primary key of 321 and all of the other fields are populated, do the source rack, destination rack, and descriptions get archived, or would it just be "HDV" and "2205"? If I reuse the number HDV2205, would the primary key for that record be 321 or would it have a new primary key?
 
Depends what you mean by "archive". I think in this case the suggestion is to just set the value of a field, ArchivedDT, to the date the record is "archived". Then the record can be excluded from queries that filter out the "archived" records. This record is not deleted, and PK 321 is still in the table. If you reuse HDV2205, you would be creating a new record with its own new PK. This retains history, which you did not seem to care about.
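As a sketch, carrying over the placeholder names from the earlier posts, the "active cables" query would simply exclude the stamped rows:

    Dim rs As DAO.Recordset
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT FullCableNo, SourceRack, DestRack " & _
        "FROM tblCables WHERE ArchivedDT Is Null;", dbOpenSnapshot)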

An alternative was described in post 2 which everyone else here seems to reject.
 
Reusing a record worked for my db. If a user aborted data entry (a rare event) after the record identifier was created (not an autonumber PK), I saved that record for the next data entry, usually the same day, so chronology remained valid even when there were multiple users. But chronology was not critical. Also, I allowed for VOIDING records and not reusing the identifier in cases where a document was lost or a request was later withdrawn. I could not have gaps in this identifier assignment.

Another situation in the same db was that very old project numbers were getting reassigned (controlled by another office), but they would have a completely different project name. So with the combination of project number and name, reports could be distinguished.
 
Read my suggestion again. That is NOT what I told you to do. I told you to create a multi-field unique index and I even gave you directions on how to do it since it is not obvious from the interface.
I am sorry that I misunderstood your suggestion. I tried to create a unique index based on the two working fields that I am using for the cable category and the cable number, but I get an error message that states I can't do it because it would create duplicate records. However, I was able to create a non-unique index with the two working fields. I am going to try to follow the second suggestion that you made and create a field that concatenates the two working fields, giving that field a non-unique index.
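For the concatenation itself, I am picturing something like this sketch (placeholder names again; it assumes the category field stores the three-letter text itself, and Format pads the number back out to four digits):

    CurrentDb.Execute _
        "UPDATE tblCables " & _
        "SET FullCableNo = CategoryCode & Format(CableSeq, '0000') " & _
        "WHERE FullCableNo Is Null;", dbFailOnError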
 
I am confused about the_doc_man's comments about bloat and rollback. Does that simultaneous delete and insert only happen when I change information on a record in the database or does it also happen when I delete one record and then create a new record?

Both an UPDATE and a combo of DELETE/INSERT have essentially the same effect in terms of bloat. DELETE/INSERT is just slower.

When you do an UPDATE, there is a rule relating to what is called a "rollback" (database slang for "undo"). For a brief moment, both the original record and the new record co-exist in different places in the physical layout of the file on the disk. If the updater task decides to abort the update action for some reason, the "old" record has to return to the table unchanged and the "new" record with all changes is discarded.

We have reason to believe that tables are actually a linked list of records, not an end-to-end array of records. So when the UPDATE actually executes for a given record, the old record is unlinked from the table's list and the new record is added to the table's list. But in a shared table, it is possible that someone else is currently running a query having that old record in place. So the old record can't be removed (yet). When that hypothetical SELECT query is done, the unlink can take effect. But because of that delay, the new (updated) record cannot take its place so it gets tacked onto the end of the table's linked list.

If you do this as an UPDATE, that unlink/link pair of operations is, from the program viewpoint, simultaneous - even though we know the computer MUST do things one instruction at a time. When you do the DELETE/INSERT operation as two discrete queries, the same steps are performed, but the insertion (2nd step) doesn't have to happen right away. However, the rule about not unthreading a record when someone else is running a query against that recordset still applies. So again, the INSERT tacks the new record at the end of the table's list.

Since we don't actually have Open Source code visibility for Access, I have to qualify the above as "one man's educated guess" as to what really happens behind the scenes. But over the years we have seen tidbits here and there to suggest that the above isn't terribly far from reality.
 
Both an UPDATE and a combo of DELETE/INSERT have essentially the same effect in terms of bloat.
Are you talking about individual field content or records?

Is there verifiable evidence of bloat in updates?
I have only heard this once in a long period of practice, in a case where short strings were replaced by long strings on a large scale. This is theoretically understandable because the memory used for texts is allocated dynamically.
 
On reflection, I guess technically I am not "reusing", since the user never completed the data entry and no documentation was generated. However, the way it's programmed, the ID (not autonumber) is generated as soon as the user initiates data entry, and then if they abort I have an ID that needs to be accounted for. So in this case I chose to reserve the ID for the next data entry instead of VOIDING it. VOIDING can occur later if paperwork is lost or a request is withdrawn.
 
You are assuming the delete/insert are adjacent.

Actually, no I'm not. "Slower" could easily be "days slower" as opposed to "seconds slower." But the net result is the same. And LOGICALLY you can replace a record, but PHYSICALLY with the Access method of garbage collection, you cannot. That record space is useless until the next Compact & Repair shuffles things down to the lowest possible addresses in the new copy of the database.
Is there verifiable evidence of bloat in updates?

It is a matter of degree but yes. In the databases I've used, updates AND insert/delete combos both lead to bloat. The questions are (a) how big are the records and (b) what is the "churn" rate (of record updating/replacement)?
 
That's too blurry for me. When I ask about UPDATE, I mean UPDATE. So it's about records that already exist and are only changed in individual field contents.
A Long field always occupies 4 bytes, regardless of whether it has been filled or not, and a value of 577907 occupies the same 4 bytes as a value of 2.
How does a record get longer?

The only variable memory usage is for text and memo. I avoid memo as much as possible because it doesn't store atomic information, but rather novels.
It is similar with the text data type: If I am oriented towards atomic information, then texts with 3 characters will not be replaced by texts with 150 characters to any noticeable extent.

So I repeat the question: Where is there significant bloat in the UPDATE?

By the way: I only experience append and delete sprees in temporary tables of a temporary backend, not in the tables of the backend and frontend of the actual work environment.
Compacting the backend is less about freeing up temporarily occupied storage space and more about maintaining indexes, updating table statistics, etc.
 
I would hate to delete old data. I would archive old records to free up space. Maybe put them in another table so they are still there if you wanted to see the details. You may end up with multiple versions of some numbers, but that's a different issue.
 
