Database Bloat

Foster2019 · Oct 9, 2019

Hi people,

So I read an article posted by theDBguy on 11-03-2018 about database bloat, and I think the conclusion was that append queries and make table queries both cause similar size bloat, and that make table queries where faster.

My questions are,

If I have a table with several columns that would only ever have 20 rows that would keep the same primary key. But all the data in the table except for the primary key changed every week (some cells with empty values) what would cause less bloat.

A) Delete the whole table each time and create a new one using a template and the copy object method, keeping the primary key and having empty values in all other cells. Before using an update query to update the records

B) Delete the whole table each time and create a new one using a template and the copy object method, with no rows or data and using the open recordset add new method

C) Using an update query to null all the values except for the primary key before inserting the new data via the update method keeping the primary key

D) Using a delete query to delete all the rows but not the table then using the open recordset add new method creating a new primary key.

I’ve read that deleting tables every time could cause other problems like instability as mentioned in the article I was reading, but I don’t think I fully understand what that means?

Is deleting rows of data the same as deleting all the values and then updating the empty values when it comes to database bloat? And is deleting a primary key going to cause more bloat than any other column?
The table hasn’t got any relationships.

I hope the above makes sense :/
It’s the end of a long day

Thanks people

theDBguy · Oct 9, 2019

Hi. If I understood your question correctly, then I would say using an UPDATE query would probably cause less bloat, if any at all.

June7 · Oct 9, 2019

What is purpose of a table where every value would change except primary key?

I use 'temp' tables - tables are permanent but records are temporary - and have not experienced bloat or instability issues.

If db is split, these 'temp' tables must be located in frontend. If it's not split then would certainly be unstable for multiple users.

Deleting and creating objects is a design change. If db is deployed as an executable, that approach might not be possible - I've never had to work with accde file.

isladogs · Oct 9, 2019

The thread being referenced is probably https://www.access-programmers.co.uk/forums/showthread.php?t=302259

I agree you should use an update query for this. Just one ….no need to null the existing values first...just replace using the update. There is absolutely no benefit to using any other method in your case

In that thread, I was the one who mentioned instability through repeated use of make table queries. I've experienced 'unrecognised database format' errors in several cases which, for me, were always fixed by decompiling. You might not be so lucky.

@June
You can create, edit and delete tables, queries and macros in ACCDE files (unless fully locked down).
You may be thinking of changes to forms, reports and code, all of which are disabled

The_Doc_Man · Oct 9, 2019

Actually, the mechanism is less important than the final effect. Because of sharing issues, Access never actually re-uses deleted space until you Compact & Repair the DB. Instead, it marks the record as "deleted" (displayed as #deleted#) so that the user who comes behind you no longer tries to touch that record. Remember that user A can delete (technically, "mark for delete") a record that was opened at the time by user B. If the locking is set up correctly, this will happen almost transparently.

The issue of doing an UPDATE is that the physical record doesn't get updated. The physical record gets REPLACED by its updated equivalent. If you did a DELETE first and followed it with an INSERT INTO, Access still doesn't garbage collect because of sharing issues.

Stated another way, if you do the UPDATE, it isn't a true "update in place" because of the potential that the currently targeted record might be open in someone else's open report, so the update marks the old record and writes a new record corresponding to the one just updated.

Net result in ANY order that you do it? The old record is marked awaiting the next C&R and the new record is threaded into the table's list of known records. This activity has two effects: (1) Bloat occurs due to the growth of records marked for deletion and (2) tables opened directly or through a query lacking an ORDER BY or GROUP BY clause will appear to be written in an order other than sorted by prime key.

NOTA BENE: Opening a table in datasheet view on your screen actually opens an implied "generic" query that would provide records in PK order. When I say "open a table" I mean through any method involving a recordset, express or implied, as long as the opening method doesn't impose order on its target.

Foster2019 · Oct 9, 2019

Thanks people,

Yes Isladogs, that’s the article I mentioned, I should have posted a link. It had been a long day and I wasn’t really thinking.



I agree it’s a weird setup,

The primary key represents people and the columns are time slots, the code checks the column to see if the person is available before assigning them a task. Other code picks up later to put into a printable excel rota.

Thanks for the information about how access handles data when it is deleted; I feel that I have a better understanding now. I think I’ll stick to using an update query.

I’ll make sure I keep the temp tables in the front end, knowing that is really useful.

The_Doc_Man · Oct 10, 2019

Pat, I respectfully disagree. Not trying to step on your toes here, but I have looked into this in the past because I had to have answers for some government wonks who knew what they wanted but didn't know doodlum squat about databases.

Here is the short answer: Access CAN'T do an "update in place" in a shared environment because users might not have set dbSeeChanges in their recordset options. If I recall correctly, that isn't the default. Access has to retain the values at the very least for the currently open record. The simplest way out is to NEVER assume that the user has selected the dbSeeChanges option, so Access will never do UPDATE-in-place.

In a split shared DB, your FE makes a temp copy of what it is going to look at, but if you open your recordset and then someone else behind you opens an overlapping set, neither of you will see changes immediately. The delay is based on the FE's auto-refresh interval, one of the options. Until that auto-refresh interval trips, you will be able to see things in your set that aren't there any more in your co-worker's set. Both sets of records are valid, each for their time of snapshot. Access won't know or indicate that a change has occurred until it goes back to do the update or until that auto-refresh detects a change. And an update in place would make that nearly impossible to recognize.

Following is a more in-depth discussion of why this is so. Folks not interested in this level of nuts and bolts might not care to read further.

Consider the order of operations of SQL: https://www.periscopedata.com/blog/sql-query-order-of-operations

1. Get source tables & queries identified (the FROM clauses)
2. Identify the records to be used (the WHERE clauses)
3. Form aggregates (the GROUP BY clauses)
4. Identify aggregates to be used (the HAVING clauses)
5. Identify final data (the SELECT clause for SELECTs; the field lists for UPDATE, INSERT queries)
6. Sort returned records (the ORDER BY clause)
7. Apply limits (such as TOP N clauses)

If you look at this, once SQL has formed the list of working records, what happens after that moment is on a parallel course. The records selected by the FROM/WHERE combination are now on their own, at least briefly, when it comes to grouping, forming aggregates, filtering aggregates, etc.

As to whether records are changed in order of appearance frequently? I have observed this. Remember, opening a table in Datasheet View implies a hidden query that happens to apply the PK in an ORDER BY statement. You have to check for record scrambling in something that doesn't come through that implied ORDER BY. Further, if the table is large enough and updates are sparse enough, you won't see many changes anyway.

FWIW, I had to use some single-table queries for my biggest project, and diddling around, I took out the ORDER BY clause from that query. Since that wasn't the table's "implied query" datasheet view, ordering wasn't automatic and I could see the real order. The tables get pretty jumbled pretty quickly. In fact, it seemed based on a couple of experiments I ran that the table slowly gravitated to chronological order of update, most recent changes last. A C&R didn't scramble them back into PK order, either.

OK, it's not an earth-shaking thing, and if your queries have an ORDER BY clause you never see it anyway.

One last reason why Access cannot do UPDATE-in-place: It would among other things violate the ANSI standard (1992) for cursors for "INSENSITIVE" cursors (i.e. you didn't specify dbSeeChanges).

This quote comes from (Second Informal Review Draft) ISO/IEC 9075:1992, Database Language SQL- July 30, 1992 and I have underscored one key phrase.

If a cursor is open, and the current SQL-transaction makes a change
to SQL-data other than through that cursor, and the <declare cur-
sor> for that cursor specified INSENSITIVE, then the effect of
that change will not be visible through that cursor before it is
closed. Otherwise, whether the effect of such a change will be
visible through that cursor before it is closed is implementation-
dependent.

In essence, that can't happen if you do UPDATE-in-place. "Cursor" in this context of course is the same as a Recordset object.

More that you wanted to know? You WERE warned earlier.

The_Doc_Man · Oct 10, 2019

Both ACE and SQL claim to be more or less compliant with ANSI 92 SQL standards. My comments were about any SQL that claims such compliance.

As to whether C&R re-orders the tables to PK order, I have to admit I did not think it did that, though that is a minor issue to be honest. And as you point out, the longer it has been since you did a C&R, the more likely it is that record order has been scrambled by updating, thus rendering the question moot.

jcoogan · Mar 9, 2023

Here's another possibility that took me forever to discover. I am using Microsoft 365 subscription, ver 2302. Check menu: "File, Options, Current Database". Look for "Track name Autocorrect info" and uncheck it.

I >think< the idea behind this option is that Access will auto correct other instances of objects behind the scenes when you change an object name. Example: You have two queries that are linked and you change the name of one, then Access will automagically update the name change in the separate join query, preventing it from breaking. Pretty cool, but also possibly pretty expensive in terms of database bloat because Access has to write all these object references somewhere!

In my case I have a database with code only, data tables are in a separate linked database. The code database was bloating from 3000k to the 2gig limit in 20 min! My code database is doing a lot of database object and recordset object creation and deletion. When I unchecked the option named "Track name Autocorrect info" it resolved the issue and now the code database only grows by about 500k.

theDBguy · Mar 9, 2023

jcoogan said:
Here's another possibility that took me forever to discover. I am using Microsoft 365 subscription, ver 2302. Check menu: "File, Options, Current Database". Look for "Track name Autocorrect info" and uncheck it.

I >think< the idea behind this option is that Access will auto correct other instances of objects behind the scenes when you change an object name. Example: You have two queries that are linked and you change the name of one, then Access will automagically update the name change in the separate join query, preventing it from breaking. Pretty cool, but also possibly pretty expensive in terms of database bloat because Access has to write all these object references somewhere!

In my case I have a database with code only, data tables are in a separate linked database. The code database was bloating from 3000k to the 2gig limit in 20 min! My code database is doing a lot of database object and recordset object creation and deletion. When I unchecked the option named "Track name Autocorrect info" it resolved the issue and now the code database only grows by about 500k.

Hi. Welcome to AWF!

Thanks for sharing. Yours might be a special case.

isladogs · Mar 9, 2023

@jcoogan
I see you've also posted exactly the same observations in threads at several forums
I also think there may be something unusual about your setup.
I have used the Name AutoCorrect feature for years and never had the bloating issue you describe.

ebs17 · Mar 9, 2023

The Name AutoCorrect feature is a performance brake and should be switched off.

Who needs something like that when you have a plan for your objects and their names and won't change them afterwards?

isladogs · Mar 9, 2023

Even with the best planning, there are situations when renaming objects is appropriate.
There are many occasions when Name Autocorrect is very useful.
For example when taking over a badly designed database to improve its design.

As long as the feature is used with care, it works well.
When it was first introduced around 2000, the feature was problematic.
Although that was fixed in the next couple of version, many people turn it off when not actually currently needed.

As for a performance brake, that should rarely, if ever, be an issue as it won't be used when other processes such as complex queries are being run

ebs17 · Mar 9, 2023

Even with the best planning, there are situations when renaming objects is appropriate.

I would distinguish between being in the process of developing and maintaining an application, and being in the process of doing real work.

If I'm constantly renaming while working and have to implement that via the application, I'd better rethink my confused way of working.

isladogs · Mar 9, 2023

So would I if that was what I did. Who, apart from yourself, said anything about constantly renaming?

@jcoogan did refer to "a lot of database object and recordset object creation and deletion." which is in itself a bad idea and will definitely cause significant bloating

ebs17 · Mar 9, 2023

constantly renaming?

Why let a service run and consume resources if there is no need for it?

isladogs · Mar 9, 2023

As previously stated

... many people turn it off when not actually currently needed.

ebs17 · Mar 9, 2023

Nice description.

Wizards that tinker around should be fully understood and used only within their intended capabilities. What the developer of this assistant has thought and what a user thinks it should do, there can be huge differences.

isladogs · Mar 9, 2023

@Pat Hartman
Yes, its a great description but with respect I don't believe its completely correct for several reasons:

1. When Name AutoCorrect was first created in A2000(?) it often caused corruption and many developers, not least yourself, often called it Name AutoCorrupt for good reason. However, those issues were fixed in the next release

2. It is certainly true that closed objects do not appear in the Name AutoCorrect Log after a rename is done. They do appear when the object is next opened. However, I would say that is masking what is actually happening as can be tested very easily

Turn on all 3 Name AutoCorrect options
Select a table with many dependant queries. Open the Object Dependencies feature
Open one or more of the dependant objects and then rename the table.

Open the Name Autocorrect Log table and confirm that only the opened objects are listed
Open another dependant object. The log table is updated with an additional record.
So far, we are largely in agreement....

Now rename the table again and open one or more of the unopened queries.
Each is successfully updated to the latest iteration of the table name

Rinse & repeat multiple times. It works perfectly.
In my tests, the issues you describe no longer occur (I accept they did so back in A2000)

Rename the table(s) again. Close the database. Compact. Reopen & recheck the queries. All should still be working fine

To help with testing, I am attaching a test database used with one of my web articles on query joins
It has 3 test tables & lots of different queries. I've enabled all 3 NAC options

If any of the above summary is incorrect, do let me know

isladogs · Mar 9, 2023

Nor would I turn it off part way through as it would obviously cause problems.
Similarly, I wouldn't try to drive my car when the engine had been partly rebuilt but not finished

When NAC was first released, I had the same opinion as you and switched it off.
Several years later, I went back to it & have used it with ZERO issues for about 15 years.
As a result, although I didn't say so earlier, I leave it on all the time and no longer bother logging the changes.
Yes, I'm careful with its use but no longer feel I need to be that careful.
Of course, because I do plan my databases carefully (in theory anyway!), its only actually used occasionally within any particular app

However, I regularly reuse objects in new databases & that's when I will often rename tables/queries etc

In my experience over the past 15 years, its totally reliable and doesn't cause any performance issues

But please do try it again e.g. with my test DB but this time leave NAC turned on

Database Bloat

New member

I’m here to help

AWF VIP

Access MVP / VIP

Immoderate Moderator

New member

Immoderate Moderator

Immoderate Moderator

New member

I’m here to help

Access MVP / VIP

Well-known member

Access MVP / VIP

Well-known member

Access MVP / VIP

Well-known member

Access MVP / VIP

Well-known member

Access MVP / VIP

Attachments

Access MVP / VIP

Similar threads

Users who are viewing this thread