How do we historicize some rows in data tables? (3 Viewers)

amorosik

Active member
Local time
Today, 20:16
Joined
Apr 18, 2020
Messages
785
Suppose we have our beautiful program that operates on three tables: Documents, Customers, and Phone Calls
After several years, the Documents table will contain many records that will likely no longer be useful and will unnecessarily burden the archive
How do we historicize all documents dated before 2025 in the DOCUMENTS table and then physically move them to another database, while still being able to retrieve the historicized information?
 
An append query, followed by a delete query, both with appropriate parameters, would do the job. You can retain access to the archived data by linking it to the production database.

I am not sure, though, what you mean by "historicize" them. How is that different from moving them to the archive database?
 
Why do you need to move them to another database? Data doesn't take too much space on disk. You can set a default value for date in your search form, so when users search for something, it shows the data from 2025 and later as default. One bonus is that if for some rare reasons they need to refer to old data, they simply can change the default value and search.
 
My solution has been to save the historic data in the tables to XML, which is easily retrievable, when required.
 
As KitaYama has pointed out, there is no need to move the rows to another table. The value of an Historicized column of Boolean (Yes/No) data type could be set to True. In its normal day to day operations the database would then use those rows WHERE NOT Historicized.

The attached little demo file illustrates a similar approach to mark records as deleted. One or more records marked as deleted can be restored at any time should this be necessary. You'll see that the demo includes an option to delete multiple rows where the value in a column of date/time data type is prior to a date entered into an unbound text box in a dialogue form. This is analogous to marking rows as historicized in your case.

The demo does include an option to move a single deleted row to a backup table, as you are proposing. You could easily amend this to move multiple rows. One or more rows can be restored from the backup table at any time if necessary. Note that to avoid primary key conflicts if the database is compacted and repaired, and a backed up record is then restored, the ContactID primary key of the Contacts table in the demo is not an autonumber, but is computed when a row is inserted.
 

Attachments

My colleagues have offered a few thoughts on the subject. I'll take a different direction just to cover some bases. Let's assume for the sake of discussion that you really DO need to remove - but somehow preserve - old data.

The problem, of course, is the intertwining relationships that can exist sometimes between two or more tables. You therefore have to create a "deconvolution" plan that starts by identifying tables that depend on other tables (and why).

Simple case: An invoice table with a child table of invoice line-item details, one/many parent/child relationship, with relational integrity enabled. You can't archive the invoices first, because removing the invoice entries would violate relational integrity for the line-item table. So you would think that have to archive the CHILD table (invoice details) first. EXCEPT that you then need to consider what format you are using to store the details. Your plan needs to take the master DB and the archive destination into account.

If you are planning to keep a separate Access BE file as the particular archiving destination AND you wanted to maintain relational integrity in that file, too, then you would have to (a) COPY the invoice table to the archive, (b) copy the inventory details to the archive, (c) delete the archive details from the live DB, (d) delete the invoice (parent) records from the live DB - all the while using a restrictive WHERE clause that would only copy or delete records eligible to be archived... which means that the invoice table (the original parent) has to have a field to drive the archive step in the first place. If there is a date field and you are archiving by record's age, no biggie. Now, if there are multiple inter-table relationships that confuse this issue, you have to do the same type of analysis but with extra tables to complicate your plans. The general rule is "archive the top of the hierarchy first, remove the top of the hierarch last."

But there can be complications. In my big security-tracker DB, a complication was that each "required action" entry applied to every server on site, over 1500 total. So I had to know the state of each server for each required action. There was more to it than that, but this part illustrates an "oddball" case. I was allowed to archive "required action" entries after 6 months - but only six months after ALL of the servers had been appropriately updated. If even ONE server was still pending for action, the whole action entry was OPEN. Which meant that the date of issuance of the action notice wasn't important - it was the date of closure in the junction table that tracked individual compliance. The tables ALL had a Y/N field for "MaintSelect" (selected for maintenance) so that I could run a couple of queries to select (or de-select) eligible records, then run the appropriate export and removal queries. I.e. I built indicator fields into the tables so that when it was time to do maintenance, I could mark the records. Once the marking was done, the copying and removal were almost trivial.

IF you are actually NOT planning to use a BE file, you could do what I did for a situation where the action history would become irrelevant AND there was no need to record every single detail, I built a query to combine data and exported the query records to a spreadsheet. The removal step didn't change, but the archival step did, because I could export everything at once to the spreadsheet.

The archival step will always depend on what you choose as the storage mechanism. The removal step hardly ever seems to change, in principle. The only question in removal is whether you can use an already existing field or you need a "selected for action" field as the mechanical method of record selection.
 
An append query, followed by a delete query, both with appropriate parameters, would do the job. You can retain access to the archived data by linking it to the production database.

I am not sure, though, what you mean by "historicize" them. How is that different from moving them to the archive database?
Of course, I wasn't talking about moving data from the main database to the historical_db_1
What I wanted to know was whether there's an effective way to reduce the size of the main database while still maintaining the ability to view the historical information.
 
Why do you need to move them to another database? Data doesn't take too much space on disk. You can set a default value for date in your search form, so when users search for something, it shows the data from 2025 and later as default. One bonus is that if for some rare reasons they need to refer to old data, they simply can change the default value and search.

To reduce the size of the main database and improve its performance
"..Data doesn't take up too much space on disk.." this has no bearing on the request.
 
that will likely no longer be useful and will unnecessarily burden the archive
About 20 years ago I had a client with a similar question. The issue was one of identifying records because using something like ‘everything before a certain date’ could remove records which although old, were still being referenced.

The solution was to include a ‘last referenced date’ field which was updated every time the record was referenced - any record last referenced x number of years ago could be archived
 
As KitaYama has pointed out, there is no need to move the rows to another table. The value of an Historicized column of Boolean (Yes/No) data type could be set to True. In its normal day to day operations the database would then use those rows WHERE NOT Historicized.

The attached little demo file illustrates a similar approach to mark records as deleted. One or more records marked as deleted can be restored at any time should this be necessary. You'll see that the demo includes an option to delete multiple rows where the value in a column of date/time data type is prior to a date entered into an unbound text box in a dialogue form. This is analogous to marking rows as historicized in your case.

The demo does include an option to move a single deleted row to a backup table, as you are proposing. You could easily amend this to move multiple rows. One or more rows can be restored from the backup table at any time if necessary. Note that to avoid primary key conflicts if the database is compacted and repaired, and a backed up record is then restored, the ContactID primary key of the Contacts table in the demo is not an autonumber, but is computed when a row is inserted.

The question isn't whether it needs to be done or not
The question is "how" to archive data currently present in the main database and still be able to view the historical data
I've already considered marking some records as "deleted," but that's not what I asked.
 
My colleagues have offered a few thoughts on the subject. I'll take a different direction just to cover some bases. Let's assume for the sake of discussion that you really DO need to remove - but somehow preserve - old data.

The problem, of course, is the intertwining relationships that can exist sometimes between two or more tables. You therefore have to create a "deconvolution" plan that starts by identifying tables that depend on other tables (and why).

Simple case: An invoice table with a child table of invoice line-item details, one/many parent/child relationship, with relational integrity enabled. You can't archive the invoices first, because removing the invoice entries would violate relational integrity for the line-item table. So you would think that have to archive the CHILD table (invoice details) first. EXCEPT that you then need to consider what format you are using to store the details. Your plan needs to take the master DB and the archive destination into account.

If you are planning to keep a separate Access BE file as the particular archiving destination AND you wanted to maintain relational integrity in that file, too, then you would have to (a) COPY the invoice table to the archive, (b) copy the inventory details to the archive, (c) delete the archive details from the live DB, (d) delete the invoice (parent) records from the live DB - all the while using a restrictive WHERE clause that would only copy or delete records eligible to be archived... which means that the invoice table (the original parent) has to have a field to drive the archive step in the first place. If there is a date field and you are archiving by record's age, no biggie. Now, if there are multiple inter-table relationships that confuse this issue, you have to do the same type of analysis but with extra tables to complicate your plans. The general rule is "archive the top of the hierarchy first, remove the top of the hierarch last."

But there can be complications. In my big security-tracker DB, a complication was that each "required action" entry applied to every server on site, over 1500 total. So I had to know the state of each server for each required action. There was more to it than that, but this part illustrates an "oddball" case. I was allowed to archive "required action" entries after 6 months - but only six months after ALL of the servers had been appropriately updated. If even ONE server was still pending for action, the whole action entry was OPEN. Which meant that the date of issuance of the action notice wasn't important - it was the date of closure in the junction table that tracked individual compliance. The tables ALL had a Y/N field for "MaintSelect" (selected for maintenance) so that I could run a couple of queries to select (or de-select) eligible records, then run the appropriate export and removal queries. I.e. I built indicator fields into the tables so that when it was time to do maintenance, I could mark the records. Once the marking was done, the copying and removal were almost trivial.

IF you are actually NOT planning to use a BE file, you could do what I did for a situation where the action history would become irrelevant AND there was no need to record every single detail, I built a query to combine data and exported the query records to a spreadsheet. The removal step didn't change, but the archival step did, because I could export everything at once to the spreadsheet.

The archival step will always depend on what you choose as the storage mechanism. The removal step hardly ever seems to change, in principle. The only question in removal is whether you can use an already existing field or you need a "selected for action" field as the mechanical method of record selection.

Thanks for the reply
But in my case, the problem doesn't affect the relationships, but the data physically present in the archive.
 
Of course, I wasn't talking about moving data from the main database to the historical_db_1
What I wanted to know was whether there's an effective way to reduce the size of the main database while still maintaining the ability to view the historical information.
In your original post: "How do we historicize all documents dated before 2025 in the DOCUMENTS table and then physically move them to another database,"

One or the other is true. Either you do want to "physically move them" or you don't.
 
Yes, of course
I was clarifying that I sent an imprecise request, and that it wasn't actually the physical transfer of records from A to B that interested me, but rather the logic of moving, deleting, and then possibly recovering a series of records for viewing.
 
rather the logic of moving, deleting, and then possibly recovering a series of records for viewing.

This is the sticking point. You don't want the records, but you DO want the records.

To reduce the size of the main database and improve its performance

To reduce the size of a DB, you have the COMPACT & REPAIR action which has its limits. You can assure that everything is properly normalized, because that potentially reduces data size. But if you want recoverable data AND want to reduce the size of a compacted DB AND don't want to move it to an archive AND it already IS normalized, you have defined an over-constrained problem.

Simple principle: Data in a database takes up space. To reduce space, reduce the data. But you have ruled out the most commonly used ways of doing that.
 
This is the sticking point. You don't want the records, but you DO want the records.



To reduce the size of a DB, you have the COMPACT & REPAIR action which has its limits. You can assure that everything is properly normalized, because that potentially reduces data size. But if you want recoverable data AND want to reduce the size of a compacted DB AND don't want to move it to an archive AND it already IS normalized, you have defined an over-constrained problem.

Simple principle: Data in a database takes up space. To reduce space, reduce the data. But you have ruled out the most commonly used ways of doing that.

What do you mean by "I excluded the most commonly used methods"?
Records will need to be deleted from the main database; there's no way around it
The problem is figuring out the most effective method for storing records deemed too old in the secondary database before deleting them, and then retrieving and viewing them if necessary
 
The most effective method is to decide what information you will need to keep for viewing purposes and then removing everything else. Since we don't know the structure of your records or the specifics of your viewing needs, we can't answer that particular "how to" question.
 
My approach is to create an archive table that mirrors the structure of your original table, but change the primary key from an AutoNumber to a regular Number (Long Integer). That way you can copy the original ID values over without worrying about AutoNumber conflicts.

Also, make sure the database is split, and put the archive tables in a different back-end file. That keeps the archive data from cluttering up the main back end or helping it grow larger than it needs to. Your front end can still link to those archive tables, so the historical data remains available whenever you need to view it.

Then periodically run an archive routine. This is usually just an Append query that copies the old records, for example anything before a certain date, into the archive table, followed by a Delete query that removes those same records from the live table. Many people run this monthly, quarterly, or once a year depending on how quickly their tables grow.

If you want to see examples of how to set this up, just search YouTube for "MS Access Archive Records." There are plenty of walkthroughs showing the append-and-delete approach step by step.

LLAP
RR
 
Moving rows to a separate file should only be a method of last resort. If the size of the back end is pushing the limits of Access, it's time to move to SQL server or another client server database as the back end.
 
the logic of moving, deleting, and then possibly recovering a series of records for viewing.
The problem is figuring out the most effective method for storing records deemed too old in the secondary database before deleting them
see post #9 for an example logic to identify records that could be moved or deleted

But if deleted, there is no way to recover them
will likely no longer be useful and will unnecessarily burden the archive
If you have moved them to another archive table, use a union query to combine back with the current archive.

Must admit, your question is lacking in detail. You ask a vague question and then dismiss all the suggestions made. Previously you talk about using sql server or other rdbms's Which one you are using will have an impact on what the most effective method might be. Your example refers to documents - are these actual documents? (word/pdf/something else?)? if so, you could simply store them in a windows folder somewhere
 
CJ's question is valid and I didn't originally take that interpretation. Are you storing the actual document in the DB file or merely linking to it as an external file?
 

Users who are viewing this thread

Back
Top Bottom