Most Efficient Way to Store Historical Data

milkman2500

Hi,

I'm trying to brainstorm the best way to maintain historical employee information. I'm thinking of 2 different ways, but not sure how Access will handle them.

1) A table that maintains the start and stop date of the relationship (i.e. employee has a job title from a start date to an end date).

This is the ideal, but I'm concerned about the number of records. The database will store 3,000 employees, and I'd estimate around 2,000 changes a month to the employee data (transfers, hires, promotions, terminations, and all cascading changes to dependent information). There's a rough sketch of what I mean below, after option 2.

2) A different database for each month/year (e.g. Employees_March2013, Employees_April2013).

I don't have concerns about the number of records, but I'm not sure how the front-end will work with multiple back-end databases. Is there an easy way to set up a form that lets you pick which "effective date" of employee information you want and have it link to the correct back-end before running a query/report?
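
For option 1, this is roughly the kind of history table I'm picturing (the field names are just illustrative):

tblEmployeeJobHistory
EmployeeJobHistoryID (PK)
EmployeeID (FK)
JobTitleID (FK)
StartDate
EndDate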

Thanks for your input.
 
With a table that grows 2000 rows per month, you have a couple of years before you need to consider upsizing. Especially if the history table isn't used except to log changes.

Always go with the normalized schema. Working with a different database/table for each month/year will be a nightmare.

At some point the historical data becomes less important and rarely used. You might want to add an archive process that runs monthly and deletes data from the history table that is older than (pick a number) 3 years and moves it to a separate database. This gives you reasonably sized history so you can use it for reporting and gives you access to archived data for reference.
 
out of interest, there should be no cascading changes

cascading ... indicates normalisation problems, generally
 
I cannot conceive of a properly designed system that needs cascading updates. The only time these might be necessary is where, say, a text value is held as a descriptor throughout a system, and a change to it has to be propagated throughout the system.

This can be avoided by using a lookup table and storing the ID rather than the text. Now when the text changes, there is no cascade update. Storing a numeric ID is also far more efficient than storing text.

Cascading deletes can occur when a "base" record is deleted. Again, I prefer to avoid these by just setting a "deleted" flag.
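
As a rough illustration (the table and field names are only examples), the lookup approach looks like this:

tblJobTitles
JobTitleID (PK, autonumber)
JobTitleText
Deleted (Yes/No)

tblEmployees
EmployeeID (PK)
JobTitleID (FK to tblJobTitles)

When the title text changes, only the row in tblJobTitles is edited, so nothing cascades into tblEmployees - and "deleting" a title just means setting its Deleted flag.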
 
Hi,
Does the above indicate that using a surrogate (autonumber) ID as the primary key is preferable?
Thanks
 
either autonumber, or just a number.

it is a matter of not using a text value, if that value may change.

maybe for a US state it doesn't make much difference whether you use, say, the two-letter state abbreviation or a numeric state ID - but in most cases a numeric value is the better choice, in my opinion.
 
Cascade update is required when you use "natural" keys that can potentially change. Sometimes these keys are constructed, such as a CustomerID that takes a few characters from the CustomerName to make an abbreviation, and sometimes they come from other applications, such as SSN (Social Security Number). Even though by law SSN is not supposed to be used as a PK, many older systems do use it. They don't technically violate normalization rules, although conventional wisdom these days favors artificial keys such as autonumbers.
 
The "Thanks" button is taking me to the Reply screen????

Thanks Gemma & Pat.
 
You might want to add an archive process that runs monthly and deletes data from the history table that is older than (pick a number) 3 years and moves it to a separate database. This gives you reasonably sized history so you can use it for reporting and gives you access to archived data for reference.

I'm curious about this. Is this how you would set it up? This way you maintain all of the relationships to any other tables in the database, but keep them out of the "main" table.

tblEmployees
EmployeeID (PK)
EmployeeName
EmployeeStartDate
EmployeeEndDate
EmployeeArchiveID (FK)

tblEmployeeArchive
EmployeeArchiveID (PK)
EmployeeNameArchive
EmployeeStartDateArchive
EmployeeEndDateArchive
 
I would leave the structures the same. That will allow you to copy queries and just change the table name but not have to change all the column names.

When you archive "master" data, you copy the tables exactly, including the primary key and the archive table retains the primary key value. This has the advantage of allowing you to put the data back in the main tables should that be required.

Technically, there is no need to maintain RI on the archive tables since they are never updated via the GUI. Only new records are appended as they are deleted from the master tables.

When you do the archive, you should enclose the append and delete queries inside a transaction so that the database engine can ensure that both actions happen or they both get rolled back.
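
As a rough sketch of what those two queries might look like for a 3-year cut-off (the table and field names here are only placeholders, not from your database):
Code:
Dim strAppend As String, strDelete As String

' copy history rows older than 3 years into the archive table
strAppend = "INSERT INTO tblEmployeeHistoryArchive " & _
            "SELECT * FROM tblEmployeeHistory " & _
            "WHERE ChangeDate < DateAdd('yyyy', -3, Date())"

' then remove the same rows from the active history table
strDelete = "DELETE FROM tblEmployeeHistory " & _
            "WHERE ChangeDate < DateAdd('yyyy', -3, Date())"

Those are the two statements you would wrap in the transaction.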
 
I would leave the structures the same. That will allow you to copy queries and just change the table name but not have to change all the column names.

When you archive "master" data, you copy the tables exactly, including the primary key and the archive table retains the primary key value. This has the advantage of allowing you to put the data back in the main tables should that be required.
Got it - make an identical copy of the main table for use as the archive table.

Technically, there is no need to maintain RI on the archive tables since they are never updated via the GUI. Only new records are appended as they are deleted from the master tables.

When you do the archive, you should enclose the append and delete queries inside a transaction so that the database engine can ensure that both actions happen or they both get rolled back.

I would use RI on the archive table so we can report on equipment that was removed within certain time frames.

I don't know what you mean by enclosing queries inside a transaction, though.
 
what Pat means is that something sensitive like archiving a stack of data needs to be completed, and verified as completed.

a "transaction" is the encapsulation of the process

so archiving involves at a basic level

- adding data from one table into another (a query)
- deleting data from the base table (a query)
- maybe some other tidying up routines

if step 1 works, but not step 2, then the entire transaction fails, and you want to be able to restore the data to the original position. That is called transaction roll-back
 
if step 1 works, but not step 2, then the entire transaction fails, and you want to be able to restore the data to the original position. That is called transaction roll-back

I see; this makes sense. Is this something that is best handled with code? If so, are there examples of that floating around?
 
The code is trivial. You start the transaction and you end it. In between, you run your queries.
Code:
DBEngine(0).BeginTrans                        ' default workspace
DBEngine(0)(0).Execute SQL_1, dbFailOnError
DBEngine(0)(0).Execute SQL_2, dbFailOnError
'.....
DBEngine(0).CommitTrans
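
If you want the rollback to happen automatically when either query fails, a fuller sketch might look something like this (the sub name and variable names are just made up for the example; it assumes the DAO reference that Access sets by default):
Code:
Public Sub ArchiveHistory(strAppend As String, strDelete As String)
    Dim ws As DAO.Workspace
    Dim db As DAO.Database
    Dim blnInTrans As Boolean

    On Error GoTo ErrHandler
    Set ws = DBEngine(0)              ' default workspace
    Set db = ws.Databases(0)          ' current database

    ws.BeginTrans
    blnInTrans = True
    db.Execute strAppend, dbFailOnError   ' append to the archive table
    db.Execute strDelete, dbFailOnError   ' delete from the history table
    ws.CommitTrans
    blnInTrans = False
    Exit Sub

ErrHandler:
    If blnInTrans Then ws.Rollback        ' undo both steps if either failed
    MsgBox "Archive failed: " & Err.Description
End Sub

Call it with the append and delete SQL strings, and either both run or neither does.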
 
The code is trivial. You start the transaction and you end it. In between, you run your queries.
Code:
DBEngine(0).BeginTrans                        ' default workspace
DBEngine(0)(0).Execute SQL_1, dbFailOnError
DBEngine(0)(0).Execute SQL_2, dbFailOnError
'.....
DBEngine(0).CommitTrans

I'll look more into this, thanks. Pretty new at doing this.
 
