Time to Archive?

I'm considering implementing an archiving system in my database, but before I go down that path, I wanted to run my situation by you guys and get some input. I am beginning to notice performance issues, but my two biggest tables hold only 12,751 records in the main table and 38,048 in the detail (an m-m relationship). Now I know this isn't a lot of records, but the more records this baby accumulates, the slower it gets. The tables store information that will rarely need to be accessed after 4 months.

That being said, many of my queries do require filtering on a date, which, after some minimal testing, I noticed is slower than querying on an integer primary ID. I don't know that I can go without filtering by date, so would it even be wise to archive in this case, or is there a better way to go about this?

Thanks,

Joe
 
It's hard to say without seeing your database and how it works.

However, I can tell you it's possible to have a database with four million rows and very fast performance... despite the fact that a composite key is used. Here's one place to get it.

Are you sure you have indices for the columns you commonly search on?

Archiving records that won't be visited but must simply be held onto for whatever reason will indeed speed up performance, and I have no qualms recommending it, especially if you won't need the records or if you provide a means to open the archive .mdb when needed. However, we should be sure not to miss the obvious fixes, such as indices and effective query statements, which will make more of a difference in maintaining your application.
 
I figured that you'd need to see how it works. I meant for it to be more of a general question. The db is a huge animal, and I don't think I would even be able to post it.

Indices....? I've never heard of such a thing. We may be on to something here... Where can I learn more?
 
See if this gets you started.


Basically, you would want to look for the columns where you do a lot of filtering, searching, or sorting (especially columns used in clauses such as WHERE and ORDER BY, and maybe GROUP BY and HAVING as well) and create an index over those columns. See if this helps.
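For instance, here is some air code for creating an index with a DDL query (the table and field names are hypothetical, and you can just as easily add the index from table design view):

Code:
CREATE INDEX idxOrderDate
ON tblOrders (OrderDate);

Run it once (from a query in SQL view, or via CurrentDb.Execute) and Jet maintains the index from then on.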

Another approach is to make use of an already existing index. A common mistake I've seen is to do something like this:

Code:
SELECT * 
FROM aTable
INNER JOIN aLookupTable
ON aTable.LookupID = aLookupTable.LookupID
WHERE aLookupTable.Value = "SomeValue";

Most of the time we already have the value from, say, the user's selection in a combobox, which also carries the ID from the lookup table, so the join above is unnecessary and the query can be simplified to:

Code:
SELECT *
FROM aTable
WHERE aTable.LookupID = XX;

Does that help?
 
See if this helps:

* Compact your database

When you compact your database, you can speed up queries. Compacting reorganizes the records of a table so that they reside in adjacent database pages, ordered by the table's primary key. This improves the performance of sequential scans because only the minimum number of database pages has to be read to retrieve the records that you want. After you compact your database, run each query once so that it is recompiled with the updated table statistics.


* Index a field

Index any field that is used to set criteria for the query, and index the fields on both sides of a join. Or, create a relationship between these fields. When you create a relationship with enforced referential integrity, the Jet database engine creates an index on the foreign key if one does not already exist. Otherwise, the Jet database engine uses the existing index.
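For instance (air code again, with made-up names), indexing the foreign-key side of a join would look like this; the primary-key side normally has an index already:

Code:
CREATE INDEX idxDetailMainID
ON tblDetail (MainID);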

Note: The Jet database engine automatically optimizes a query that joins an Access table on your hard disk with an ODBC server table if the Access table is small and if the joined fields are indexed. In this case, Access improves performance by requesting only the required records from the server. Make sure that tables you join from different sources are indexed on the join fields.


Also see:

Microsoft Access Performance FAQ
 
Help?!? Wow, this thread is like insight overload!

I know it was air code, but I'm assuming that you meant it can be simplified to
Code:
SELECT *
FROM aTable
WHERE aTable.LookupID = XX and aLookupTable.Value = "SomeValue";

Or am I missing something?

Banana, that really helps a lot!! I'm going to start doing some research now and see what I come up with...

Thanks!!

Joe
 
HiTech, I do compact as often as possible. I think indexing is where my problem lies...

Just a quick question too, does indexing work with ADODB recordsets as well?
 
When JET tries to optimize your query, it will look for an index that will help. You can manually specify the index to be used.

I use DAO with an Access back end. I have never tested what JET does about selecting an index when it receives an ADODB recordset request.
 
To answer your question: no, you would not keep aLookupTable in the WHERE clause at all. The table would not be structured correctly if it had both the ID and the value from a lookup table, and even if it did have both columns, that would be redundant and thus less efficient. Using the keys as criteria as much as possible is one surefire way to improve efficiency. Of course, there are times when that's not possible or feasible, but when we are already using comboboxes or listboxes, we've already done the work of loading the keys, so there's no point in having Jet do the work all over again to figure out which value belongs to which key just to pull those records.

To illustrate:

tblFruitBaskets

BasketID  FruitTypeID  Quantity
1         1            10
1         2            5
1         3            15
2         1            10

tblFruitTypes

FruitTypeID  FruitType
1            Apple
2            Orange
3            Grape


Instead of this:
Code:
SELECT *
FROM tblFruitBaskets
INNER JOIN tblFruitTypes
ON tblFruitBaskets.FruitTypeID = tblFruitTypes.FruitTypeID
WHERE tblFruitTypes.FruitType = 'Apple';

We can just do this:
Code:
SELECT *
FROM tblFruitBaskets
WHERE FruitTypeID = 1;

Both would produce identical results, but the latter is more efficient because it doesn't have to do the join at all; it just uses the value already present in the same table.

Did that help?
 
Ohhhhhh, I get what you are saying... So basically, search on the ID contained in the actual table being queried, rather than joining another table and searching on a value that ultimately references the primary ID of the joined table, which is the same ID that's in the original table anyway... I don't know if that makes sense in words, but I got what you're saying.

HiTech... you know I love me some bedtime stories :)

By the way, I did some indexing and asked the users if they could notice a difference, and I was told it's like night and day.

So once again, you've hit the nail on the head. Thanks, guys!

Joe
 
Joe, here are some simple guidelines.

Access will do its darnedest to find whatever you ask for. But you can help it by telling it up front what you believe you will ask for most often.

An index is simply a separate table that contains a sorted list of all key values and pointers to the records having those key values. The key values are the values of the field that was indexed.

Access puts a limit of 32 indexes per table. There is a reason for that. Access is a small-business DB that can help you make transitions to larger systems later. But start small; it's easier. Now, even with a limit like that, you might be tempted to index every field in a smaller table. But that's wrong. Here's why.

Sometimes you so rarely search in a given field that it isn't worth the space that the index requires, and for a large table, you can expect a large index.

Sometimes the field is too long for the index to be practical. Like, say, a 200-character text field. The key part of the index has to be 200 characters per pointer, too - so you just chewed up a lot of space.

Sometimes you have a compound index. If so, Access will use the compound index to speed up its search even when you aren't searching on the whole compound key, as long as your criteria cover its leading field(s).
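A quick sketch of what that looks like (hypothetical names again):

Code:
CREATE INDEX idxEmployeeName
ON tblEmployees (LastName, FirstName);

A search on LastName alone can still use this index, because LastName is its leading field; a search on FirstName alone cannot.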

And sometimes the cardinality is wrong. (Whoops... what's that word?) Cardinality, as I'm using it here, is a measure of how many records you expect to get back from an average query. There are fancier ways to estimate it, but the fastest guess is also the simplest: the table size divided by the number of distinct values in the field.

Suppose you have a unique identifier as a field in a table. One query = one record, cardinality = 1. If you search it often, it is a good candidate for an index. If it is truly unique, it might also be a good candidate for being the prime key - if it is short enough.

Suppose that you have a field, Gender. M or F. Cardinality is going to be half the table size, whatever it is. If the table has 38K records, cardinality would be 19K on average.

Suppose that you have a field in an employee table, the job-description code, which links as a foreign key to a separate job-description table. Let's say you have no more than about 50 job descriptions in the company. If you have 5,000 employees, cardinality would be 100. And if you query on job descriptions often, that has the potential to reduce your search area down to 2% of the whole table. A job code might fit in an Integer, surely in no more than a Long. So that's actually quite short as a key.

So the goal is to balance what an index gets you against what it costs you. Really big key? Too costly. Infrequent search? Too costly. Another index already exists that is at least helpful? Then a new, separate index is too costly. Doesn't reduce your search area that much? Might not be worth the effort. What is left? The fields that might actually benefit from having an index.

But there is one more cost to consider. Every time you add an index, you are adding work for Access to maintain that index. Add a record? Every index must be updated. Delete a record? Every index must be updated. Modify a record? ... you get the picture.

That's why database design is considered an art as well as a science.

My rule of thumb is that I will have a prime key. (PERIOD. ALWAYS. ... with a very narrow list of exceptions for temporary-lifetime tables, perhaps.) Then I will make a key and index on any other fields that are my primary search interests. But I will not make a key and index out of a field that is itself a foreign key.

The performance that you get from such a design will be astounding if you didn't have keys before.
 
If, however, your initial query (table view) includes a function that cannot be built into an index, then each time you execute that query, the function has to be evaluated for every record in the set. This is certainly one case in which

a) you might struggle to find an appropriate candidate key and therefore
b) judicious archiving IS necessary
 
Gemma, you are right, but I was limiting my comments to TABLES.

Queries use existing keys whenever possible. For the "query column is a function" case, if the field driving the function is indexed, I won't guarantee that you'll get a lot of benefit, but you might get some.
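One related trick (air code, with hypothetical table and field names): where possible, rewrite the criterion so the function applies to a constant rather than to the field. Instead of this, which has to evaluate Year() for every record and cannot use an index on OrderDate:

Code:
SELECT *
FROM tblOrders
WHERE Year(OrderDate) = 2009;

...express the same filter as a range test on the bare field, which can use the index:

Code:
SELECT *
FROM tblOrders
WHERE OrderDate >= #1/1/2009# AND OrderDate < #1/1/2010#;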

Where you have to struggle a long time to find a candidate key, perhaps it is time to build a surrogate (meaningless) key such as an autonumber and just go on about your business.

Having lots of indexes and keys also argues MORE for maintaining a good archiving schedule, since indexes grow when the table grows.

Joe, Gemma's post also reminded me of another little tidbit. If you have indexed your table with multiple indexes and you are about to do a really massive reorganization of the table, reducing the number of indexes on the table is not a bad idea.

Like, say you have a date field that is indexed - along with maybe six other non-date fields that are also going to be indexed - and you are planning to add or delete at least 10% of your table. If the add/delete operation ONLY depends on dates and nothing else, remove the other six indexes. Do the add/delete. Then reassert the indexes. It is faster to rebuild indexes from scratch than it is to update them a gazillion times when you know you are doing a bulk operation that will clearly scramble and churn your indexes all over the place.
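In Jet DDL terms, the sequence is just air code like this (hypothetical index and table names):

Code:
DROP INDEX idxOrderDate ON tblOrders;

...then run the bulk append/delete, and afterwards:

Code:
CREATE INDEX idxOrderDate ON tblOrders (OrderDate);

Each statement runs separately (Jet executes one statement per query).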

If you have a table of 100,000 records and are updating 10 or so, leave indexes in place. If you have a table of 100,000 records and you are updating 10,000 or so, reduce your indexes. The break-even point depends on how many indexes we are talking about and how many records you are going to churn.

When you are done churning and re-indexing, before you allow yourself to leave, do one last compact-and-repair operation.

It should also go without saying (but when do I ever go without saying the obvious) that before you do ANY PART of the massive table action, take a backup copy of your DB to some other disk or burn a CD-ROM with it or something where you can get it back with a simple operation. And if your compact-and-repair at the end of the action is successful, make a backup copy of THAT DB as well. (Can you tell that I tend to be cautious?)
 
