Food for thought on indices

Moniker

VBA Pro
This is not a question, just a short tale about what's happened for me over the past few days.

We have a ton of data where I work (several terabytes), and I work with a lot of it. For a certain report, I link into several Access DBs of varying sizes (500 KB to around 500 MB) and pull the data I want into a few different tables. Once the queries to build the data for this report were written, they worked, but the entire process took around five and a half hours. We have other processes that take this long, so I didn't think too much of it.

One of the processes was doing an update on the main detail table, and that turned out to be the culprit. Without that update process, the whole thing took maybe 10 minutes, so something must be wrong with the update, right? Well, it's updating one field (out of a table of six fields), and the remaining five fields are inner joined to the source, so that shouldn't be it. It turned out to be the indices on the main detail table that were causing the massive overhead.

The way I got around it was to use some quick data definition queries to remove the indices (DROP INDEX), perform the appending, add the indices back (CREATE INDEX), and then do the updating. The whole thing went from 5 1/2 hours to 11 minutes.
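For anyone who hasn't written data definition queries before, here is roughly what the two statements look like when run from VBA with DAO. This is a sketch only; the table, index, and field names are made up for illustration, so substitute your own.

Code:
' Sketch only -- table, index, and field names below are placeholders.
' Drop the index before the big append, then rebuild it afterwards.
CurrentDb.Execute "DROP INDEX idxDetailKey ON tblMainDetail", dbFailOnError
' ... run the append queries here ...
CurrentDb.Execute "CREATE INDEX idxDetailKey " & _
                  "ON tblMainDetail (Region, Product, BatchDate)", dbFailOnError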

If you have a poorly performing query, I'd recommend this method. Generically outlined, it looks like this (a rough VBA sketch follows the list):

1) Compact and repair.
2) Use SQL to delete all the content from the main detail table.
3) Use SQL to remove all indices from the main detail table and whatever other tables you will eventually link to.
4) Use SQL to do whatever appends you need to whatever tables need the data (in my case, the main detail table and a subordinate).
5) Use SQL to add your indices back in.
6) Perform any updates needed (in my case update the main detail table with data from the subordinate table).
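
A rough VBA/DAO sketch of steps 2 through 6 might look like the following. Every name in it (tables, indexes, fields, saved queries) is a placeholder rather than my actual objects, and step 1 (compact and repair) happens up front, outside the routine:

Code:
' Sketch only -- all object names are placeholders for your own tables,
' indexes, and saved queries. Step 1 (compact & repair) is done beforehand.
Public Sub RebuildDetailTable()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' 2) Clear the main detail table.
    db.Execute "DELETE FROM tblMainDetail", dbFailOnError

    ' 3) Remove the indices so the append doesn't maintain them row by row.
    db.Execute "DROP INDEX idxDetailKey ON tblMainDetail", dbFailOnError

    ' 4) Run the append queries (saved queries or raw INSERT ... SELECT).
    db.Execute "qappMainDetail", dbFailOnError
    db.Execute "qappSubDetail", dbFailOnError

    ' 5) Add the indices back against the now-populated table.
    db.Execute "CREATE INDEX idxDetailKey " & _
               "ON tblMainDetail (Region, Product, BatchDate)", dbFailOnError

    ' 6) Update the main detail table from the subordinate table.
    db.Execute "qupdMainDetail", dbFailOnError
End Sub

The point of the ordering is that Jet builds each index once against the finished data instead of maintaining it row by row during the append.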

I didn't realize how much overhead the indices were adding (the table is over two million records), and doing a simple append with all those indices in place was causing a timing nightmare. So long as your subordinate and main tables share the same indices (a five-field index in my case), the update flies.

I'm just sharing because this is one of those things that took me (with help from my boss) two days to figure out, and it's a huge time-saver. Apparently creating the indices on existing data is 100x faster than creating them as data is appended. I didn't know that.
 

The_Doc_Man

Immoderate Moderator
Staff member
There are many schools of thought regarding indexes. The way I have heard the rule is, if you don't need the index, don't have one.

Indexes exist to support an action, be it a specific type of JOIN, a search query, a report, whatever. Look at how often you use that associated action. If it is once a month and the table is extremely large (100 MB+), build the index when you need it and drop it again when done.

Because...

When you update a table, you update it once for itself and ONCE FOR EVERY INDEX!!!

Now add to the nightmare... make the index based on a compound key.

Now REALLY ramp up the work. Make the index's cardinality (expected number of records per index value) greater than 5% of the table size.

OK, the icing on the cake. Make the key non-numeric and greater than 4 bytes long.
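
If you want to see what you are paying for, a quick DAO loop (a sketch written from memory, not production code) will list every index in the database so you can judge which ones actually support an action you use:

Code:
' Sketch: list every index in the current database so superfluous ones
' can be identified and dropped.
Public Sub ListIndexes()
    Dim tdf As DAO.TableDef, idx As DAO.Index, fld As DAO.Field
    Dim s As String

    For Each tdf In CurrentDb.TableDefs
        If Left$(tdf.Name, 4) <> "MSys" Then        ' skip system tables
            For Each idx In tdf.Indexes
                s = tdf.Name & "." & idx.Name & " ("
                For Each fld In idx.Fields
                    s = s & fld.Name & " "
                Next fld
                Debug.Print s & ")" & IIf(idx.Primary, " [PK]", "") & _
                            IIf(idx.Unique, " [unique]", "")
            Next idx
        End If
    Next tdf
End Sub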
 

Moniker

VBA Pro
The indices are in place with good reason (and yes, there are plenty of compound indices). The reporting based on the data happens tons faster with indices in place.

The updating, so long as it's not against an indexed field, adds no overhead (for obvious reasons). It's the appending that's the killer. The entire thing seems to get reindexed for every append. (I can't prove that, but the time difference spells it out.)

Additionally, the indices will add a lot of overhead as far as size is concerned. We had a few indices we decided we didn't really need (they were superfluous) and removing them took 50MB off the final result.

So, I'm agreeing that you need to use indices judiciously. Yes, they can add a lot of speed to certain operations (or slow others, like appends, to a crawl), but they also add a lot of size to the DB in general. Note that debugging the DB (using the "/debug" switch on the command line) cleans up a ton of this. It's an extra step, but if you're very concerned about space, it's definitely worth it. It's nearly an even trade: with the indices plus that cleanup, the file ends up about the size it would be with no indices at all.
 

gemma-the-husky

Super Moderator
Staff member
Presumably Access uses some sort of ISAM balanced-tree structure for its indexes, so with a lot of records, adding one more can cause a lot of re-balancing within the table. However, I also thought these types of index structures minimised the number of writes, as the see-saw effect settles down.

Having said all this, we had just the same problem with a VMS file (I know there are VMS experts here!) - massive file, lots of indexes, some compound - and in the end it started taking 5-6 secs to insert extra records, so processing the whole file became extremely lengthy.
 

The_Doc_Man

Immoderate Moderator
Staff member
VMS??????? As in... VAX/VMS or Alpha/VMS? Or even Itanium/VMS?

If so, and you want help, continue to post in this thread.
 

The_Doc_Man

Immoderate Moderator
Staff member
Taking a different angle, I suspect part of the problem is that indexes tend to get churned a lot by massive inserts. Churning causes repeated traversal of ever-deepening sort trees. No matter how well balanced, the problem becomes a nightmare of tree traversal.

When inserting a SET of records of size N into a table of size M with a single binary tree, one insert takes log(M) time. Inserting the whole set takes roughly N * log(M + N/2) time PER INDEX. BUT - Access being what it is, that's not the whole story.
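
To put purely illustrative numbers on that (nothing measured here): appending N = 2,000,000 rows into an initially empty table gives roughly N * log2(N/2), which is about 2,000,000 * 20 = 40 million comparisons per index; with five indexes being maintained during the append, that is on the order of 200 million, before counting any disk activity at all.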

After an insert of this type, you need to do one last compact & repair because every time an index buffer has to be split, Access has to allocate more disk space for each index, which leads to essentially random placement of the new index records on disk and in memory. OK, RAM is no problem. It is called RAM because it IS randomly addressable with little or no penalty. But with paging and swapping, random placement of the backing store for the disk-resident index causes data to become fragmented. Fragmented data placement increases the number of seeks required to gather a large data set. And in terms of time, SEEKS kill you worse than anything. For a system with a good disk, a seek is still 8 msec. - which on a 1 GHz machine is still 8 MILLION clock cycles or about 3.0-3.5 million instructions!!! BIG penalty.

The compact & repair is EXTREMELY important here, and not only because it drops all "abandoned" space, making the DB smaller. It also gathers the indexes to make them contiguous and monotonic with respect to index order. Tables become contiguous too, though not necessarily in a specific order unless there is a true primary key on the table. The result of a compact & repair is that index usage becomes more efficient again.

The method of dropping the index, doing the insert, and rebuilding the index is more efficient because while each index churns a little during the rebuild, they are churned one at a time. The action stays localized, minimizing disk seek activity. (But a compact & repair is still recommended after replacing indexes anyway.)
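
If the whole process is being driven from queries anyway, the compact step on a linked back-end can be scripted too. Here is a sketch using DAO's CompactDatabase; the paths are placeholders, and note that you cannot compact the database the code itself is running in:

Code:
' Sketch: compact a linked back-end from code. Paths are placeholders.
Public Sub CompactBackEnd()
    Const SRC As String = "C:\Data\MainDetail.mdb"
    Const TMP As String = "C:\Data\MainDetail_compact.mdb"

    DBEngine.CompactDatabase SRC, TMP    ' write a compacted copy
    Kill SRC                             ' then swap it in for the original
    Name TMP As SRC
End Sub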
 

gemma-the-husky

Super Moderator
Staff member
Doc Man, I was just mentioning a similar issue in a large Alpha/VMS file, not looking for help.

Performance started to degrade, despite all the benefits of RMS.

Now I've read your last post - a very thorough explanation and analysis.
 

The_Doc_Man

Immoderate Moderator
Staff member
Gemma, when you start getting that type of performance on an Alpha running OpenVMS, you need to do an EDIT/FDL and redesign your bucket sizes and fill ratios. Also, depending on file size, this might (MIGHT) be a case where you would like to make your bucket size and your disk's cluster size match. Yeah, normally you would not... but bucket splits eat your socks if you aren't careful.

I would also strongly consider doing an ANALYZE/RMS every so often just for sh|ts and giggles to assure you don't muck your buckets. You might then wish to do a CONVERT/FDL on the big file as the moral equivalent of the Access Compact & Repair function.

Also, do you use a disk defragger product? For really big files, defragging will help with regard to seek time reduction.

Not to mention that this might also be the time to boost the ACP_xxx quotas related to your file type. You really want to cache the hell out of the indexes for indexed files.

Then there is always the method of using SET RMS to increase the number and size of your file buffers for that mode of file access.
 
