Why indexing is important for good performance

isladogs · Jun 9, 2018

@Doc

Access has a limit of 10 indexes per table (and a PK counts as one).

Whilst agreeing with the general idea behind your post, that quote isn't quite correct. These appear to be the correct values:
Number of indexes in a table 32
Number of fields in an index 10

See:
https://support.office.com/en-us/article/access-specifications-0cf3c66f-9cf2-4e32-9568-98c1025bb47c
https://www.databasezone.com/techdocs/acclimit.html

EDIT:
The following is also an illustration of when indexing does not help.

I recently did a test on another forum to compare the time taken to update a large number of records (2.6 million) where around 20% of the records were null
The comparison was between using Len(fieldname)=0 and Nz(fieldname,"")="" to check for nulls.
The prediction by a forum user was that Len would be faster as VBA handles number more efficiently than strings
That was not the case. Nz was about 10% faster IN THIS TEST

I also decided to compare the times taken when the field was indexed compared to unindexed.
Indexing INCREASED the time to do the update from around 40s to 60s

So whilst indexing may speed up SEARCHES significantly, UPDATES will in many cases be MUCH slower

The_Doc_Man · Jun 10, 2018

Used to be 10. If it is 32 now, even better - but still a limited resource and still has overhead when you have a lot of records AND a lot of indexes.

isladogs · Jun 10, 2018

Yes I agree. In fact I'd just finished adding an example to illustrate the same point

CJ_London · Jun 10, 2018

If the answer is "50% of the number of records" then the answer is "Don't do it."

To go back to my original thread - I did say this

There is also little point in indexing fields which have few distinct values (like Booleans) or contain a lot of nulls because the index itself will not be that efficient (although with Access you can set the index to ignore nulls). Go back to library analogy - how useful is it to have pages of 'Yes' followed by pages of 'No'?

one of the other things to improve index performance when relevant is to ignore nulls. So if you have a table with a field populated say 25% of the time, the rest of the time it is null - ignore them. index update performance is improved because the null index 'record' is not created and the index 'table' is smaller.

@NG

One of my main tables has 18k records and it takes over 20 min to get the recordcount you mentioned (PK is Autonumber). However, I can open a form that uses a query that returns ALL of those records with additional fields from related tables and a subform - and the recordcount is instantaneous.

In speculation mode so I can't offer an explanation except that opening a query potentially will be making use of an index whilst opening a table will not. There are better people than me who can provide a more considered answer.

Having said that I have noticed that if a table(def) is saved with sort and/or filter properties populated (together with order on load/filter on load set to yes) it can take longer to open. Not sure if this translates across to sql server but a query accessing an access table would ignore those properties.

The standard advice is that forms/reports should use a query as a recordsource, not a table - even if the query returns the entire table. I've not really considered it but it may be this advice is based on the above theory.

Suggest two experiments. Open the query as a query, not as a form recordsource. Open a form with a table as a recordsource. Compare the results.

The other experiment is on your existing form, once opened, click on the 'go to last record' option. Does it go straight there? It may be the ODBC driver is returning a recordcount at the time the recordset is opened - as happens with ADO - to quote for ADO here
https://docs.microsoft.com/en-us/sq...recordcount-property-ado?view=sql-server-2017
Remarks

[FONT=segoe-ui_normal]Use the RecordCount property to find out how many records are in a Recordset object. The property returns -1 when ADO cannot determine the number of records or if the provider or cursor type does not support RecordCount. Reading the RecordCount property on a closed Recordset causes an error.[/FONT]
Bookmarks or approximate positioning

[FONT=segoe-ui_normal]If the Recordset object does support either bookmarks or approximate positioning, this property returns the exact number of records in the Recordset. This property returns the exact number regardless of whether the Recordset has been fully populated.[/FONT]

jleach · Jun 10, 2018

I wish this were a SQL Server thread, because I know so much more about it there than how JET/ACE handles it.

In any case, if anyone's curious, I thought I'd throw out a mention to the book SQL Performance Explained by Marcus Winand. It's not for Access (in fact, not for any specific DB, though he gives examples in MSSQL, PostrgreSQL, MySQL and Oracle), but covers pretty much how every RDBMS engine works, and Access is close enough for it to be very helpful.

It's definitely in the top list enlightening technical books I've ever read.

While I prefer the hardcopy myself, Marcus does have a website (https://use-the-index-luke.com/ which does contain all of the book content online).

Good stuff, thanks for the post CJ_London.

CJ_London · Jun 10, 2018

still has overhead when you have a lot of records AND a lot of indexes

Obviously requires more effort but on occasion with a very large number of inserts or updates I've had to write code to remove indexing, do the insert/update then reapply indexing.

CJ_London · Jun 10, 2018

Thanks Jack - good link as well.

Interested in the TOP N part of the publication. pipeline is not available in access but I have found that in order to find the TOP N, the entire dataset (i.e. after criteria has been applied) needs to be sorted on some basis to determine which records are the top 10 (otherwise all you are returning is the 'first' 10) - which can be slow with even medium sized datasets.

I did have a thought, but not had the opportunity to try it yet about using the partition function to split say sales into groups (or perhaps use an aggregate query which doesn't aggregate but groups on a similar basis to partition) and using a criteria on that to limit the dataset further before applying TOP N

Edit: I'm talking something a bit more sophisticated than simple >=somevalue criteria

jleach · Jun 10, 2018

CJ_London said:
...about using the partition function to split say sales into groups (or perhaps use an aggregate query which doesn't aggregate but groups on a similar basis to partition) and using a criteria on that to limit the dataset further before applying TOP N...

I often find it very difficult to know how the engine will process this, and deciphering execution plans was never a strong point of mine, so that doesn't help either.

I recall years ago in Access saving one query to use as the "base" and querying another on top of that, only to run into trouble with the order. I had assumed that the "base" query would be calculated first, then the "secondary" query applied to the base results... turns out that's not the case and the engine (any of them, AFIAK) will take great liberties in looking at the entire mess and working out what it thinks is the best way. In this case the "base" query was supposed to filter out nulls which would have tripped up the second query, but I kept running into null processing errors in the second query. Anyway, that when I learned that sometimes dumping something into a temp table really IS the way to go.

These days I work with SQL Server, which has much better tooling for this stuff (though I suspect PARTITION functions and window functions in general are in the category of "use wisely for performance considerations").

I should dig that book out again, haven't been through it in a while and it's one of those that I can probably re-read once a year for the next ten years and still pick up nice tidbits each time.

Cheers,

The_Doc_Man · Jun 10, 2018

CJ_London said:
The standard advice is that forms/reports should use a query as a recordsource, not a table - even if the query returns the entire table. I've not really considered it but it may be this advice is based on the above theory.

As a matter of fact, I ran into another reason why this was a good idea. I never did track down WHY it was this way, but when I had a split DB with UNC links rather than drive-letter links to the BE, and I had a form that opened a table as its record source, I kept on getting a pop-up box asking for the location of the BE tables. Happened once per form launch. When I built a single-table, returns-all-fields, no-filter query and drove the form from the query, everything was quiet as a mouse, smooth as silk, not even a hiccup. I tried to find out why but never found a clear cause. I eventually gave up and yielded to the inevitable.

CJ_London · Jun 10, 2018

Using filters requires bringing the entire server side table or query recordset down from the server to memory on the local PC where Access manages the filtering.

I think this is where the access templates lead newbies in the wrong direction. 9 time out of 10, the template forms are based on tables - and of course the subform construct relies on filtering when you change the mainform record.

Shame Access doesn't have a 'mode' property to allow users to make a choice - simplistically 'professional' - tables can only be viewed via query, forms/reports can only have a query recordsource, field names do not have spaces or use reserved words, etc, 'just messing' - do what you like, but expect plenty of issues along the way.

in professional mode the query builder could be improved to allow a greater variety of joins (although I suppose the argument would then be don't use the query builder)

isladogs · Jun 10, 2018

CJ_London said:
Shame Access doesn't have a 'mode' property to allow users to make a choice - simplistically 'professional' - tables can only be viewed via query, forms/reports can only have a query recordsource, field names do not have spaces or use reserved words, etc, 'just messing' - do what you like, but expect plenty of issues along the way.

in professional mode the query builder could be improved to allow a greater variety of joins (although I suppose the argument would then be don't use the query builder)

What a great idea. One for the Access User Voice?
Mind you, I expect chances of being implemented to be close to zero as that would indicate that the current system is flawed

NauticalGent · Jun 10, 2018

Using filters requires bringing the entire server side table or query recordset down from the server to memory on the local PC where Access manages the filtering

A point I took to heart when we migrated to SQL. However, the clients/users were used to clicking the “filtered” indicator in the record selector portion of my the split form (not my decision) to toggle from open records (about 600) to all records (18k, about 4 years of history) to make research easier. I had built different views on the server side just in case the performance became an issue, but much to my surprise (and their pleasure; they HATE change) the performance hit was almost imperceptible...even at our remotest location, Bahrain. Pretty impressive considering the server is in Naples, Italy.

As much as it bothers me that our application isn’t fully optimized, I am thankful that Access and SQL Server work so well together.

jleach · Jun 10, 2018

ridders said:
What a great idea. One for the Access User Voice?
Mind you, I expect chances of being implemented to be close to zero as that would indicate that the current system is flawed

We fought for years to have them supply basic templates that followed some sort of best practices (which I don't know if they ever did, actually), let alone actually enforce anything within the program itself.

Nice thought, and I'd give it a vote, but I'd be less surprised by a talking penguin, I think, if they actually did it.

Frothingslosh · Jun 12, 2018

CJ_London said:
I think this is where the access templates lead newbies in the wrong direction. 9 time out of 10, the template forms are based on tables - and of course the subform construct relies on filtering when you change the mainform record.

Yep, there's a reason that in my current apps, the subforms always resolve to a single record based on the value in the parent form, rather than using the Parent/Child links.

When you're doing this on a table with 1.6 billion records, the normal filtering mechanic just isn't remotely acceptable.

isladogs · Jun 12, 2018

1.6 billion!?! US billion (10^9) or UK billion (10^12)
Either way that's a lot - presumably a SQL database as I'm sure it would be more than 2GB

In one of my apps I have a table of 2.6 million postcodes
Searching is obviously impossible using a combo box so to make it manageable, I break it down using 5 cascading combos, each of which are fast:

Do you have a different solution?

Frothingslosh · Jun 13, 2018

Yeah, 1.6 billion. American billion, thank God. And we add a couple million more every week. That table alone is somewhere on the far side of 1 TB in size.

Keep in mind that I work for one of the largest health insurance organizations in the US. My particular subset only handles Michigan, and we still have 4.5 million members in Michigan and 1.5 million more in other states. For the record, I actually use a view covering all claim lines* received in 2010-2012, 2012-2014, and 2014 to the current day (which is the 1.6 billion record table - the others are smaller).

*A claim line is a single action on an insurance claim. If you go to the doctor here, the claim is your visit, and the lines could include your service fee for walking in the door, 3 lines for different blood tests, a line for an x-ray, and a line for a treatment provided on-site. If you wind up in the hospital for something complicated and stay there for several weeks, the resulting claim could be THOUSANDS of lines. When I was hospitalized for two days for simple Cellulitis, my claim was still over 50 lines.

I assume your screenshot isn't the normal entry, but instead is used to find a postal code if you don't already know it. Given that constraint, I'd have probably done the exact same thing (except that I don't know how to add the Google Maps shot showing the location). Speed-wise, I don't think you'll find a faster solution than you used.

Another option I found on this very site (Paul Brown, maybe? Maybe Mr. Baldy?) was to have a combo box that doesn't actually pull a list until a certain number of characters are entered, in order to cut down on the recordset size. I'm not entirely sure how even that would perform against millions of records, though, and I really doubt it would match the performance of what you showed here.

isladogs · Jun 13, 2018

That's certainly a big file....
The postcodes BE file is 1.6GB with almost all of that in one table but it's only updated each quarter.

The normal postcode entry is a textbox.
The builder is indeed when you aren't sure what to enter.
Each combo is based on a separate table filtered for the previous selection which makes it very fast.

Allen Browne had a combo that behaved as you described. I tried the idea for this setup but found it utterly unusable

Frothingslosh · Jun 13, 2018

We have a dedicated virtual server with something around 12 or 15 TB of space. That table and its archives are far and away the biggest tables we have, but they're also the core of everything my department does. We update it every Monday night, and it feeds something like 20 different in-house applications.

And it doesn't even include prescriptions! That's another table entirely.

(In a nutshell, my department tracks down over and under payments and gets them corrected, as well as dealing with the administrative nightmare that is Medicare and determining whether Medicare pays for something or we do. Last year we actually recovered $87 million that had been over- or incorrectly-paid.

I had a feeling that your postal code table would break that dynamic combo box.

CJ_London · Jun 13, 2018

Another option I found on this very site (Paul Brown, maybe? Maybe Mr. Baldy?) was to have a combo box that doesn't actually pull a list until a certain number of characters are entered,

its a technique I use - requirement is to enter 3 chars (or wait 1/2 second on the first 2) - and I don't use an initial * - I train the user to use it if required.

You can also enter multiple entries separated by a semi colon. i.e. an entry of smi;bro will return all names beginning with smi or bro.

simplistically/statistically even entering one character reduces the returned recordset by 96%

I have a client with circa 3m records - time to return a recordset is generally less than a second but is based on a simple search of one field or another.

Galaxiom · Jun 13, 2018

ridders said:
1.6 billion!?! US billion (10^9) or UK billion (10^12)

Bit behind the times there Colin. UK adopted the US terminology in 1975.

Why indexing is important for good performance

Access MVP / VIP

Immoderate Moderator

Access MVP / VIP

Super Moderator

Registered User.

Super Moderator

Super Moderator

Registered User.

Immoderate Moderator

Super Moderator

Access MVP / VIP

Ignore List Poster Boy

Registered User.

Premier Pale Stale Ale

Access MVP / VIP

Attachments

Premier Pale Stale Ale

Access MVP / VIP

Premier Pale Stale Ale

Super Moderator

Super Moderator

Similar threads

Users who are viewing this thread