Indexes (1 Viewer)

WindSailor · Mar 5, 2006

Non Clustered Indexes

There is not a whole lot of information on creating / using Indexes for Access.

I believe Access does not have clustered indexes (A clustered index physically orders the data in a table based on a single or composite column - I don’t think ‘Autonumber’ would apply) and when building my tables trying to abide by the ‘Normal Forms’ guidelines I have always put the most queried column first, second column next etc. and basically used indexes in the same manner.

Now doing some digging on ‘indexes’, I found the following suggestion regarding ‘non clustered’ indexes:

“Always design your indexes with the most sparse columns first and the least sparse columns last, e.g, Name + Province + Sex.”

Interesting… because originally I thought that would be backwards, evidently not true.

Here are some tips regarding indexes, but these are geared for MS SQL Server, not Access (although I would apply whatever I could) from here:

http://www.mssqlcity.com/Articles/Tuning/IndexOptimTips.htm

• Every index increases the time in takes to perform INSERTS, UPDATES and DELETES, so the number of indexes should not be very much. Try to use maximum 4-5 indexes on one table, not more. If you have read-only table, then the number of indexes may be increased.
• Keep your indexes as narrow as possible. This reduces the size of the index and reduces the number of reads required to read the index.
• Try to create indexes on columns that have integer values rather than character values.
• If you create a composite (multi-column) index, the order of the columns in the key are very important. Try to order the columns in the key as to enhance selectivity, with the most selective columns to the leftmost of the key.
• If you want to join several tables, try to create surrogate (substitute or replacement) integer keys for this purpose and create indexes on their columns.
• Create surrogate integer primary key (identity for example) if your table will not have many insert operations.
• Clustered indexes are more preferable than nonclustered, if you need to select by a range of values or you need to sort results set with GROUP BY or ORDER BY.
• If your application will be performing the same query over and over on the same table, consider creating a covering index on the table.

*A covering index is an index, which includes all of the columns referenced in the query. So the index contains the data you are looking for and SQL Server does not have to look up the actual data in the table.

Can anyone expand on this?

Edit----
And would taking your database table structure to a higher normal form (less indexes per table) increase the speed, or does the cumulative indexes count as a whole (I am assuming this is true)?

Edit----
I changed the title to 'Non Clustered Indexes' because I thought Indexes should be split into two different catagories: Clustered and Non Clustered, I also thought it would change the title of this thread in this forum, it didn't, sorry.

Thanks

Pat Hartman · Mar 7, 2006

I believe Access does not have clustered indexes

While technically true, Access rewrites each table in PK order when the database is compacted so you have the effect of a clustered index on the pk. I'm not really sure what benefit you get from this when using an autonumber pk but it seems it would be more useful when using a compound natural key.

WindSailor · Mar 11, 2006

Thanks Pat,

Here is more information on Non-clustered indexes...

http://www.sql-server-performance.com/nonclustered_indexes.asp

Non-clustered indexes are best for queries:
• That return few rows (including just one row) and where the index has good selectivity (generally above 95%).
• That retrieve small ranges of data (not large ranges). Clustered indexes perform better for large range queries.
• Where both the WHERE clause and the ORDER BY clause are both specified for the same column in a query. This way, the non-clustered index pulls double duty. It helps to speed up accessing the records, and it also speeds up the sorting of the records (because the returned data is already sorted).
• That use JOINs (although clustered indexes are better).
• When the column or columns to be indexed are very wide. While wide indexes are never a good thing, if you have no choice, a non-clustered index will have overall less overhead than a clustered index on a wide index.
[6.5, 7.0, 2000] Updated 1-6-2006

*****
If a column in a table is not at least 95% unique, then most likely the query optimizer will not use a non-clustered index based on that column. Because of this, don't add non-clustered indexes to columns that aren't at least 95% unique. For example, a column with "yes" or "no" as the data won't be at least 95% unique. [6.5, 7.0, 2000] Updated 1-6-2006

*****
If your table needs a clustered index, be sure it is added to the table before you add any non-clustered indexes. If you don't, when you add a clustered index to your table, all of the pre-existing non-clustered indexes will have to be rebuilt (which is done automatically when the clustered index is built), putting an unnecessary strain on your server. [6.5, 7.0, 2000] Updated 1-6-2006

*****
To determine the selectivity on an index on a given table, run this command: DBCC SHOW_STATISTICS (table_name, index_name). The higher the selectivity of an index, the greater the likelihood it will be used by the query optimizer. [6.5, 7.0, 2000] Updated 1-6-2006

*****
When deciding whether or not to add a non-clustered index to a column of a table, it is useful to first find out how selective it is. By this, what we want to know is the ratio of unique rows to total rows (based on a specific column) found in the table. Generally, if a column is not more than 95% unique, then the Query Optimizer might not even use the index. If this is the case, then adding the non-clustered index may be a waste of disk space. If fact, adding a non-clustered index that is never used will hurt a table's performance.
Another useful reason to determine the selectivity of a column is to decide what is the best order to position indexes in a composite index. This is because you will get the best performance out of a composite index if the columns are arranged so that the most selective is the first one, the next most selective, the second one, and so on.
So how do you determine the selectivity of a column? One way is to run the following script on any column you are considering for a non-clustered index. This example script is designed to be used with the Northwind database, so you will need to modify it appropriately for your use.

--Finds the Degree of Selectivity for a Specific Column in a Row
Declare @total_unique float
Declare @total_rows float
Declare @selectivity_ratio float

SELECT @total_unique = 0
SELECT @total_rows = 0
SELECT @selectivity_ratio = 0

--Finds the Total Number of Unique Rows in a Table
--Be sure to replace OrderID below with the name of your column
--Be sure to replace [Order Details] below with your table name
SELECT @total_unique = (SELECT COUNT(DISTINCT OrderID) FROM [Order Details])

--Calculates Total Number of Rows in Table
--Be sure to replace [Order Details] below with your table name
SELECT @total_rows = (SELECT COUNT(*) FROM [Order Details])

--Calculates Selectivity Ratio for a Specific Column
SELECT @selectivity_ratio = ROUND((SELECT @total_unique/@total_rows),2,2)
SELECT @selectivity_ratio as 'Selectivity Ratio'

The results in this case is 38%, which means that adding a non-clustered index to the OrderID column of the Order Details table in the Northwind database is probably not a very good idea. [6.5, 7.0, 2000] Updated 1-6-2006

*****
In some cases, even though a column (or columns if a composite index) has a non-clustered index, the Query Optimizer may not use it (even though it should), instead performing a table scan (if the table is a heap) or a clustered index scan (if there is a clustered index). This, of course, can produce unwanted performance problems.
This particular problem can occur when there is a data correlation between the order of the rows in the table, and the order of the non-clustered index entries. This can occur when there is correlation between the clustered index and the non-clustered index. For example, the clustered index may be created on a date column, and the non-clustered index might be created on an invoice number column. If this is the case, then there is a correlation (or direct relationship) between the increasing dates and the increasing invoice numbers found in each row.
The reason this problem occurs is because the Query Optimizer assumes there is no correlation, and it makes its optimization decisions based on this assumption.
If you run into this problem, there are three potential resolutions to this problem:
• If possible, reorder the non-clustered index column (assuming a composite index) so that the column with the highest cardinality is the first column in the composite index.
• Create covering indexes.
• Add index hints to your queries to overrule the Query Optimizer.
[7.0, 2000] Updated 11-1-2005 More from Microsoft

*****
When you think of page splits, you normally only think of clustered indexes. This is because clustered indexes enforce the physical order of the index, and page splitting can be a problem if the clustered index is based on a non-incrementing column. But what has this to do with non-clustered indexes? While non-clustered indexes use a clustered index (assuming the table is not a heap) as their key, most people don't realize that non-clustered indexes can suffer from page splitting, and because of this, need to have an appropriate fillfactor and pad_index set for them.
Here's an example of how non-clustered indexes can experience page splits. Let's say you have a table that has a clustered index on it, such as customer number. Let's also say that you have a non-clustered index on the zip code column. As you can quite well imagine, the data in the zip code column will have no relation to the customer number and will be more or less random, and data will have to be inserted into the zip code index randomly. Like clustered index pages, non-clustered index pages can experience page splitting.
So just as with clustered indexes, non-clustered indexes need to have an appropriate fillfactor and pad_index, and also be rebuilt on a periodic basis. [7.0, 2000, 2005] Updated 11-1-2005

WindSailor · Aug 24, 2006

http://www.sql-server-performance.com/am_indexing_low_sel_cols.asp

-------------------------------------

Techniques for Indexing Low-Selectivity Columns in SQL Server

By Merrill Aldrich
9 June 2006

Most of us have probably faced this situation at one time or another: there's a deceptively simple query in your system that's performing poorly:

SELECT col1, col2, col3 FROM aLargeTable WHERE flag = 1

On the surface, this seems like an easy thing to solve. Looking closer, though, it's likely that our offending "flag" column consists of only two values, and is a BIT column — or, what amounts to the same thing, a column of another type containing only a few different values. That means that indexing it in the usual fashion will not work, because the index will not be selective enough to provide the database engine with an advantage when selecting the rows and the optimizer is not likely to even use it. This, incidentally, would be correct behavior, because using the index to look up many individual rows would be slower than just scanning the whole table.

So, what to do? There are a few possible actions here, but some will help and some will not. What follows is an analysis of some techniques, with their performance impact, using SQL Server 2005.

-----------------------------------------------

Go to the link above to view the main body of the article (too big to post in its entirety here).

-----------------------------------------------

Adeptus · Aug 24, 2006

WindSailor said:
“Always design your indexes with the most sparse columns first and the least sparse columns last, e.g, Name + Province + Sex.”

That makes sense to me, though their choice of words is a little obscure.

Looking at their example, "Most sparse" = least matches per index value
"Least sparse" = most matches per index value

Indexes (1 Viewer)

WindSailor

Registered User.

Pat Hartman

Super Moderator

WindSailor

Registered User.

WindSailor

Registered User.

Adeptus

What's this button do?

Similar threads

Users who are viewing this thread