View Full Version : Indexing (fields with low entropy)


bbb4512
09-20-2007, 12:40 AM
I need an advice by indexing fields with low entropy.

I have some large tables with some columns that i use in selects, but they store only few vaues like (0 or 1) or ("R" or "N" or "F").

Somewhere i read, that fileds like these (with lov variety of values) do not need (or should not) to be indexed.

http://www.dbpd.com/vault/9801xtra.htm

Is this true also for access databases accessed via ODBC, or not?
Please give me an advice.

Marcel

Rabbie
09-20-2007, 03:57 AM
If you have a large table with an index field which can only take a few values then you will have a lot of duplicate values in your index which will reduce the value of your index. The overhead of maintaining the index will remain the same regardless of the number of duplicates so there will be a decrease in advantages together with the overheads.

I would suggest you try both with and without this field indexed and see whether the advantages outway the disadvantages or not.

bbb4512
09-20-2007, 06:48 AM
Thanks, for replying. The database is used in a quite complex environment, so it is not easy to test the pros and cons. I decided to omit those indexes, although selects will probably be slower. :-/

The_Doc_Man
09-21-2007, 07:25 AM
Indexes are most useful in inverse relationship to their cardinality. For a unique field, cardinality is 1. For the ultimate far end, say a Y/N field (2 values), cardinality is the number of records in the table /2.

cardinality, n - a property of a field or a group of fields; by extension, a property of the index based on the selected field(s); equal to the number of records expected to be returned from a query based on an arbitrary but valid value. By implication for multi-field queries, the cardinality depends on the selected value for each field being atomic rather than a list.

For non-uniform value distributions, a QUICK AND DIRTY calculation of cardinality is the number of records in the whole table divided by the number of values that can appear in the field in question. NOTE that this is a field property, NOT a table property. It is more complex for a multi-field index but the concept is the same.

Where is the cutoff of usabilty? Damfino! Clearly, the smaller the cardinality, the more effective the index. The higher the cardinality, the worse its effectiveness.

A better way to speed up such searches is to keep the fields in question SHORT. Like, if you have Values SMALL, MEDIUM, and LARGE, use S, M, and L as your codes and when you need to spell them out, translate them via a lookup via JOIN query or a LOOKUP field in a report or form. (Don't use LOOKUP fields in tables; they all too often slow you down a lot.)

The shorter your fields, the more record you can fit in a disk buffer. The more you fit in a single buffer, the faster your search will run, regardless of table size.