Clustered Indexes (1 Viewer)

WindSailor · Mar 11, 2006

The following information is from this site:

http://www.sql-server-performance.com/clustered_indexes.asp

As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases--such as an identity column, or some other column where the value is increasing--and is unique. In many cases, the primary key is the ideal column for a clustered index.

*****
Here are some more good reasons to add a clustered index to every table.
Keep in mind that a clustered index physically orders the data in a table based on a single or composite column. Second, the data in a heap (a table without a clustered index) is not stored in any particular physical order.
Whenever you need to query the column or columns used for the clustered index, SQL Server has the ability to sequentially read the data in a clustered index an extent (8 data pages, or 64K) at a time. This makes it very easy for the disk subsystem to read the data quickly from disk, especially if there is a lot of data to be retrieved.
But if a heap is used, even if you add a non-clustered index on an appropriate column or columns, because the data is not physically ordered (unless you are using a covering index), SQL Server has to read the data from disk randomly using 8K pages. This creates a lot of extra work for the disk subsystem to retrieve the same data, hurting performance.
Another disadvantage of a heap is that when you rebuild indexes to reduce fragmentation, heaps are not defragmented, because they are not indexes. This means that over time, a heap will become more and more fragmented, further hurting performance. Adding a cluster index will insure that the table can be defragmented when indexes are rebuilt.
This are just several of many reasons why a clustered index should be added to virtually all tables. [6.5, 7.0, 2000] Updated 12-6-2005

*****
Since you can only create one clustered index per table, take extra time to carefully consider how it will be used. Consider the type of queries that will be used against the table, and make an educated guess as to which query (the most common one run against the table, perhaps) is the most critical, and if this query will benefit from having a clustered index. [6.5, 7.0, 2000] Updated 12-6-2005

*****
Clustered indexes are useful for queries that meet these specifications:
• For queries that SELECT by a large range of values or where you need sorted results. This is because the data is already presorted in the index for you. Examples of this include when you are using BETWEEN, <, >, GROUP BY, ORDER BY, and aggregates such as MAX, MIN, and COUNT in your queries.
• For queries that look up a record with a unique value (such as an employee number) and you need to retrieve most or all of the data in the record. This is because the query is covered by the index. In other words, the data you need is in the index itself, and SQL Server does not have to read any additional pages to retrieve the data you want.
• For queries that access columns with a limited number of distinct values, such as a columns that holds country or state data. But if column data has little distinctiveness, such as columns with a yes or no, or male or female, then won't want to "waste" your clustered index on them.
• For queries that use the JOIN or GROUP BY clauses.
• For queries where you want to return a lot of rows, just not a few. Again, this is because the data is in the index and does not have to be looked up elsewhere. [6.5, 7.0, 2000] Updated 12-6-2005

*****
If you run into a circumstance where you need to have a single wide index (a composite index of three or more columns) in a table, and the rest of the indexes in this table (assuming there are two or more) will only be one column wide, then consider making the wide index a clustered index and the other indexes non-clustered indexes.
Why? If the wide index is a clustered index, this means that the entire table is the index itself, and a large amount of additional disk space is not required to create the index. But if the wide index is a non-clustered index, this means SQL Server will have to create a "relatively large" index, which will indeed consume a large amount of additional disk space. Whenever the index needs to be used by the query processor, it will be more efficient to access the clustered index than the non-clustered index, and performance will be better. [6.5, 7.0, 2000] Updated 3-5-2004

*****
Avoid clustered indexes on columns that are already "covered" by non-clustered indexes. A clustered index on columns that is already "covered" is redundant. Use the clustered index for columns that can better make use of it. [6.5, 7.0, 2000] Updated 3-5-2004

*****
When selecting a column to base your clustered index on, try to avoid columns that are frequently updated. Every time that a column used for a clustered index is modified, all of the non-clustered indexes must also be updated, creating additional overhead. [6.5, 7.0, 2000] Updated 3-5-2004

*****
When selecting a column or columns to include in your clustered index, select the column (or for the first column in a composite index) that contains the data that will most often be searched in your queries. [6.5, 7.0, 2000] Updated 3-5-2004

*****
If a table has both a clustered index and non-clustered indexes, then performance will be best optimized if the clustered index is based on a single column that is as narrow as possible. This is because non-clustered indexes use the clustered index to locate data rows and because non-clustered indexes must hold the clustered keys within their B-tree structures. This helps to reduce not only the size of the clustered index, but all non-clustered indexes on the table as well. [6.5, 7.0, 2000] Updated 3-5-2004

*****
The primary key you select for your table should not always be a clustered index. If you create the primary key and don't specify otherwise, this is the default. Only make the primary key a clustered index if you will be regularly performing range queries on the primary key or need your results sorted by the primary key. [6.5, 7.0, 2000] Updated 3-5-2004

*****
If you drop or change a clustered index, keep in mind that all of the non-clustered indexes also have to be rebuilt. Also keep in mind that to recreate a new clustered index, you will need free disk space equivalent to 1.2 times the size of the table you are working with. This space is necessary to recreate the entire table while maintaining the old table until the new table is recreated. Then the old table is deleted. [6.5, 7.0, 2000] Updated 3-5-2004

*****
When deciding on whether to create a clustered or non-clustered index, it is often helpful to know what the estimated size of the clustered index will be. Sometimes, the size of a clustered index on a particular column or columns may be very large, leading to database size bloat. [7.0, 2000] Updated 3-5-2004

*****
Clustered index values often repeat many times in a table's non-clustered storage structures, and a large clustered index value can unnecessarily increase the physical size of a non-clustered index. This increases disk I/O and reduces performance when non-clustered indexes are accessed.
Because of this, ideally a clustered index should be based on a single column (not multiple columns) that is as narrow as possible. This not only reduces the clustered index's physical size, it also reduces the physical size of non-clustered indexes and boosts SQL Servers overall performance. [6.5, 7.0, 2000] Updated 10-15-2004

*****
When you create a clustered index, try to create it as a UNIQUE clustered index, not a non-unique clustered index. The reason for this is that while SQL Server will allow you to create a non-unique clustered index, under the surface, SQL Server will make it unique for you by adding a 4-byte "uniqueifer" to the index key to guarantee uniqueness. This only serves to increase the size of the key, which increases disk I/O, which reduces performance. If you specify that your clustered index is UNIQUE when it is created, you will prevent this unnecessary overhead. [7.0, 2000] Updated 10-15-2004

*****
If possible, avoid adding a clustered index to a GUID column (uniqueidentifier data type). GUIDs take up 16-bytes of storage, more than an Identify column, which in turn make the index larger, which increases I/O reads, which can hurt performance. While the performance hit will be minimal if you do decide to add a clustered index to a GUID column, every little bit ads up. [7.0, 2000] Updated 10-15-2004

-----
If anyone can add to this I think it would be absolutely great!
Thanks

Clustered Indexes (1 Viewer)

WindSailor

Registered User.

Similar threads

Users who are viewing this thread