Performance of Date-Only vs Date-Time Indexes

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 01:27
Joined
Jan 20, 2009
Messages
12,895
Has anyone tested the performance of querying from an indexed field holding date-time values compared to date-only values? (Academic perspectives also welcome.)

My current BE has two separate fields for date and time, with an index on the date. I am restructuring the database and considering combining these fields. However, almost all the queries rely on the date values and I am concerned that combining the date and time might impact performance by affecting the structure of the index.

My table has up to 25,000 records per day. The times are in hh:mm (without seconds) from 8:00 AM to 5:00 PM. So instead of one index value for an entire day, a date-time field could hold up to 540 unique index values per day (9 hours × 60 minutes).
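The cardinality arithmetic can be sketched quickly (a back-of-envelope check using the figures from the post, not a benchmark):

```python
# Back-of-envelope arithmetic for the index cardinality described above.
# Assumptions: times run 8:00 AM to 5:00 PM at one-minute resolution,
# and up to 25,000 records per day (figures from the post).
minutes_per_day = (17 - 8) * 60          # 9 working hours x 60 = 540 possible keys
records_per_day = 25_000
rows_per_key = records_per_day / minutes_per_day

print(minutes_per_day)                   # 540 distinct date-time keys per day
print(round(rows_per_key))               # ~46 rows per key, vs 25,000 on a date-only key
```

So a combined date-time index would be far more selective (about 46 rows per key instead of 25,000), at the cost of a much larger set of distinct keys.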

The new table will actually be in MS SQL Server with an Access front end using ODBC linked tables but I assume the question would also be relevant to JET/ACE.

Performance is important. The Access table has up to ten million records but I intend to let it go much further on the server (SQL Server 2005 Standard).
 
Here's my two cents worth:

If you have an index on both the date and the time field, your question is about one versus two indexes.
As you know, a timestamp is in fact a double. Time is a double. Date is not.
Again, if you have two indexes, a single index is always faster when inserting and updating records.

There is something to be said for having two fields: you never have to split the date out of the timestamp to use it in the queries your FE runs.

Enjoy!
 
The benefit of indexing is quick data retrieval when searching for records and when sorting. So indexes come into play in the WHERE and ORDER BY clauses.

Updating, inserting and deleting records is in some ways hampered by multiple indexes. When you update a record, every index that covers a changed field must be updated; inserting a record adds an entry to each index, and deleting a record removes an entry from each. So the update and delete actions (especially update) are affected by multiple indexes.

In your case, taking all the points you mentioned into consideration, you will benefit most from indexing if you leave it as it currently is. The reason is that if you were to join the date and time into one field and then search on the date part, you would have to apply a date function - such as DateValue() or Format() - to extract the date, and once a built-in function is applied to the column, the search can no longer use the index on that field. So a full table scan is performed.
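The function-kills-the-index effect is easy to see with an execution plan. A minimal sketch, using SQLite as a stand-in for Jet/SQL Server (the principle is the same; the table and column names here are made up):

```python
import sqlite3

# Wrapping an indexed column in a function forces a full scan, while a plain
# range predicate on the same column can use the index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (event_at TEXT)")
con.execute("CREATE INDEX ix_event ON tbl (event_at)")

# Function applied to the column: the optimiser cannot use ix_event.
scan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tbl WHERE date(event_at) = '2011-01-15'"
).fetchone()[3]

# Bare range predicate on the same column: the index is usable.
seek = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tbl "
    "WHERE event_at >= '2011-01-15' AND event_at < '2011-01-16'"
).fetchone()[3]

print(scan)   # e.g. 'SCAN tbl'
print(seek)   # e.g. 'SEARCH tbl USING COVERING INDEX ix_event ...'
```

The first plan reports a table scan; the second reports an index search.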

I hope it helps!
 
Interesting topic. Since I don't know much I started googling.

I ran into this article, http://support.microsoft.com/kb/209126 , which contains the following nugget, of which I was not aware:

To decide what query strategy to use, the Optimizer uses statistics. These statistics are based on factors such as:
  • The number of records in a table
  • The number of data pages in a table
  • The location of the table
  • Whether indexes are present
  • How unique the indexes are
Based on these statistics, the Optimizer then selects the best internal query strategy for dealing with a particular query.

The statistics are updated whenever a query is compiled. A query is flagged for compiling when you save any changes to the query (or its underlying tables) and when the database is compacted. If a query is flagged for compiling, the compiling and the updating of statistics occurs the next time that the query is run. Compiling typically takes from one second to four seconds.

If you add a significant number of records to your database, you must open and then save your queries to recompile the queries. For example, if you design and then test a query by using a small set of sample data, you must re-compile the query after additional records are added to the database. When you do this, you want to make sure that optimal query performance is achieved when your application is in use.
 
In your case, taking all the points you mentioned into consideration, you will benefit most from indexing if you leave it as it currently is. The reason is that if you were to join the date and time into one field and then search on the date part, you would have to apply a date function - such as DateValue() or Format() - to extract the date, and once a built-in function is applied to the column, the search can no longer use the index on that field. So a full table scan is performed.

This point needs a comment about querying practices for exactly the reason stated.

When querying date ranges many inexperienced developers would use the technique detailed. It is very simple to write but woefully slow because it can't use the index and must apply a function to every record.

For example (assuming records only cover 2011 and have no time component for simplicity):
Code:
WHERE Month(datefield) = 1

Instead the query that should be run:
Code:
WHERE datefield BETWEEN #1/1/2011# AND #01/31/2011#

Or with Time:
Code:
WHERE datefield BETWEEN #1/1/2011# AND #01/31/2011 23:59:59#

These clauses do not apply a function to every record and can use the index. So really the inclusion of the time is not an issue for querying per se.
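One caution on the BETWEEN ... 23:59:59 endpoint: Access date-times resolve to whole seconds, so it works there, but on a column that can carry sub-second precision (SQL Server's datetime resolves to about 3 ms) a closed range ending at 23:59:59 silently drops the last second of the day. A half-open range (>= start AND < next day) avoids this and is equally index-friendly. A sketch of the difference using Python datetimes:

```python
from datetime import datetime

# Two rows in the final second of 31 Jan 2011; the second has a
# sub-second component, as a SQL Server datetime column could.
rows = [
    datetime(2011, 1, 31, 23, 59, 59),
    datetime(2011, 1, 31, 23, 59, 59, 500000),  # 23:59:59.5 - lost by BETWEEN
]

# Closed range, as in the BETWEEN example above.
between = [r for r in rows
           if datetime(2011, 1, 1) <= r <= datetime(2011, 1, 31, 23, 59, 59)]

# Half-open range: >= first day, < first day of next month.
half_open = [r for r in rows
             if datetime(2011, 1, 1) <= r < datetime(2011, 2, 1)]

print(len(between))    # 1 - the .5-second row is missed
print(len(half_open))  # 2 - both rows kept
```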

However I realised overnight that I need to keep the date and time separate because I use the date field in the LinkMasterFields and LinkChildFields of some subforms, and they would not work if the time was included.
 
What an excellent find Spike.

Very interesting, and at last an excuse for using RecordSource and update queries in code rather than saved queries. I have long used SQL text in the RecordSource property and stored update queries as text in the code.

I knew this meant there was a delay while the query was compiled anew every time it was run but it kept the navigation window clear. Now I can say it is to ensure the query is optimised for the current statistics. ;)
 
If you have an index on both the date and the time field, your question is about one versus two indexes.

I don't have the time field indexed because it isn't used for any important queries.

As you know, a timestamp is in fact a double. Time is a double. Date is not.

Interesting point. I have done very little with SQL Server before and it hadn't occurred to me that the date type would be integer based. One would have to expect its indexes to be significantly more efficient than the double used for date-time.

Of course in Access it is all the same, since there is no separate date type. Moreover, it looks like the Date type only arrived in SQL Server 2008, and I am using SQL Server 2005. No wonder I hadn't considered its nature. ;)
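For anyone curious how Jet packs a date-time into a double: the integer part is whole days since the epoch of 30 Dec 1899, and the time of day rides in the fraction. A small sketch (the helper name is made up; sub-second precision is ignored for simplicity):

```python
from datetime import datetime

# Jet/Access Date/Time values are doubles: days since 30 Dec 1899 in the
# integer part, fraction of a day in the fractional part.
JET_EPOCH = datetime(1899, 12, 30)

def to_ole_date(dt: datetime) -> float:
    """Convert a datetime to the double Jet stores for a Date/Time value."""
    delta = dt - JET_EPOCH
    return delta.days + delta.seconds / 86_400

midnight = to_ole_date(datetime(2011, 1, 1))
noon = to_ole_date(datetime(2011, 1, 1, 12, 0))

print(midnight)  # 40544.0 - a date-only value is a whole number
print(noon)      # 40544.5 - the time is the fractional part
```

This is why a date-only field gives the index clean whole-number keys, while a combined field scatters each day across many fractional values.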

I will go ahead with the separate fields for the server database.
 
@Galaxiom: Continuing in the article, among the recommendations to improve performance, we find this:

Save the SQL statement as a query

If the RecordSource property for a form or for a report is set to an SQL statement, save the SQL statement as a query and then set the RecordSource property to the name of the query.

Now that makes me wonder: is it for the purpose of saving time on recompilation? Implying that SQL-statements in a RecordSource are compiled/optimized on each run or perhaps session, or is there some other explanation? I am asking because I simply do not know.
 
I believe that if it's not a saved query, it's not re-compiled and no statistics are saved. It's just run as is.

Because there's no query object for the statistics table to reference, there's no point for the Jet engine to compile and run stats on it.

If you know what route your SQL statement will take and you are experienced enough to know whether it's an optimised route, then in my opinion, I wouldn't save an SQL statement in a query. Queries are just for convenience and guidance.
 
@vbaInet - I am not sure I understand. Do you mean that a RecordSource SQL-statement does not get optimised at all? That would yield a severe impact on many queries, me thinks? But would explain why they recommend to save the queries. And also would contravene Galaxiom's tactics from #6
 
The performance of a query depends on the factors you mentioned above. So if an SQL statement satisfies all the conditions for an optimised query, it will run pretty well.

All that saving a query does is let the engine compile it so that it can:

1. Gather statistics based on the record scans
2. Create a "best-route" plan for executing the SQL statement.

Those two steps obviously carry a cost. An SQL statement run on its own will not perform those steps, hence, no costs incurred there.

My thoughts!
 
Okay. But this means that an SQL-statement in a RecordSource is run as is, and hence unoptimised (without a query plan). Which probably in many cases would entail a huge performance hit, right? But if so, that does indeed explain why they recommend saving the queries.
 
Well, a query plan is only for the developer to know where performance can be improved. Like I mentioned, if you're experienced enough you will know what route the engine will take to run the query.

The first thing you will do is to run the SQL statement within a query, get the execution plan and see where it can be improved upon. Make the changes and extract your optimised SQL statement from it. Delete the query. :)
 
But for us mere mortals that is way beyond our capabilities. Which implies that we (the mere mortals) should follow the MS recommendations: use saved queries instead of SQL statements as the RecordSource, and recompile those queries now and then as the amount of data changes.
 
