Following on from my earlier test results, I have just published a web article discussing the speed tests I used a few weeks ago to compare three methods of identifying unmatched records:
a) NOT IN subquery (as used by the OP)
Code:
SELECT tblData.Postcode FROM tblData WHERE tblData.Postcode NOT IN (SELECT tblSource.Postcode FROM tblSource);
b) NOT EXISTS subquery
Originally I saved this query exactly as
@ebs17 suggested:
Code:
SELECT tblData.Postcode FROM tblData WHERE NOT Exists (SELECT NULL FROM tblSource WHERE tblSource.Postcode = tblData.Postcode);
Access then proceeded to rewrite this in a way it deemed more efficient!
Code:
SELECT tblData.Postcode FROM tblData WHERE (Exists (SELECT NULL FROM tblSource WHERE tblSource.Postcode = tblData.Postcode))=False;
I tested both. The times were almost identical in each case!
c) Simple unmatched query using the wizard (my original suggestion)
Code:
SELECT tblData.Postcode FROM tblData LEFT JOIN tblSource ON tblData.[Postcode] = tblSource.[Postcode] WHERE (((tblSource.Postcode) Is Null));
I tested this for both non-indexed & indexed fields on two tables of 10K records, repeating each test 10 times to minimise the effects of natural variation and calculated averages etc
The non-indexed results were:
The two subquery methods were both very slow though the NOT IN query is far slower than NOT EXISTS.
By comparison, the simple unmatched query created using the query wizard is blisteringly fast!
I then indexed the search field in both tables and repeated the tests
The differences from the first set of tests are interesting:
a) The NOT IN query is almost as slow as before indicating that it cannot make use of the indexing
b) The NOT EXISTS query is now very fast indicating that it does use indexing
c) The unmatched query is still (just) the fastest though indexing appears to play little part in this set of results.
I then repeated the two faster tests for a larger indexed dataset of 1 million records in one of the tables
The greater efficiency of the simple unmatched query became even more apparent in this case with the unmatched query now taking just under 75% of the time taken by NOT EXISTS. I didn't attempt to time the NOT IN query for this large dataset for what, I hope, are obvious reasons.
Finally, I repeated those two tests using 10 million records. The average results in this case are shown below
In both cases, the total time is about 9x larger and the time difference is therefore even more significant
Conclusions
A NOT IN subquery is a very inefficient method of finding unmatched records as it cannot use indexes.
By comparison, a NOT EXISTS subquery is far better as it does use indexes
However, an unmatched query (such as that created by the wizard) is more efficient (faster) than either type of subquery
This becomes even more noticeable with larger datasets.
Although the NOT EXISTS subquery does work well for indexed fields, it is more complex to create and still significantly slower than a simple unmatched query for large datasets. I haven't changed my original opinion and
see no benefit in using subqueries for this purpose.
The article includes two versions of the test database - indexed & not-indexed so anyone interested can try this for themselves.
It can be found at
This article compares three methods of finding differences in records: the unmatched query created with the wizard compared to two methods using subqueries. It is the twelfth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding...
www.isladogs.co.uk
Hope some of you find it interesting