Solved: Query runs slow with NOT IN

I'm running a simple query that provides a list of client IDs that are in the client table but are not linked to the monitoring table. It's running a little slowly, and if I click to go to the last record, for example, it just hangs. tblClient has 18k records and tblMonitoring has 12k, so the result is about 6k. The IDs are indexed in both tables.

Here's the code
Code:
SELECT tblClient.ID
FROM tblClient
WHERE tblClient.ID NOT IN (SELECT tblMonitoring.client_ID FROM tblMonitoring);

Have I got the code right? Are there any tricks to speed it up?
 
Agree with @pbaldy
Subqueries are always slower and should only be used where no better method exists.
In this case an unmatched query will be FAR faster, and a wizard is available to assist you.

CORRECTION: slower
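For the OP's tables, the wizard would produce something along these lines (a sketch, assuming client_ID is the foreign key used in the original query):
Code:
SELECT tblClient.ID
FROM tblClient LEFT JOIN tblMonitoring
    ON tblClient.ID = tblMonitoring.client_ID
WHERE tblMonitoring.client_ID Is Null;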
 
... WHERE tblClient.ID NOT IN (SELECT ...
The NOT IN prevents index use. Therefore the query must be slow.

The following variant will run about as fast as the JOIN variant, despite using a subquery.
SQL:
SELECT
   tblClient.ID
FROM
   tblClient
WHERE
   NOT EXISTS
      (
         SELECT
            NULL
         FROM
            tblMonitoring
         WHERE
            tblMonitoring.client_ID = tblClient.ID
      )

Avoiding subqueries can be a personal stylistic choice. A blanket disqualification is not convincing, and it also reveals a lack of knowledge.
 
@ebs17
We've had this discussion before and you won't be surprised that I disagree with part of your comments

Allen Browne wrote this in one of his articles on subqueries: Microsoft Access tips: Surviving subqueries (allenbrowne.com)
Subqueries are amazingly flexible and incredibly frustrating. They have an important place in your repertoire, but you must have other tricks up your sleeve as well.
In general, subqueries are slower and less stable than JOINs. You must test them with real-world data samples to know how they will perform.
They are certainly worth pursuing and mastering for the flexibility they provide.

I agree with all of that
Subqueries are an extremely useful tool and I use them wherever appropriate
I have also extensively tested their speed in comparison with other approaches to achieve the same result.

I agree that NOT IN will be slow and NOT EXISTS will be MUCH faster. However, an unmatched query may well be faster still

I wasn't impressed by your comment
Avoiding subqueries can be a personal stylistic choice. A blanket disqualification is not convincing, and it also reveals a lack of knowledge.
You are entitled to your opinions but there is no need to criticize others
 
I don't criticize people; I criticize statements like "Subqueries are always slow". ("I agree that ... will be MUCH faster.")
I criticize blanket statements along the lines of "put the contents of a subquery into its own stored query and all is well and fast".

I myself take a more differentiated approach, so I get to know strengths and weaknesses and can make well-founded choices.
In the example above, the question was not so much JOIN versus subquery, but rather the use or non-use of indexes for the amounts of data mentioned, and that is a crucial question in addition to the query design. The advantage of the JOIN variant is that the developer is less likely to end up with a formulation that fails to use the index; I could manage that very easily myself if I wanted to.
It is also true that the JOIN variant interferes with or even prevents the updatability of the query. So there are more criteria to consider than just a possibly slightly higher speed.

I am not aware of queries becoming unstable because of subqueries, at least not in my own work. Performance can be measured. Above all, it must be sufficient for the required task.
 
... WHERE tblClient.ID NOT IN (SELECT ...
The NOT IN prevents index use. Therefore the query must be slow.

The following variant will run about as fast as the JOIN variant, despite using a subquery.
SQL:
SELECT
   tblClient.ID
FROM
   tblClient
WHERE
   NOT EXISTS
      (
         SELECT
            NULL
         FROM
            tblMonitoring
         WHERE
            tblMonitoring.client_ID = tblClient.ID
      )

Avoiding subqueries can be a personal stylistic choice. A blanket disqualification is not convincing, and it also reveals a lack of knowledge.
That query is also slow because it uses a correlated subquery which runs for every record in tblClient.

I've never had speed problems with uncorrelated subqueries.
 
Now I've given @Galaxiom a like because he takes a more nuanced view of subqueries rather than treating them as something to avoid whenever any alternative exists. However, the query with the NOT IN construct is not a correlated subquery; it is only executed once. The NOT EXISTS variant I showed is indeed a correlated query, but EXISTS has some special technical characteristics.

In a forum that is no longer active, there was practically the same task as the one above, except that it was a delete query rather than a SELECT query. By changing NOT IN to NOT EXISTS, the query took almost half a second instead of several minutes (according to that user's measurement). The bottom line was a speed-up by a factor of about 2500. You could practically hear and feel that user's "aha" moment.
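Transferred to the tables in this thread, such a delete query might look roughly like this (purely illustrative; the tables from that old forum post are not available, so the OP's table names are reused):
SQL:
DELETE *
FROM tblClient
WHERE NOT EXISTS
      (
         SELECT NULL
         FROM tblMonitoring
         WHERE tblMonitoring.client_ID = tblClient.ID
      )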
 
I like to base my conclusions on data.
I ran some tests on two almost identical tables of 10,000 UK postcodes, tblSource and tblData, where one randomly selected record in tblData was changed.

I measured the time taken to look for the unmatched record using 3 queries as discussed above:

qryNotIN
Code:
SELECT tblData.Postcode
FROM tblData
WHERE tblData.Postcode NOT IN (SELECT tblSource.Postcode FROM tblSource);

qryNotEXISTS
Code:
SELECT tblData.Postcode FROM tblData
WHERE (((Exists (SELECT NULL FROM tblSource WHERE tblSource.Postcode = tblData.Postcode))=False));

qryUnmatched
Code:
SELECT tblData.Postcode
FROM tblData LEFT JOIN tblSource ON tblData.[Postcode] = tblSource.[Postcode]
WHERE (((tblSource.Postcode) Is Null));

I deliberately made the first two queries the same as in earlier posts and used the wizard for the third query

Unfortunately, the JET ShowPlan feature cannot handle subqueries, so we cannot know the precise execution plan used by Access for these.
So, for the first set of tests, the Postcode field was deliberately NOT indexed.

These were the results
Code:
Check for unmatched record - NON-INDEXED FIELD:
1. Query qryNotIN:  Time Taken = 85.1875 s
2. Query qryNotExists:  Time Taken = 38.875 s
3. Query qryUnmatched :  Time Taken = 0.171875 s


Check for unmatched record - NON-INDEXED FIELD:
1. Query qryNotIN:  Time Taken = 84.15234 s
2. Query qryNotExists:  Time Taken = 38.51563 s
3. Query qryUnmatched :  Time Taken = 0.111437 s


Check for unmatched record - NON-INDEXED FIELD:
1. Query qryNotIN:  Time Taken = 83.67188 s
2. Query qryNotExists:  Time Taken = 38.79688 s
3. Query qryUnmatched :  Time Taken = 0.125 s

Both subqueries were very slow but the unmatched query was very fast

However, of course, fields used in searches should be indexed so Access can search the indexes rather than scan all records
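In DDL terms, that indexing can be expressed as follows (a sketch; each statement must be run as a separate DDL query, the index names are arbitrary, and the same can of course be done in table design view):
Code:
CREATE INDEX idxPostcode ON tblData (Postcode);
CREATE INDEX idxPostcode ON tblSource (Postcode);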
So I then indexed the Postcodes field in each table and repeated the tests.

Code:
Check for unmatched record - INDEXED FIELD:
1. Query qryNotIN:  Time Taken = 84.67188 s
2. Query qryNotExists:  Time Taken = 0.1523438 s
3. Query qryUnmatched :  Time Taken = 0.1171875 s


Check for unmatched record - INDEXED FIELD:
1. Query qryNotIN:  Time Taken = 84.54297 s
2. Query qryNotExists:  Time Taken = 0.1367188 s
3. Query qryUnmatched :  Time Taken = 0.109375 s


Check for unmatched record - INDEXED FIELD:
1. Query qryNotIN:  Time Taken = 84.25391 s
2. Query qryNotExists:  Time Taken = 0.140625 s
3. Query qryUnmatched :  Time Taken = 0.109375 s

Conclusions:
1. Query NotIN cannot use the index (as stated above by @ebs17) so it runs just as slowly - approx 84s
2. Query qryNotEXISTS does use the index and runs almost 300x faster -approx 0.14s
3. Query qryUnmatched takes a similar time as before but is still the fastest of the 3 methods in these tests - approx 0.11s

Finally I repeated the last two tests (after restoring the index to the Postcode field) but this time using 1 million records in the two tables
For obvious reasons, I didn't bother with qryNotIN in this case as it would probably have crashed Access

The results were:

Code:
Check for unmatched record - INDEXED FIELD:
1. Query qryNotExists:  Time Taken = 3.75 s
2. Query qryUnmatched :  Time Taken = 2.65625 s


Check for unmatched record - INDEXED FIELD:
1. Query qryNotExists:  Time Taken = 3.90625 s
2. Query qryUnmatched :  Time Taken = 2.75 s


Check for unmatched record - INDEXED FIELD:
1. Query qryNotExists:  Time Taken = 4.109375 s
2. Query qryUnmatched :  Time Taken = 2.6875 s

Although both queries were fast, the unmatched query took about 2.7s compared to 3.9s for NotEXISTS
 
Thank you for testing and posting the results.

From the results one would conclude that with qryUnmatched it doesn't matter whether there are indexes or not. I find that astonishing to the point of seeming impossible. After all, setting referential integrity on a relationship automatically indexes the foreign key precisely because it is so important.

@Pat Hartman:
We DO KNOW that Jet/ACE do NOT optimize subqueries efficiently.
That's a killer argument. But not for me.

1) What exactly is an optimizer supposed to achieve with such a clear task?
2) The first optimizer is the developer who formulates the query. A built-in NOT or <>, as an index killer, cannot be ignored by a SQL optimizer and can rarely be optimized away. Sometimes it should and will be enough to simply execute the query as written.
 
Thank you for testing and posting the results.

From the results one would conclude that with qryUnmatched it doesn't matter whether there are indexes or not. I find that astonishing to the point of seeming impossible. After all, setting referential integrity on a relationship automatically indexes the foreign key precisely because it is so important.

To be honest I was also surprised how fast the unmatched query ran on the smaller recordset of 10K records when the field wasn't indexed

Just retested with 1 million records in tblData and 10000 in tblSource
Code:
Check for unmatched record - NON-INDEXED FIELD:
2. Query qryUnmatched :  Time Taken = 4.851563 s

Check for unmatched record - INDEXED FIELD:
2. Query qryUnmatched :  Time Taken = 2.652344 s

So indexing does make a difference even to the unmatched query
As that doesn't use a subquery, I ran the query execution plans for that query in both cases

First of all - not indexed
Code:
--- qryUnmatched ---Postcode field NOT INDEXED---

- Inputs to Query -
Table 'tblData'
Table 'tblSource'
- End inputs to Query -

01) Sort table 'tblData'
02) Sort table 'tblSource'
03) Outer Join result of '01)' to result of '02)'
      merging indexes
      join expression "tblData.[Postcode]=tblSource.[Postcode]"
04) Restrict rows of result of 03)
      by scanning
      testing expression "tblSource.Postcode Is Null"

I'm not clear what merging indexes means above as the join field has no indexes in either table ...hence scanning the table records

And again with the field indexed
Code:
--- qryUnmatched ---Postcode field INDEXED---

- Inputs to Query -
Table 'tblData'
Table 'tblSource'
    Using index 'Postcode'
    Having Indexes:
    Postcode 10000 entries, 37 pages, 10000 values
      which has 1 column, fixed
    ID 10000 entries, 18 pages, 10000 values
      which has 1 column, fixed, clustered and/or counter
- End inputs to Query -

01) Outer Join table 'tblData' to table 'tblSource'
      using index 'tblSource!Postcode'
      join expression "tblData.[Postcode]=tblSource.[Postcode]"
02) Restrict rows of result of 01)
      by scanning
      testing expression "tblSource.Postcode Is Null"

Perhaps you would like to try a similar test for yourself and let us know the outcomes
 
Following on from my earlier test results, I have just published a web article discussing the speed tests I used a few weeks ago to compare three methods of identifying unmatched records:
a) NOT IN subquery (as used by the OP)
Code:
SELECT tblData.Postcode FROM tblData WHERE tblData.Postcode NOT IN (SELECT tblSource.Postcode FROM tblSource);

b) NOT EXISTS subquery
Originally I saved this query exactly as @ebs17 suggested:
Code:
SELECT tblData.Postcode FROM tblData WHERE NOT Exists (SELECT NULL FROM tblSource WHERE tblSource.Postcode = tblData.Postcode);

Access then proceeded to rewrite this in a way it deemed more efficient!
Code:
SELECT tblData.Postcode FROM tblData WHERE (Exists (SELECT NULL FROM tblSource WHERE tblSource.Postcode = tblData.Postcode))=False;

I tested both. The times were almost identical in each case!

c) Simple unmatched query using the wizard (my original suggestion)
Code:
SELECT tblData.Postcode FROM tblData LEFT JOIN tblSource ON tblData.[Postcode] = tblSource.[Postcode] WHERE (((tblSource.Postcode) Is Null));

I tested this for both non-indexed & indexed fields on two tables of 10K records, repeating each test 10 times to minimise the effects of natural variation and calculated averages etc

The non-indexed results were:
[Attached image: AvgResultsNotIndexed.png]


The two subquery methods were both very slow though the NOT IN query is far slower than NOT EXISTS.
By comparison, the simple unmatched query created using the query wizard is blisteringly fast!

I then indexed the search field in both tables and repeated the tests
[Attached image: AvgResultsIndexed.png]


The differences from the first set of tests are interesting:
a) The NOT IN query is almost as slow as before indicating that it cannot make use of the indexing
b) The NOT EXISTS query is now very fast indicating that it does use indexing
c) The unmatched query is still (just) the fastest though indexing appears to play little part in this set of results.

I then repeated the two faster tests for a larger indexed dataset of 1 million records in one of the tables
[Attached image: AvgResultsIndexed_1M.png]


The greater efficiency of the simple unmatched query became even more apparent in this case with the unmatched query now taking just under 75% of the time taken by NOT EXISTS. I didn't attempt to time the NOT IN query for this large dataset for what, I hope, are obvious reasons.

Finally, I repeated those two tests using 10 million records. The average results in this case are shown below

[Attached image: AvgResultsIndexed_10M.png]


In both cases, the total time is about 9x larger and the time difference is therefore even more significant

Conclusions
A NOT IN subquery is a very inefficient method of finding unmatched records as it cannot use indexes.
By comparison, a NOT EXISTS subquery is far better as it does use indexes

However, an unmatched query (such as that created by the wizard) is more efficient (faster) than either type of subquery
This becomes even more noticeable with larger datasets.

Although the NOT EXISTS subquery does work well for indexed fields, it is more complex to create and still significantly slower than a simple unmatched query for large datasets. I haven't changed my original opinion and see no benefit in using subqueries for this purpose.

The article includes two versions of the test database - indexed & not-indexed so anyone interested can try this for themselves.
It can be found at

Hope some of you find it interesting
 
It may well be different for SQL Server.
Certainly not now, but I'm intending to do a number of tests using SS BE tables next year.
 
A little reflection from my point of view:
Query runs slowly with NOT IN
My first choice among the alternatives is, by the way, the unmatched query with the LEFT JOIN, and it was so even before this entire discussion and before the measurements provided, because practical experience has taught me that. That preference would be even stronger if one had to compare on more than one key. After all, everyone wants to use the simplest, best and fastest approach... to leave room for bigger, more important things.

However, the task as formulated is also extremely simple: two tables, compared via one key.
How can / should you formulate a query when the conditions are more extensive and complicated: more tables involved, additional criteria, additional requirements?
If the only additional requirement is that the query should be updatable, the LEFT JOIN construct fails here.
What to do? Now that there is no other way (let's rule out creating additional temporary tables, because that isn't optimal either), do we reach into the black box of the unloved and supposedly unoptimizable subqueries and pull out whichever one is nearest to hand?
You would only know the best available option, rather than just the nearest one to hand, if you did not demonize and dismiss subqueries across the board, but instead also worked with them and could therefore fall back on experience and habits, i.e. if you had an idea of the effort involved, index use, correlation and similar things.
Subqueries are always slow and should only be used where no better method exists.
The discussion and the measurements have shown that even in the original constellation a subquery does not necessarily have to be slow. You can work with them. If you claim that you always use the fastest approach, then you have to be measured against that in everything you do. I think everyday practice will reveal some contradictions for everyone.
 
You would only know the best available option, rather than just the nearest one to hand, if you did not demonize and dismiss subqueries across the board, but instead also worked with them and could therefore fall back on experience and habits, i.e. if you had an idea of the effort involved, index use, correlation and similar things.

Despite your repeated attempts to wrongly typecast me as someone who neither uses nor understands how to use subqueries, let me state again that is far from the truth.

I have been using Access to a high level for over 25 years and I regularly use subqueries. I do so whenever it is beneficial.

For example, see my comments & quote by Allen Browne in post #7
Allen Browne wrote this in one of his articles on subqueries: Microsoft Access tips: Surviving subqueries (allenbrowne.com)
Subqueries are amazingly flexible and incredibly frustrating. They have an important place in your repertoire, but you must have other tricks up your sleeve as well.
In general, subqueries are slower and less stable than JOINs. You must test them with real-world data samples to know how they will perform.
They are certainly worth pursuing and mastering for the flexibility they provide.


I agree with all of that
Subqueries are an extremely useful tool and I use them wherever appropriate

My conclusions in post #16 were justified by data
Although the NOT EXISTS subquery does work well for indexed fields, it is more complex to create and still significantly slower than a simple unmatched query for large datasets. I haven't changed my original opinion and see no benefit in using subqueries for this purpose.

My first choice among the alternatives is, by the way, the unmatched query with the LEFT JOIN, and it was so even before this entire discussion and before the measurements provided, because practical experience has taught me that. That preference would be even stronger if one had to compare on more than one key. After all, everyone wants to use the simplest, best and fastest approach... to leave room for bigger, more important things.

Good. We agree. Yet you consistently argued the point earlier?

However, the task as formulated is also extremely simple: two tables, compared via one key.
How can / should you formulate a query when the conditions are more extensive and complicated: more tables involved, additional criteria, additional requirements?
If the only additional requirement is that the query should be updatable, the LEFT JOIN construct fails here.
Why mention that? If I’m looking for records in one table but not another, updating the data isn’t normally relevant

The discussion and the measurements have shown that even in the original constellation a subquery does not necessarily have to be slow. You can work with them. If you claim that you always use the fastest approach, then you have to be measured against that in everything you do. I think everyday practice will reveal some contradictions for everyone.
Speed is always an important factor but so is the complexity of code used

The comment I made that does need editing is the line in post #3 where I wrongly omitted 2 letters. I have now amended it to read:
Subqueries are always slower and should only be used where no better method exists.

Perhaps you won't agree with that comment either.
If so, I challenge you to devise one or more subqueries that do EXACTLY the same task as a single well optimised query and run faster – better still significantly faster. If you can do so, please upload for others to benefit from your wisdom

I look forward to your response
 
First of all: I appreciate your statements and the work you have shown, such as the speed tests. I don't want to offend anyone personally, and I don't typecast anyone. At most, only visible statements and visible actions can flow into my personal and insignificant assessment. Repeated core sentences are undoubtedly very visible; in case of doubt I do not know a person's life's work at all.
My core statement on several topics was and is: general statements about constructs, here subqueries, which can differ greatly in formulation and place of occurrence, are technically very debatable.

A blanket disqualification is not convincing
#12: 84.67188s / 0.1523438s / 0.1171875s
The "slow" 0.15 seconds belongs far more with the fast 0.11 seconds than with the slow 84 seconds. If @Brother Arnold (like many others before him) had not reached for NOT IN as the first (and only?) possible solution but for NOT EXISTS, the performance would most likely have been satisfactory and there would have been no reason to ask in the forum. The question can also be read as "what is wrong here", and the bigger mistake is the lack of index usage, not the use of a subquery per se. That is the addition I made to the topic.

Of course, if I realize that there can be more than one solution to a query, and if I can also formulate such variants, then it makes sense to compare them and choose the best one. But let me skip over this double "if" for now.
The fastest option is a good candidate for selection, although adequate speed may already be sufficient. Personally, there are other aspects that are relevant to my assessment.
Filter: The task set at the beginning is a filtering task. Filtering requirements can be more extensive and complex, for example a combination of several conditions in OR or AND logic. A JOIN has filtering properties, but implementing multiple and variable conditions via JOINs is more strenuous than being able to confine yourself to the WHERE part of a query. Can anyone imagine a form filter that is preferably implemented via JOINs?
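As an illustration of keeping variable conditions in the WHERE part (a sketch using the OP's tables plus a hypothetical tblArchive table; the FROM clause remains a single table, so the query can also stay updatable):
SQL:
SELECT
   tblClient.*
FROM
   tblClient
WHERE
   NOT EXISTS
      (
         SELECT NULL
         FROM tblMonitoring
         WHERE tblMonitoring.client_ID = tblClient.ID
      )
   OR EXISTS
      (
         SELECT NULL
         FROM tblArchive
         WHERE tblArchive.client_ID = tblClient.ID
      )
Swapping OR for AND, or adding further EXISTS conditions, changes only the WHERE part.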
More extensive tasks: In (my) practice you also have to deal with data models where you must consider not just one or two tables but sometimes 5, 6, ... 8 tables, possibly including several instances of the same table. Additional JOIN operations can then become an adventure. Subqueries help me to read and understand a complete query process from the table data through to the end result, because I have the complete processing in one view and in one context and can quickly spot, for example, superfluous intermediate sorting and superfluous table fields being carried along. That understanding also helps when developing and later maintaining the statement.
Ensuring that a query is updatable is by no means a rare wish or requirement.

I often have to deal with non-trivial queries. The longest statement in Jet was probably more than 1200 characters (using short table aliases). A lot happens in it, but that does not have to result in a performance or stability problem.

I challenge you to devise one or more subqueries that do EXACTLY the same task as a single well optimised query and run faster
I think you will be quite happy that I am using your own example from Speed Comparison Tests - Optimise Queries, which you know very well.
In my own cursory testing, I find the following statement to be competitive:
SQL:
SELECT
   Sub1.PupilID,
   Sub1.Surname AS LastName,
   Sub1.Forename AS FirstName,
   Sub1.YearGroup AS YearGp,
   Sub2.Code,
   Sub2.CodeType,
   Sub2.Merits,
   Sub2.Demerits,
   Sub2.Incidents
FROM
   (
      SELECT
         P.PupilID,
         P.Surname,
         P.Forename,
         P.YearGroup
      FROM
         PupilData AS P
      WHERE
         P.DateOfBirth BETWEEN #1/1/2005# AND #12/31/2005#
   ) AS Sub1
      INNER JOIN
         (
            SELECT
               R.PupilID,
               R.Code,
               FIRST(C.Description) AS CodeType,
               FIRST(R.MeritPts) AS Merits,
               FIRST(R.DeMeritPts) AS Demerits,
               COUNT(*) AS Incidents
            FROM
               PRCodes AS C
                  INNER JOIN PRecords AS R
                  ON C.Code = R.Code
            WHERE
               R.DateOfIncident BETWEEN #1/1/2018# AND #12/30/2018#
            GROUP BY
               R.PupilID,
               R.Code
         ) AS Sub2
         ON Sub1.PupilID = Sub2.PupilID
ORDER BY
   Sub1.Surname,
   Sub1.Forename
The practical case in which the PupilData table contains not just the 2004 and 2005 birth cohorts but all of them should be interesting, because the selection then becomes much more selective.

to benefit from your wisdom
That sounds a bit provocative. By special request, I am adding a few comments that I have omitted so far because they could easily be perceived as criticism rather than advice.

I not only looked at the results, but also at the test environment.
Code:
    'start timer
    StartTime = TimeGetTime() 'time in milliseconds
    For iCount = 1 To 3
        DoCmd.OpenQuery strQuery
        lngCount = DCount("*", strQuery)
        DoCmd.Close acQuery, strQuery
    Next
    'stop timer
     EndTime = TimeGetTime() 'time in milliseconds
What is displayed as the result is the total time of 6 query executions: (OpenQuery + DCount) * 3. The reader should either be made aware of this, or it should be solved differently.
In practice, it is unlikely that an identical query will be executed several times in immediate succession. "Significant" differences in absolute terms then look somewhat different for each individual query.

In my own tests, I would run the different queries individually and alternately to reduce the influence of caching, and then repeat the entire run several times. For a single query:
Code:
Dim db As DAO.Database, rs As DAO.Recordset
Set db = CurrentDb
Set rs = db.OpenRecordset("QueryName", dbOpenSnapshot)
rs.MoveLast    'force the full result set to be retrieved
The overhead from the recordset object is not very large. For exact measurements you would have to determine it (I don't know how) and subtract it.

With a flagship database, I would also pay attention to the little things. The reader will surely want to copy something that is done correctly.
Index planning
PRCodes
Duplicate index on Code

PupilData
A composite index using Surname and Forename makes sense because of the sorting in the query. An index on Forename alone is not used in the given constellation.

PRecords
Duplicate index on PastoralRecordID. The single indexes on Code and PupilID are duplicated because of RI (foreign keys are automatically indexed when referential integrity is set on a relationship, but this is not easily visible). A composite index using PupilID and Code makes sense because of the grouping in the query.

Duplicate indexes bring no benefit, but they still have to be maintained and therefore slow down write operations.
The composite indexes mentioned can have a direct effect on performance if the SQL optimizer cooperates. A description of performance should not neglect this, nor index use in general.
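As a DDL sketch (index names arbitrary; each statement run as its own DDL query, and normally this is simply done in the table designer), the composite indexes suggested above would be:
SQL:
CREATE INDEX idxSurnameForename ON PupilData (Surname, Forename);
CREATE INDEX idxPupilIDCode ON PRecords (PupilID, Code);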

SQL:
GROUP BY
   PupilData.PupilID,
   PupilData.Surname,
   PupilData.Forename,
   PupilData.YearGroup,
   PRecords.Code,
   PRCodes.Description,
   PRecords.MeritPts,
   PRecords.DeMeritPts,
   Year([DateOfBirth]),
   Year([DateOfIncident]),
   [Surname] & ", " & [Forename]
Such a grouping across a huge list of fields occurs in 9 out of 11 queries. In my opinion this is a catastrophe in itself and should not happen without great need; it certainly deserves to be discussed separately. Grouping is a comparison operation and therefore benefits from index usage. Index usage with such a monster of a field list is unthinkable, and impossible anyway for calculated fields. How does one end up with that, other than by letting a finger slip on the function key in the QBE grid?

A JOIN has a filtering function, but also a multiplying function. When several tables are joined, the contents of the individual tables are duplicated in the query, and this is usually the first step in processing the query. In the query result, however, you no longer want to see duplicates, so there is a "necessity" to group them away (as far as possible).
The bottom line is that one can ask how efficient it is to first multiply content, and thereby increase the number of records, only to reduce them again afterwards ... when the content could have been kept unique throughout.

I also look forward to feedback.
 
An example:
The New York Marathon has existed since 1970; last time around 50,000 runners took part, some of them more than once, of course. The whole thing is documented in a simple table TableX with the fields ID, PersonID, DateOfEvent and RunningTime. The fields are appropriately indexed.

A statistics enthusiast would now like to see the three best times for each runner, where that many exist. My suggestion for this is ...
SQL:
SELECT
   T.ID,
   T.PersonID,
   T.DateOfEvent,
   T.RunningTime
FROM
   TableX AS T
WHERE
   T.ID IN
      (
         SELECT TOP 3
            X.ID
         FROM
            TableX AS X
         WHERE
            X.PersonID = T.PersonID
         ORDER BY
            X.RunningTime
      )
Since I can think of three other approaches for variants, it should also be possible to formulate a single well optimised query. I leave that hard work, as well as the work of filling the table with content, to the interested parties.
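For comparison, one further variant could be sketched like this, also with a subquery (whether it performs well must be measured; ties in RunningTime are handled slightly differently than with TOP 3):
SQL:
SELECT
   T.ID,
   T.PersonID,
   T.DateOfEvent,
   T.RunningTime
FROM
   TableX AS T
WHERE
   (
      SELECT COUNT(*)
      FROM TableX AS X
      WHERE X.PersonID = T.PersonID
         AND X.RunningTime < T.RunningTime
   ) < 3
ORDER BY
   T.PersonID,
   T.RunningTime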
 
Sorry for the slow response to the last two posts but I've been doing various lengthy tasks connected to my website

Re: Post #21
I agree that the OP would probably have been perfectly happy with either the unmatched query or the NOT EXISTS variant.
However, it appears that he lost interest in this thread long ago (as probably has everyone else!)

When I’m running speed tests, I try to ensure that the dataset is small enough to be downloaded so others can test & further modify as they see fit. That’s one reason why the tests only rarely involve SQL Server linked tables. I have also run a SQL version of these tests which I will at some point also publish

The size constraint also means that many of the tests are fast to run, often taking much less than a second even on my old & slow PC. This raises an issue with timing.

Many of my more complex queries will have multiple tables, self joins etc and include several million records. But these aren’t often that useful for speed test comparisons published online.

The reason I often loop through tests multiple times is to do with the precision of the timing process.
Basically, although multiple timing methods exist, most are limited by the system clock which is typically about 60 ticks per second. This makes for a precision of about 1/60 second = 0.016 seconds approx.

Yes of course, we often want queries to be updateable. My point was that isn’t something I’d ever need in an unmatched query.

Yes I’m perfectly happy for you to use my Optimise Queries example.
Like all of my apps, it isn’t perfect and I had already addressed many (though not all) of the criticisms you raised.
The example app included the original linked SQL tables which were adapted as local tables for the task. Indexing issues occurred as I forgot to update all the indexes after conversion.

There was meant to be a composite index on Surname/Forename & this was even mentioned in passing in the article. But you’re right – it wasn’t there & I’ve now fixed that.
I’ve also removed the duplicate indices.
However, surprisingly all those index changes only slightly reduced the test times

Counting the records each time the test ran was indeed a mistake that I spotted several months ago.
That section of code was changed to

Code:
 'run test
   StartTime = TimeGetTime() 'time in milliseconds
   
    Application.Echo False
    For iCount = 1 To LC
        DoCmd.OpenQuery strQuery
        DoCmd.Close acQuery, strQuery
    Next
     
    'stop timer
    EndTime = TimeGetTime() 'time in milliseconds
   
    'get record count
    lngCount = DCount("*", strQuery)
   
    Application.Echo True

This significantly reduced the times for the first few (deliberately badly designed) queries but had minimal effect on the later better optimised versions. It should be obvious why that is the case.

I’ve taken the liberty of adding your subquery example to the app (as test L - Subquery 1) along with my own much simpler version which allowed me to remove a join (test M - Subquery 2)

Code:
SELECT PupilData.PupilID, First(PupilData.Surname) AS LastName, First(PupilData.Forename) AS FirstName, First(PupilData.YearGroup) AS YearGp, PRecords.Code,
(SELECT Description FROM PRCodes WHERE Code=PRecords.Code) AS CodeType,
First(PRecords.MeritPts) AS Merits, First(PRecords.DeMeritPts) AS Demerits, Count(PRecords.PastoralRecordID) AS Incidents
FROM PupilData INNER JOIN PRecords ON PupilData.PupilID = PRecords.PupilID
WHERE (((PupilData.DateOfBirth) Between #1/1/2005# And #12/31/2005#) AND ((PRecords.DateOfIncident) Between #1/1/2018# And #12/31/2018#))
GROUP BY PupilData.PupilID, PRecords.Code
ORDER BY First(PupilData.Surname), First(PupilData.Forename);

Here are the updated times with the code & index changes also included

[Attached image: Capture.PNG]


Your subquery is almost 6% slower than the most optimised query (J) whereas mine is technically faster … by a huge 0.001 s

I'm intending to update both the web article and the example app in the next few days.

I wasn’t trying to be provocative in my comments.
You clearly think a lot about achieving efficient code & how your queries are organised.
However, to the best of my knowledge you’ve not posted any examples of your work.
Hopefully you will do so in the future.

As for the grouping, you do seem to have missed the point of this exercise.
It was designed to replicate many of the points raised in Allen Browne’s original article and to demonstrate the effect of each by changing ONE thing at a time. Of course, I could have omitted several of the early queries such as the outer joins & gone straight to the approach used in test J.
However, the whole exercise was designed to be a teaching aid and for use in presentations to other Access developers.

Response to post #22 coming shortly
 