Discussion: Speed Comparison Tests 15

ebs17 · Jan 5, 2023

@isladogs shows a representation with variants for Comparison of Grouped Aggregate Queries. I've been looking into this, using the test environment, which I find clearer, according to Speed tests and speed comparisons for everyone. This measures a single run of each query.
I used an additional query that improves the previously optimal query. Because much of the discussion relates to indexes on SENTestHistory.Level and PupilData.Reading, there are test runs on all four ways of indexing the two fields. The results are as follows:

The following general statements can be made:

- The indexes on the named fields play no role in the actual grouping and minimally to no role in aggregation. COUNT(TH.Level) can be replaced by the more optimal COUNT(*).

- The non-equi join SENTestHistory.Level < PupilData.Reading:

Theta joins, i.e. comparisons like
A.id < B.id
A.id <= B.id
A.id = B.id * 2
lead to a cross join without using the index.

Source: Michael Zimmermann: Performance in Queries from 2005

So: These two indexes are not used in the query variants and are therefore superfluous here. The tendentially best results are when they are not set. However, it seems that if they exist despite not being used, they lead to processing and confusion. The two outliers in the subquery variant are particularly affected by confusion. Here I would simply say: Jet is crazy (is overwhelmed).

- I've often made statements about Pro subquery. In the variant used (correlated subquery with tablescan per execution) I would not have thought of using it because the performance here is devastating and the disaster grows exponentially with increasing dataset numbers. Here in the test it is only executed 1482 times (records from PupilData) and it's bad. Therefore, follow-up thoughts are also academic/superfluous.

Discussion of the most optimized query to date (qryCountCriteria):
SELECT PD.PupilID, PD.Surname, PD.Forename, PD.YearGroup, PD.Reading, PD.ReadingDate, Count(TH.Level) AS CountOfLevel
FROM PupilData AS PD INNER JOIN SENTestHistory AS TH ON PD.PupilID = TH.PupilID
WHERE TH.Type="Reading" And TH.Level<PD.Reading
GROUP BY PD.PupilID, PD.Surname, PD.Forename, PD.YearGroup, PD.Reading, PD.ReadingDate
HAVING Count(TH.Level)>1
ORDER BY PD.Surname, PD.Forename

Grouping is a comparative operation (done after filtering into FROM and WHERE parts) and can potentially use an index. There is no index for the entire field list used, and that doesn't really make sense either. Without index: Add up the required bytes in this list. These must be compared record by record. A single key, which is indexed anyway, does a much better job.

My suggestion:

SQL:

SELECT
   TH.PupilID,
   FIRST(PD.Surname) AS Surname,
   FIRST(PD.Forename) AS Forename,
   FIRST(PD.YearGroup) AS YearGroup,
   FIRST(PD.Reading) AS Reading,
   FIRST(PD.ReadingDate) AS ReadingDate,
   COUNT(*) AS CountOfLevel
FROM
   PupilData AS PD
      INNER JOIN SENTestHistory AS TH
      ON PD.PupilID = TH.PupilID
WHERE
   TH.Type = "Reading"
      AND
   TH.Level < PD.Reading
GROUP BY
   TH.PupilID
HAVING
   COUNT(*) > 1
ORDER BY
   FIRST(PD.Surname),
   FIRST(PD.Forename)

Grouping with TH.PupilID (not PD.PupilID), because the actual aggregation takes place in TH, so does grouping.
Sorting is the last in query execution, here over remaining 97 records. With that number, thoughts about index don't matter much anymore (calculation on table field breaks index usage).

The use of indexes on suitable fields is generally beneficial in speeding up searches and sorts.
However, it appears to offer no benefit in aggregate queries.

The second sentence is wrong, as the detailed QEP's already show: The indexes of Type and PupilID are used.

Finally, a content-related question, detached from syntax and performance topics:

NOTE: Reading age is expressed in years:months e.g. 13:07. Students take targeted tests aimed at a certain range e.g. 9:03 to 11:09

SENTestHistory.Level < PupilData.Reading
String: '9:03' > '11:09', '11:09' < '12:06'

Can this comparison bring correct results in this format?

isladogs · Jan 5, 2023

Hi
I wondered whether you would look into that article
Its purpose was purely to compare the 5 different types of aggregate query I listed - not necessarily to get the optimal solution for the task

Summarising your points above:
1. I'm well aware that grouping all those fields isn't the optimal solution
Indeed, I covered that exact point (First vs Group By) in an earlier article: https://www.isladogs.co.uk/speed-comparison-tests-8/

As mentioned in the article, there are already 2 follow up articles planned for some time later this month
The first of these (The Art of Indexing) includes discussion of the above point

2. Although not mentioned in the article (it was long enough already!), I compared Count(*) and Count(TH.Level). The times were almost identical in my tests

3. The follow up article also mentions the deliberate flaw used throughout about comparing the two text fields Level & Reading
The SEN department at my school originally created the query with that inequality and wondered why it didn't always give correct output.

Whilst it doesn't affect the comparative times, it does give inaccurate totals for some students
I'm intending to cover that point in the follow up article as well. It was easily solved

4. In the meantime, I added an additional test to the article this morning before I saw your post.
In the latest test, the indexes on Level & Reading are retained but I forced Access to ignore them in each query.
The article explains how.
When I did this, the times were faster still for each of the queries.

Despite it appearing that the index isn't used in aggregation, it does appear to be beneficial to leave it in place

Discussion: Speed Comparison Tests 15

ebs17

Well-known member

isladogs

MVP / VIP

Similar threads

Users who are viewing this thread