Mythbusters - SELECT DISTINCT vs GROUP BY

isladogs · Feb 17, 2022

Hi all
I updated my Mythbusters: WHERE vs HAVING thread yesterday

Mythbusters - Speed Comparison Tests - Having vs Where

This is a follow up to a recent thread Desperate-Count Help Needed https://www.access-programmers.co.uk/forums/showpost.php?p=1597549&postcount=8 Part of this included discussion of the comparative efficiency of aggregate queries using WHERE with HAVING. I thought that Galaxiom explained this...

www.access-programmers.co.uk

When I did so, @CJ_London reminded me of a previous request to test the following:

We often see OP's using aggregate queries without any aggregation rather than SELECT DISTINCT - do you have an example to compare a SELECT DISTINCT v GROUP BY with perhaps a criteria or two?

I must confess I'd never thought about comparing these before but did so this morning
I asked the following question both in the thread & in a PM to @CJ_London :

what are your predictions before I make the results public?

Before I started, I had a gut feeling that SELECT DISTINCT would be slightly faster though I couldn't have given any meaningful reason for that
All those who responded tended to agree.... BUT the opposite was true ...

I did various tests on 2 different datasets and on each test (without exception), GROUP BY was slightly faster.
The differences were relatively small but they were consistent in every test

First of all, I tested my Postcodes dataset (2.6 million records):
In my first set of tests, I ran the following queries:

qryDistinct

Code:

SELECT DISTINCT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes;

qryGroupBy

Code:

SELECT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector;

Each gave exactly the same output of 12415 records.

All fields in the queries were indexed
I looped through each query 3 times and measured the total time in each case.
I repeated each test & calculated the averages:

I then filtered the data when running each query to see if that would change the results. Same outcome...

I then repeated the tests, but this time appended the query output into a temp table each time to ensure that the query had been loaded fully during the tests.
Same outcome ....

I ran JET ShowPlan on each of the queries. The results were almost identical:

Next I tried using a different dataset supplied by @CJ_London.
It was a smaller dataset of approx 150K records so I looped through each query 20 times.
This time, there were 2 non-indexed fields in the queries
Once again the two queries gave identical outputs = 8324 records

Did that affect the outcomes? NO

Next I ran Jet ShowPlan on each query. The execution plans were almost identical:

NOTE:
The 'by scanning' reference' indicates that Access had to scan the entire dataset in each case due to the inclusion of non-indexed fields

I can't upload my Postcodes dataset as the ACCDB file is about 1.6 GB
However, I will upload the example app with @CJ_London's dataset if he confirms its OK to do so

==========================================================================

For more info on Jet ShowPlan, see my article:

Show Plan - Go Faster

Use Jet ShowPlan to view the execution plan of Access queries. This can help optimise query execution

www.isladogs.co.uk

Also see this excellent article by Mike Wolfe:

JetShowPlan: A Primer

You may be familiar with JetShowPlan, but I guarantee you've never read an article about it quite like this one.

nolongerset.com

Isaac · Feb 17, 2022

isladogs said:
All those who responded tended to agree.... BUT the opposite was true ...

Except one

But I can't claim much credit for it, as I was only going by sql server precedent, which isn't really precedent at all, just a guess.

isladogs · Feb 17, 2022

I couldn't work out exactly what your position was ....

CJ_London · Feb 17, 2022

Thanks Colin for all your work. I am surprised and will have to change my thinking.

happy for the data to be uploaded

isladogs · Feb 18, 2022

I'm also surprised by the result & I don't have an explanation for it other than the evidence from JET ShowPlan.
I will add these tests to my website in the next few days.

These are the full set of results from the final set of tests:

@CJ_London has confirmed that his dataset is fine to share so I'm attaching the test database.
All the above results have been removed so feel free to test / adapt on your own PCs.
Don't be surprised if your times are faster. My PC is now 11 years old & definitely NOT Windows 11 compatible

arnelgp · Feb 18, 2022

there is nothing conclusive here.
if the test machine is Old, loaded with lots of software, connected to the internet.
as we all know computer has timers, each timer consume time, like checking the disk health, etc.
what if on your test, one method goes though without the timer intervening.
while the other, window is checking update on the background, thereby stalling it by a sec.
we don't know what events are being fired on the background while the test is on going.

neither the test was witnessed/certified by third party.

same as testing how long a ball will fall from 10 feet.
there are factors that will affect the rate, like strong winds, etc.

jdraw · Feb 18, 2022

Thanks Colin.
Just ran the tests. Consistent.

isladogs · Feb 18, 2022

arnelgp said:
there is nothing conclusive here.
if the test machine is Old, loaded with lots of software, connected to the internet.
as we all know computer has timers, each timer consume time, like checking the disk health, etc.
what if on your test, one method goes though without the timer intervening.
while the other, window is checking update on the background, thereby stalling it by a sec.
we don't know what events are being fired on the background while the test is on going.

neither the test was witnessed/certified by third party.

same as testing how long a ball will fall from 10 feet.
there are factors that will affect the rate, like strong winds, etc.

As is too often the case, when you reply to anything I've done, there's nothing constructive in your answer.
Sadly, you just seem to go out of your way to be negative just for the sake of it

All my tests are done in controlled conditions. They were also run repeatedly as you can see above.
The differences are small but consistent. You're welcome to try the test app yourself as @jdraw has done

Also, you picked a very poor analogy. True for a feather but in general not for a ball
A ball will fall at the same rate no matter what the wind speed.
Basic physics - vertical acceleration is completely independent of horizontal velocity

arnelgp · Feb 18, 2022

isladogs said:
A ball will fall at the same rate no matter what the wind speed.

it will not if the air is pushing it up?

isladogs · Feb 18, 2022

NEWS FLASH:
No new speed tests being conducted here today.
This is because the 70 mph + wind gusts we are currently experiencing in the UK due to Storm Eunice will obviously have an unpredictable effect on any results obtained.

jdraw · Feb 18, 2022

Good luck with the wind. We've had -15-20C here for several days; then 8C yesterday and rain with 30-40 kph winds; then back to -11C with a foot of snow (and wind).

NauticalGent · Feb 18, 2022

isladogs said:
NEWS FLASH:
No new speed tests being conducted here today.
This is because the 70 mph + wind gusts we are currently experiencing in the UK due to Storm Eunice will obviously have an unpredictable effect on any results obtained.

A friend of mine and his wife live in North Yorkshire which they moved to recently from Naples, Italy where I met them. He is thrilled because sitting inside drinking next to a warm fire suits him...

She on the the other hand has not quit belly-aching about the weather and just about anything else that comes to mind. Spoiled rotten first world brats...

Gasman · Feb 18, 2022

She might have a point.
A few leaves on the rails stops most trains.
Half inch of snow and the country grinds to a halt.
Other countries appear to cope?
If someone down my way so much as spits on the M4 motorway, the speed limit is reduced from 70mph to 50mph.

The_Doc_Man · Feb 19, 2022

I would run the tests on my home beast but it's Mardi Gras and there are too many distractions.

If someone down my way so much as spits on the M4 motorway, the speed limit is reduced from 70mph to 50mph.

In south Louisiana, we get snow/icy on the road once every 5-7 years. Last year, when we had a low level of sleet, our traffic slowed down to a crawl and several accidents occurred for people driving off of a straight, level road because they skidded on the ice. Not as bad as the snowstorm I endured in Ft. Worth TX in 2005, though. There, a major ice/snow storm came through and folks drove off the road so much that the tow truck operators had an average of 10 pick-up/tow tickets each. They were working past midnight. Us southerner drivers don't do so well on cold, icy roads.

NauticalGent · Mar 16, 2022

Mike Wolfe has an excellent explantion to these results...

Frothingslosh · Mar 16, 2022

Well, for what it's worth, on SQL Server, both queries almost always take the same amount of time and have the same outputs. This is because when SQL Server creates an execution plan, it converts SELECT DISTINCT into an aggregate query.

I'm currently using the following subquery in something I'm working on:

SQL:

SELECT DISTINCT   
    *
FROM
    dbo.tblCRMGroupSegmentExtract
WHERE
    MTVGroup IS NOT NULL
    AND MTVDivision IS NOT NULL

So I went ahead and converted it to this:

SQL:

SELECT
    CustomerID
    , CompanyName
    , MTVGroup
    , MTVDivision
    , GroupName
    , GroupEffectiveDate
    , GroupTerminationDate
    , ReportName
    , ReportDate
    , ReportDateKey
    , CycleDateKey
    , FileSrcName
FROM
    dbo.tblCRMGroupSegmentExtract
WHERE
    MTVGroup IS NOT NULL
    AND MTVDivision IS NOT NULL
GROUP BY
    CustomerID
    , CompanyName
    , MTVGroup
    , MTVDivision
    , GroupName
    , GroupEffectiveDate
    , GroupTerminationDate
    , ReportName
    , ReportDate
    , ReportDateKey
    , CycleDateKey
    , FileSrcName

Both versions, when run, take 7 seconds and return 102,682 records, with indexes needed on the columns in the WHERE clause but not currently there.

More importantly, both use identical execution plans:

Review of each step in each plan is identical. If you want, I can copy them and load them for you. I can also show you the actual results screenshots, but I would need to edit them first to remove certain field results.

It's not as thorough as Colin's testing, I'm afraid, but it's all I had time enough to do during lunch.

NauticalGent · Mar 16, 2022

Well look who has graced us with his presence today! I'm going outside this evening to see what a blue moon looks like...

Frothingslosh · Mar 16, 2022

NauticalGent said:
Well look who has graced us with his presence today! I'm going outside this evening to see what a blue moon looks like...

ROFL

If you only knew how busy I've been lately, and how little of it has been working with Access. Got promoted a couple times, and my workload is WAAAY heavier now than it used to be.

NauticalGent · Mar 16, 2022

Understood, but your absence has been noticed. Good to have you back!

isladogs · Mar 16, 2022

Hi Frothy. Great to see you again. Try not to be absent for so long in future ....
Thanks for answering the question posed by Axel Hayne re SQL Server. Would you mind posting a link to the above info on Mike's web article

For info, there is a more detailed version of these tests on my website:

Speed Comparison Tests 9

This article compares the efficiency of two methods of grouping data: SELECT DISTINCT vs GROUP BY. It is the ninth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding. Example databases are provided so the tests can be done on your...

www.isladogs.co.uk

NOTE: I've just posted a link from my own web article to your SQL Server results.

====================================================

For info, there are now a total of 11 sets of speed tests published on the website. See the full list at :

Speed Comparison Tests

This is a series of articles discussing various tests done to compare the efficiency of different approaches to coding. Example databases are provided so the tests can be done on your own workstations

www.isladogs.co.uk

Mythbusters - SELECT DISTINCT vs GROUP BY

MVP / VIP

Attachments

Lifelong Learner

MVP / VIP

Super Moderator

MVP / VIP

Attachments

..forever waiting... waiting for jellybean!

Super Moderator

MVP / VIP

..forever waiting... waiting for jellybean!

MVP / VIP

Super Moderator

Ignore List Poster Boy

Enthusiastic Amateur

Immoderate Moderator

Ignore List Poster Boy

Premier Pale Stale Ale

Ignore List Poster Boy

Premier Pale Stale Ale

Ignore List Poster Boy

MVP / VIP

Similar threads

Users who are viewing this thread