Mythbusters - SELECT DISTINCT vs GROUP BY (1 Viewer)

isladogs

MVP / VIP
Local time
Today, 01:07
Joined
Jan 14, 2017
Messages
18,476
Hi all
I updated my Mythbusters: WHERE vs HAVING thread yesterday

When I did so, @CJ_London reminded me of a previous request to test the following:
We often see OP's using aggregate queries without any aggregation rather than SELECT DISTINCT - do you have an example to compare a SELECT DISTINCT v GROUP BY with perhaps a criteria or two?

I must confess I'd never thought about comparing these before but did so this morning
I asked the following question both in the thread & in a PM to @CJ_London :
what are your predictions before I make the results public?
Before I started, I had a gut feeling that SELECT DISTINCT would be slightly faster though I couldn't have given any meaningful reason for that
All those who responded tended to agree.... BUT the opposite was true ...

I did various tests on 2 different datasets and on each test (without exception), GROUP BY was slightly faster.
The differences were relatively small but they were consistent in every test

First of all, I tested my Postcodes dataset (2.6 million records):
In my first set of tests, I ran the following queries:

qryDistinct
Code:
SELECT DISTINCT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes;

qryGroupBy
Code:
SELECT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector;

Each gave exactly the same output of 12415 records.

All fields in the queries were indexed
I looped through each query 3 times and measured the total time in each case.
I repeated each test & calculated the averages:
1645131071160.png


I then filtered the data when running each query to see if that would change the results. Same outcome...
1645131470590.png


I then repeated the tests, but this time appended the query output into a temp table each time to ensure that the query had been loaded fully during the tests.
Same outcome ....
1645131256122.png


I ran JET ShowPlan on each of the queries. The results were almost identical:

1645132032034.png


Next I tried using a different dataset supplied by @CJ_London.
It was a smaller dataset of approx 150K records so I looped through each query 20 times.
This time, there were 2 non-indexed fields in the queries
Once again the two queries gave identical outputs = 8324 records

Did that affect the outcomes? NO

1645131655270.png


Next I ran Jet ShowPlan on each query. The execution plans were almost identical:

1645131843570.png


NOTE:
The 'by scanning' reference' indicates that Access had to scan the entire dataset in each case due to the inclusion of non-indexed fields

I can't upload my Postcodes dataset as the ACCDB file is about 1.6 GB
However, I will upload the example app with @CJ_London's dataset if he confirms its OK to do so

==========================================================================

For more info on Jet ShowPlan, see my article:

Also see this excellent article by Mike Wolfe:
 

Attachments

  • 1645131362522.png
    1645131362522.png
    13.6 KB · Views: 426
Last edited:
All those who responded tended to agree.... BUT the opposite was true ...
Except one :)
But I can't claim much credit for it, as I was only going by sql server precedent, which isn't really precedent at all, just a guess.
 
I couldn't work out exactly what your position was ....
 
Thanks Colin for all your work. I am surprised and will have to change my thinking. 😊

happy for the data to be uploaded
 
I'm also surprised by the result & I don't have an explanation for it other than the evidence from JET ShowPlan.
I will add these tests to my website in the next few days.

These are the full set of results from the final set of tests:

1645181114506.png


@CJ_London has confirmed that his dataset is fine to share so I'm attaching the test database.
All the above results have been removed so feel free to test / adapt on your own PCs.
Don't be surprised if your times are faster. My PC is now 11 years old & definitely NOT Windows 11 compatible
 

Attachments

there is nothing conclusive here.
if the test machine is Old, loaded with lots of software, connected to the internet.
as we all know computer has timers, each timer consume time, like checking the disk health, etc.
what if on your test, one method goes though without the timer intervening.
while the other, window is checking update on the background, thereby stalling it by a sec.
we don't know what events are being fired on the background while the test is on going.

neither the test was witnessed/certified by third party.

same as testing how long a ball will fall from 10 feet.
there are factors that will affect the rate, like strong winds, etc.
 
Thanks Colin.
Just ran the tests. Consistent.
speedCompare_18Feb22.PNG
MyLaptopSpec.PNG
 
there is nothing conclusive here.
if the test machine is Old, loaded with lots of software, connected to the internet.
as we all know computer has timers, each timer consume time, like checking the disk health, etc.
what if on your test, one method goes though without the timer intervening.
while the other, window is checking update on the background, thereby stalling it by a sec.
we don't know what events are being fired on the background while the test is on going.

neither the test was witnessed/certified by third party.

same as testing how long a ball will fall from 10 feet.
there are factors that will affect the rate, like strong winds, etc.

As is too often the case, when you reply to anything I've done, there's nothing constructive in your answer.
Sadly, you just seem to go out of your way to be negative just for the sake of it

All my tests are done in controlled conditions. They were also run repeatedly as you can see above.
The differences are small but consistent. You're welcome to try the test app yourself as @jdraw has done

Also, you picked a very poor analogy. True for a feather but in general not for a ball
A ball will fall at the same rate no matter what the wind speed.
Basic physics - vertical acceleration is completely independent of horizontal velocity
 
NEWS FLASH:
No new speed tests being conducted here today.
This is because the 70 mph + wind gusts we are currently experiencing in the UK due to Storm Eunice will obviously have an unpredictable effect on any results obtained. :LOL:
 
Last edited:
Good luck with the wind. We've had -15-20C here for several days; then 8C yesterday and rain with 30-40 kph winds; then back to -11C with a foot of snow (and wind).
 
NEWS FLASH:
No new speed tests being conducted here today.
This is because the 70 mph + wind gusts we are currently experiencing in the UK due to Storm Eunice will obviously have an unpredictable effect on any results obtained. :LOL:
A friend of mine and his wife live in North Yorkshire which they moved to recently from Naples, Italy where I met them. He is thrilled because sitting inside drinking next to a warm fire suits him...

She on the the other hand has not quit belly-aching about the weather and just about anything else that comes to mind. Spoiled rotten first world brats...
 
She might have a point.
A few leaves on the rails stops most trains.
Half inch of snow and the country grinds to a halt.
Other countries appear to cope?
If someone down my way so much as spits on the M4 motorway, the speed limit is reduced from 70mph to 50mph.
 
I would run the tests on my home beast but it's Mardi Gras and there are too many distractions.

If someone down my way so much as spits on the M4 motorway, the speed limit is reduced from 70mph to 50mph.

In south Louisiana, we get snow/icy on the road once every 5-7 years. Last year, when we had a low level of sleet, our traffic slowed down to a crawl and several accidents occurred for people driving off of a straight, level road because they skidded on the ice. Not as bad as the snowstorm I endured in Ft. Worth TX in 2005, though. There, a major ice/snow storm came through and folks drove off the road so much that the tow truck operators had an average of 10 pick-up/tow tickets each. They were working past midnight. Us southerner drivers don't do so well on cold, icy roads.
 
Well, for what it's worth, on SQL Server, both queries almost always take the same amount of time and have the same outputs. This is because when SQL Server creates an execution plan, it converts SELECT DISTINCT into an aggregate query.

I'm currently using the following subquery in something I'm working on:
SQL:
SELECT DISTINCT   
    *
FROM
    dbo.tblCRMGroupSegmentExtract
WHERE
    MTVGroup IS NOT NULL
    AND MTVDivision IS NOT NULL

So I went ahead and converted it to this:
SQL:
SELECT
    CustomerID
    , CompanyName
    , MTVGroup
    , MTVDivision
    , GroupName
    , GroupEffectiveDate
    , GroupTerminationDate
    , ReportName
    , ReportDate
    , ReportDateKey
    , CycleDateKey
    , FileSrcName
FROM
    dbo.tblCRMGroupSegmentExtract
WHERE
    MTVGroup IS NOT NULL
    AND MTVDivision IS NOT NULL
GROUP BY
    CustomerID
    , CompanyName
    , MTVGroup
    , MTVDivision
    , GroupName
    , GroupEffectiveDate
    , GroupTerminationDate
    , ReportName
    , ReportDate
    , ReportDateKey
    , CycleDateKey
    , FileSrcName

Both versions, when run, take 7 seconds and return 102,682 records, with indexes needed on the columns in the WHERE clause but not currently there.

More importantly, both use identical execution plans:
AWFExecutionPlan.jpg

Review of each step in each plan is identical. If you want, I can copy them and load them for you. I can also show you the actual results screenshots, but I would need to edit them first to remove certain field results.

It's not as thorough as Colin's testing, I'm afraid, but it's all I had time enough to do during lunch.
 
Well look who has graced us with his presence today! I'm going outside this evening to see what a blue moon looks like...
 
Well look who has graced us with his presence today! I'm going outside this evening to see what a blue moon looks like...
ROFL

If you only knew how busy I've been lately, and how little of it has been working with Access. Got promoted a couple times, and my workload is WAAAY heavier now than it used to be.
 
Understood, but your absence has been noticed. Good to have you back!
 
Hi Frothy. Great to see you again. Try not to be absent for so long in future ....
Thanks for answering the question posed by Axel Hayne re SQL Server. Would you mind posting a link to the above info on Mike's web article

For info, there is a more detailed version of these tests on my website:

NOTE: I've just posted a link from my own web article to your SQL Server results.

====================================================

For info, there are now a total of 11 sets of speed tests published on the website. See the full list at :
 
Last edited:

Users who are viewing this thread

Back
Top Bottom