SELECT DISTINCT vs GROUP BY

Hi Everyone, this might seem like a trivial question, but I have a query called QRY_A which returns about 5000 rows with fields a, b, c, x, y and z. If I want to get just the unique rows from the results, would it be more efficient for me to use:

SELECT DISTINCT a, b, c, x, y, z
FROM QRY_A

or

SELECT a, b, c, x, y, z
FROM QRY_A
GROUP BY a, b, c, x, y, z

Also, could I use SELECT DISTINCTROW, and would that be more efficient? Is there any other way of getting the unique records besides the ways I've asked about?

Thanks for taking the time to read my question and for your responses. :)
 
DISTINCTROW looks at your underlying data source, so since you're including all the fields from the query and using a single query as the source, DISTINCTROW would be a good one to use. It's normally faster than DISTINCT. But note that DISTINCTROW is Access-specific.

On the other hand, if you want to get distinct records based only on the currently selected fields, then DISTINCT is your only option.
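
To illustrate the difference with a made-up example (Customers and Orders here are hypothetical tables, not anything from your database): DISTINCT removes duplicates based only on the fields you output, while DISTINCTROW removes duplicates based on entire records in the underlying tables.

Code:
SELECT DISTINCT Customers.CompanyName
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID

Code:
SELECT DISTINCTROW Customers.CompanyName
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID

The DISTINCT version returns one row per distinct company name; the DISTINCTROW version returns one row per customer that has at least one order, even if two customers happen to share the same name.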

So to sum up, you're on the right track.
 
I tried the following SQL statement:

SELECT DISTINCTROW a, b, c, x, y, z
FROM QRY_A

but it gave me back some duplicate rows. Is that because my subquery QRY_A only returned a subset of the fields that were on the original tables?

I was hoping I could use DISTINCTROW instead of DISTINCT because it did run much faster, but unfortunately I don't think it can give me the output I need. :(
 
Create a basic query, right-click the query header, and set the Unique Values property to Yes.

Then see what the SQL looks like.
 
Unfortunately, changing the property to Unique Values instead of Unique Records changes the query from a SELECT DISTINCTROW to a SELECT DISTINCT, which is not what I wanted, because the SELECT DISTINCTROW seems to run so much faster than the SELECT DISTINCT. :(
 
Try Unique Records rather than Unique Values in the query properties, then - but I don't think Unique Records prevents duplicates; it's a different concept.
 
I went with using DISTINCT because I couldn't figure out any other way to get rows with unique values, but I restructured my queries in the following manner:

Original scenario:

qry_A - selected records from a join of multiple tables and performed a VBA function for each record
qry_B - selected records from qry_A results based on specific criteria
qry_C - selected DISTINCT records from qry_B results

This ran extremely long. I'm accessing our network remotely, which is extremely slow, so it took a ridiculous amount of time - around 20 minutes or more.

Rewritten scenario:

qry_A - selected DISTINCT records from a join of multiple tables
qry_B - performed a VBA function for each record from qry_A results
qry_C - selected records from qry_B based on specific criteria

This ran much quicker than the original scenario - about 5 minutes.
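
In case it helps, here's roughly what the rewritten chain looks like (the table names, fields and the criteria below are simplified placeholders, and MyDateCalc stands in for my actual VBA function):

qry_A:
Code:
SELECT DISTINCT a, b, c, x, y, z
FROM Table1 INNER JOIN Table2
ON Table1.ID = Table2.ID

qry_B:
Code:
SELECT a, b, c, x, y, z, MyDateCalc([x]) AS CalcResult
FROM qry_A

qry_C:
Code:
SELECT a, b, c, x, y, z, CalcResult
FROM qry_B
WHERE CalcResult > 0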

My question is: why did the rewritten scenario run so much faster, when putting the SELECT DISTINCT up front in qry_A only reduced the number of result records from 38850 (original) to 38647 (rewritten)? Does it have to do with the placement of the VBA function before or after the SELECT DISTINCT?

(I don't think it matters, but the VBA function does some date calculations)

If you could shed some light on this, I would really appreciate it. :confused:
 
But probably - in the second case the function is being performed on a subset of the records, while in the first case it is certainly being performed on more items - maybe many more - and maybe you are generating a cross product, which would make it worse.
 
That's one explanation, and a significant one too. Also, if the function returns a different value every time it's run, it will cause the query to run more than once - in most cases twice.
 
Ah, that fun thread. I still haven't had the time to write one up. You will notice a significant difference if the code used a Dictionary or Collection object instead of an array. If I have some time later today I will code what I explained in that thread.

Anyway, the code returns a different value each time it's called, so the query will run at least twice, and you can imagine it will take much longer to churn through a larger dataset (as gemma-the-husky explained).
 
Sorry if I'm being dense here, but I'm not really sure what you mean by:

"Also, if the function is returning a different value everytime it's run it will cause the query to run more than once, in most cases twice"

Can you perhaps give me an example?

Thanks :o
 
Just took a look at the article, and I was a bit surprised. I would have thought that examples B and C would both have run the function only once, and would have guessed that the outputs would have been the same too. I wouldn't have thought that the function would be run once to create the recordset and then again to apply the criteria. Very interesting :)
 
The practical implication here is that a query which requires records to be generated by a function, with criteria then applied to those results, is better constructed with a subquery.

Code:
SELECT MyFunction(MyTable.MyField)
FROM MyTable
WHERE MyFunction(MyTable.MyField) = x
A better solution:
Code:
SELECT MyAlias.MyResult
FROM
    (
     SELECT MyFunction(MyTable.MyField) AS MyResult
     FROM MyTable
     ) AS MyAlias
WHERE MyAlias.MyResult = x

I have always assumed this was the case, but has anyone actually run the race?
 
