Select Distinct vs. Group By

mstorer

Registered User.
Local time
Today, 18:51
Joined
Nov 30, 2001
Messages
95
With the exception of having to write an additional line of SQL, are there any benefits or performance issues associated with using SELECT DISTINCT vs. GROUP BY. Am I correct in my assumption that both statements will return the same results? Assuming that I don't need to perform any Max, Min, Sum, etc. function in another fields, I find it is just as easy to click on the GROUP button as it is to change the properties of the query or physically type it in the SQL. Just a question of curiosity.

- Matt
 
These two statements are not related. The Group By is used when performing aggregate functions on a set of data. Let's say you had Sales people all over the country. You could then have a query that totaled all of the sales GROUPed BY, say, state, or district, where you had the GROUP By on the State field or the District field and had a SUM on the Sales field.

Now, the distinct clause just gives you that. A single record returned for a specified field or record. But the Access Help explains it well and I will include it here:

DISTINCT
Omits records that contain duplicate data in the selected fields. To be included in the results of the query, the values for each field listed in the SELECT statement must be unique. For example, several employees listed in an Employees table may have the same last name. If two records contain Smith in the LastName field, the following SQL statement returns only one record that contains Smith:

SELECT DISTINCT
LastName
FROM Employees;

If you omit DISTINCT, this query returns both Smith records.If the SELECT clause contains more than one field, the combination of values from all fields must be unique for a given record to be included in the results.The output of a query that uses DISTINCT isn't updatable and doesn't reflect subsequent changes made by other users.

DISTINCTROW
Omits data based on entire duplicate records, not just duplicate fields. For example, you could create a query that joins the Customers and Orders tables on the CustomerID field. The Customers table contains no duplicate CustomerID fields, but the Orders table does because each customer can have many orders. The following SQL statement shows how you can use DISTINCTROW to produce a list of companies that have at least one order but without any details about those orders:

SELECT DISTINCTROW CompanyName
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
ORDER BY CompanyName;

If you omit DISTINCTROW, this query produces multiple rows for each company that has more than one order.DISTINCTROW has an effect only when you select fields from some, but not all, of the tables used in the query. DISTINCTROW is ignored if your query includes only one table, or if you output fields from all tables.
 
Thanks for the extensive explanation pdx_man. I suppose I posed the question a little too generically. I understand how the aggregate functions work and why I use DISTINCT in my queries. Let's use the DISTINCT example that you provided and assume that we don't want to perform any aggregate function such as sum, last, max, etc. (I understand that DISTINCTROW is only similar when you don't query fields from multiple tables.) Once you have selected the fields for your query, wouldn't clicking on the GROUP button, which groups all of the selected fields in the query, produce the same output? I guess I was looking at it from a performance standpoint. Or perhaps I'm a little lazy and find clicking on the GROUP button easier than changing the properties of the query or typing DISTINCT in the SQL. Am I any clearer on what I'm asking? Thanks again for your insight.
 
Last edited:
Ah,
Either one is a dog as far as performance is concerned. So, you want to know which dog is slower? Well, it matters on how you have your indexes set up, and of course, the amount of data you're dealing with.

In my test here with 5 million rows resolving to about 1 million rows, using indexed fields, the Group By took about 10% longer to execute on SQL Server. How does the performance work with Access? I don't know ... conduct an experiment and post your results.

Just an FYI, I would never do a Group By query to substitute using DISTINCT. I have found that 'cute' little solutions are never just that. They always seem to come back and bite you in the @$$.
 
Hi Guys,

I am trying to get a distinct count of students that have had tutorials and group this by the department they are in. At the moment I am using the query below but this seems to count the total number of students in the tutorial table and not just pick out the distinct ones.

Code:
SELECT Count(*) AS CountOfStudentID, tbl_Student.Department
FROM ((SELECT DISTINCT StudentID FROM tbl_Tutorial)  AS T INNER JOIN tbl_Student ON T.StudentID = tbl_Student.StudentID) INNER JOIN tbl_Tutorial ON tbl_Student.StudentID = tbl_Tutorial.StudentID
WHERE (((tbl_Tutorial.Date) Between [Enter Start Date dd/mm/yyyy] And [Enter End Date dd/mm/yyyy]))
GROUP BY tbl_Student.Department;

Any help appreciated :)
 

Users who are viewing this thread

Back
Top Bottom