Useful TIP-Using <Where> and <Having> in the same query. (1 Viewer)

raziel3 · Nov 9, 2023

Came across this little tip I never knew about

https://learn.microsoft.com/en-us/s...y-visual-database-tools?view=sql-server-ver16

very useful knowledge for people, like me, who uses aggregate queries a lot.

plog · Nov 9, 2023

Reader's Digested version:

WHERE - use when applying criteria to individual records
HAVING - use when applying criteria to an aggregation of records (MIN(), SUM(), COUNT(), AVG(), etc.)

...WHERE Table.FirstName = 'Jeff'
...WHERE Table.Quantity > 7

...HAVING SUM(Table.Quantity) > 100
...HAVING MAX(Table.Score) < 50

Pat Hartman · Nov 10, 2023

WHERE - Before aggregation
HAVING - After aggregation

When you make a totals query, Access ASSUMES you want a HAVING. It is up to you to change it to a WHERE if that is what you need.

If you are mistakenly using HAVING when you should be using WHERE, fixing the query could lead to a dramatic speed improvement.

ByteMyzer · Nov 13, 2023

Here is a more concrete illustration of the differences between HAVING and WHERE clauses.

Table Name: StudentTests

Code:

StudentID | TestID | TestDate
----------+--------+----------
10123     | 1012   |  3/1/2022
10123     | 1023   |  9/6/2022
10123     | 1035   |11/17/2022
10123     | 1057   | 2/27/2023
10135     | 1012   |  3/1/2022
10135     | 1023   |  9/6/2022

SELECT ST.StudentID, MAX(ST.TestDate) AS TestDate
FROM StudentTests AS ST
GROUP BY ST.StudentID;

Result:
StudentID | TestDate
----------+----------
10123     | 2/27/2023
10135     |  9/6/2022

Task: Write a query that shows, for tests that took place in 2022, each Student ID with the latest test date.

Code:

SELECT ST.StudentID, MAX(ST.TestDate) AS TestDate
FROM StudentTests AS ST
WHERE ST.TestDate BETWEEN #1/1/2022# AND #12/31/2022#
GROUP BY ST.StudentID;

Result:
StudentID | TestDate
----------+----------
10123     |11/17/2022
10135     |  9/6/2022

Task: Write a query that shows each Student ID and the latest test date for whom the latest test date occurred in 2022.

Code:

SELECT ST.StudentID, MAX(ST.TestDate) AS TestDate
FROM StudentTests AS ST
GROUP BY ST.StudentID
HAVING MAX(ST.TestDate) BETWEEN #1/1/2022# AND #12/31/2022#;

Result:
StudentID | TestDate
----------+----------
10135     |  9/6/2022

ebs17 · Nov 13, 2023

Without specifically testing @ByteMyzer's example, I'll say his statement is wrong, because different query logic was stored. Different logic, different results (depending on given data).

In such a simple design, with the same logic the result should always be the same, regardless of whether you use WHERE or HAVING.

The huge difference between the two lies in the order in which queries are evaluated and the resulting performance.

WHERE takes place immediately with the execution of FROM including JOINs and reduces the amount of data available. Only then does grouping take place. Grouping is a complex process because each record within the recordset under consideration must be compared with every other record to ensure that the content of the field or field combination is identical.
It makes a dramatic difference whether you use WHERE to reduce a set of, for example, 10,000 records to 300 and then group them or whether the grouping has to run across all 10,000 records; the number of comparison operations is significantly different.
When grouping, the support of an index would often be used and the comparison process mentioned would be accelerated again. Often, however, groupings are not just done using key fields or specifically indexed fields, but rather using a long list of fields where no index is guaranteed to be available.
A lot of work takes a lot of time.

HAVING then takes place after grouping and on aggregates formed.

Conclusion: With a simple operation (WHERE) you reduce the amount of data before executing a more complex operation and thus increase efficiency. This cannot happen when using HAVING.

isladogs · Nov 13, 2023

Adding on to the detailed explanation given by @ebs17, see my article comparing the efficiency (speed) of using WHERE vs HAVING

Speed Comparison Tests 4

This article compares the efficiency of using a WHERE clause or a HAVING clause in aggregate queries. It is the fourth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding. Example databases are provided so the tests can be done on...

www.isladogs.co.uk

For the differences to be significant, use indexing on the fields used in the filtering.

ByteMyzer · Nov 13, 2023

ebs17 said:
Without specifically testing @ByteMyzer's example, I'll say his statement is wrong, because different query logic was stored. Different logic, different results (depending on given data).

Why not test it, then, and prove me wrong?

ebs17 said:
In such a simple design, with the same logic the result should always be the same, regardless of whether you use WHERE or HAVING.

The result will not be the same between the two queries. As @Pat Hartman pointed out,

Pat Hartman said:
WHERE - Before aggregation
HAVING - After aggregation

In my example with:

Code:

WHERE ST.TestDate BETWEEN #1/1/2022# AND #12/31/2022#

...the criterion is applied to the dataset before the grouping takes place, so you will see the results for the two students, ONLY from the tests with the dates taking place in 2022.

In my example with:

Code:

HAVING MAX(ST.TestDate) BETWEEN #1/1/2022# AND #12/31/2022#;

The grouping takes place, with the MAX derivations on the TestDate field, and THEN the criterion is applied to the resulting dataset. So, because the maximum test date for Student ID 10123 is 2/27/2023, no result is returned in the HAVING query for this student.

Josef P. · Nov 13, 2023

WHERE - Before aggregation
HAVING - After aggregation

I would extend this with:
Having only for conditions with aggregate functions - everything else belongs in the Where section.

And now I'll add some confusion or clarity, depending on how you want to look at it

The DBEngine (Jet/ACE) makes a Where out of Having when creating the show plan, if the condition allows it, such as a filter on a certain value.

Example:
Query TestTab_GroupBy_Where:

SQL:

SELECT T, Count(*) AS Cnt
FROM TestTab
WHERE N = 1
GROUP BY T

ShowPlan output:

Code:

--- TestTab_GroupBy_Where ---

- Inputs to Query -
Table 'TestTab'
- End inputs to Query -

01) Restrict rows of table TestTab
      using rushmore
      for expression "N=1"
02) Group result of '01)'

Query TestTab_GroupBy_Having:

SQL:

SELECT T, Count(*) AS Cnt
FROM TestTab
GROUP BY T, N
HAVING N=1;

ShowPlan output:

Code:

--- TestTab_GroupBy_Having ---

- Inputs to Query -
Table 'TestTab'
- End inputs to Query -

01) Restrict rows of table TestTab
      using rushmore
      for expression "N=1"
02) Group result of '01)'

Nevertheless, I always recommend using Where for "normal" conditions, as this makes the query logic more visible.

/edit: Colin shows this well in his article. (Link in #6)

ByteMyzer · Nov 13, 2023

@Josef P.
What does the plan show for the following query?

SQL:

SELECT T, Count(*) AS Cnt
FROM TestTab
GROUP BY T, N
HAVING Count(*) > 1;

Josef P. · Nov 13, 2023

Unfortunately only Group table ... Showplan does not show the use of Having Count(*)>1.

Code:

- Inputs to Query -
Table 'TestTab'
- End inputs to Query -

01) Group table 'TestTab'

You can see this better in the SQL server:

Code:

SELECT T, Count(*) AS Cnt
FROM TestTab
WHERE N = 1
GROUP BY T
HAVING count(*)>1;


SELECT T, Count(*) AS Cnt
FROM TestTab
GROUP BY T, N
HAVING N=1 and count(*)>1;

=>

Code:

|--Filter(WHERE:([Expr1002]>(1)))
       |--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1005],0)))
            |--Stream Aggregate(GROUP BY:([TestDb].[dbo].[TestTab].[T]) DEFINE:([Expr1005]=Count(*)))
                 |--Index Seek(OBJECT:([TestDb].[dbo].[TestTab].[IX_N]), SEEK:([TestDb].[dbo].[TestTab].[N]=(1)) ORDERED FORWARD)

  |--Filter(WHERE:([Expr1002]>(1)))
       |--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1005],0)))
            |--Stream Aggregate(GROUP BY:([TestDb].[dbo].[TestTab].[T]) DEFINE:([Expr1005]=Count(*)))
                 |--Index Seek(OBJECT:([TestDb].[dbo].[TestTab].[IX_N]), SEEK:([TestDb].[dbo].[TestTab].[N]=(1)) ORDERED FORWARD)

Pat Hartman · Nov 13, 2023

Josef P. said:
Nevertheless, I always recommend using Where for "normal" conditions, as this makes the query logic more visible.

The two are used for different reasons and are not interchangeable. You use WHERE to select/eliminate detail records. This can reduce the aggregation process dramatically depending on the criteria.

HAVING is used on aggregated data, not on details. So, you would most commonly use it to find customers whose average order was > 1000 or Orders that were more than $400. In both examples, data needs to be aggregated before you can apply the criteria. When you are looking for Orders placed in Dec that were for more than $400, you need both a WHERE and a HAVING. The WHERE filters the details down to only orders placed in Dec. Then that smaller set of data is summed and the criteria applied. If you used only a HAVING rather than both, then the query engine would have to aggregate possibly 10's of thousands of orders over many years and then only at the end chose the ones from last December that were > $400. The WHERE lets you eliminate all records other than those from last Dec. Then you only have to sum the remaining records and use the HAVING to make the final selection.

The problem Access developers run into is that the QBE ASSUMES you need a HAVING when you are working with a totals query. That means YOU, the developer, MUST make a conscious determination that a WHERE is called for and change the HAVING to a WHERE. The QBE is not smart enough to figure out that a WHERE might be more efficient or that you actually should use both clauses.

ebs17 · Nov 13, 2023

Pat Hartman said:
The QBE is not smart enough

As Josef showed, the SQL optimizer is correspondingly smarter and can in many cases clean up the mess that the QBE and thus another Access component creates.
But I also wouldn't want to rely on errors being compensated for internally.

Pat Hartman said:
That means YOU, the developer, MUST make a conscious determination that a WHERE is called for and change the HAVING to a WHERE.

But that doesn't show any help or operating instructions. The real problem is that many believe and are encouraged to believe that the QBE produces decent SQL code that can be adopted without criticism - and there is no need to really bother with the actual SQL programming language.
I can only shake my head.

Pat Hartman · Nov 13, 2023

Why are you shaking your head? You think that Joseph proved that the Access query engine fixes flaws of the QBE so what's the problem with using the QBE? I disagree BTW. I just recently fixed a problem caused by using HAVING rather than WHERE for a poster. In Joseph's query, the N=1 criteria applied to a column that was not included in the aggregated data so yes, the engine is smart enough to fix that problem. The N data values would have been summarized away by the time that the HAVING ran so they would never have been available for the HAVING to operate on. The two queries may end up with the same plan because there aren't enough values of N in the index to make using the index first more efficient.

Colin has done extensive testing of queries using a very large set of data. It is only with large sets of data that you can base assumptions like the above on. If 90% of the N's are = 1, there is no point in using an index. I don't know what the break point is but there needs to be enough distinct values of N to be sure that using an index will be more efficient than a full scan after the aggregation.

ebs17 · Nov 14, 2023

Pat Hartman said:
so what's the problem with using the QBE?

1) An error is first generated that needs to be corrected.
2) The belief of many that the QBE does everything right, along with a quasi-encouragement to rely on this assistant and not really bother with the native SQL programming language. This “habit” is then carried on over and over again.

Useful TIP-Using <Where> and <Having> in the same query. (1 Viewer)

raziel3

Registered User.

plog

Banishment Pending

Pat Hartman

Super Moderator

ByteMyzer

AWF VIP

ebs17

Well-known member

isladogs

MVP / VIP

Speed Comparison Tests 4

ByteMyzer

AWF VIP

Josef P.

Well-known member

ByteMyzer

AWF VIP

Josef P.

Well-known member

Pat Hartman

Super Moderator

ebs17

Well-known member

Pat Hartman

Super Moderator

ebs17

Well-known member

Similar threads

Users who are viewing this thread