JOIN versus WHERE (1 Viewer)

ebs17

Well-known member
Local time
Today, 10:23
Joined
Feb 7, 2020
Messages
2,208
The topic is a partial take from another topic.

SQL:
SELECT <bunch of fields>
FROM A INNER JOIN B ON ( A.K1 = B.Kappa1) AND ( A.K2 = B.Kappa2 )
WHERE <some criteria>

SQL:
SELECT <bunch of fields>
FROM A, B
WHERE ( A.K1 = B.Kappa1 ) AND ( A.K2 = B.Kappa2 ) AND <some criteria>

@The_Doc_Man said: "but the latter method is less efficient"

@ebs17 said: "I would like to contradict that. Both variants work and should produce the same execution plan.
The presence of indexing does not play a role in functionality, but it does play a role in query execution performance (in both cases)."

@The_Doc_Man said: "Depending on indexing, maybe NOT the same plan. Perhaps you don't need an index, but EVERY article I look up to answer the question of whether or not an index is needed says, "In practical terms, yes." (In technical requirements, no.) I.e. you'd be very sorry if you didn't have one.

But the two queries ARE different in one major way. If you look up the "Order of SQL execution" you would find that the JOIN and FROM clauses are processed first, FOLLOWED by any WHERE clauses. The JOIN version reduces the number of records to be processed by the WHERE clause (if any) whereas the non-JOIN version has a Cartesian JOIN set for the WHERE to process. So if there ARE extra criteria for the WHERE clause (in the non-JOIN example), they have more work to do. The WHERE clause in the JOIN example has already had a reduction in the potential size of the return set."

If you look up the "Order of SQL execution" you would find that the JOIN and FROM clauses are processed first, FOLLOWED by any WHERE clauses.
There can be more criteria expressions in a JOIN clause and a WHERE clause than in the examples shown. In general, these expressions are initially treated equally and a different order may result.
The SQL optimizer determines the actual order using its own algorithms.
Lower efficiency would ultimately result in a longer query runtime. In my own measurements I find approximately the same running times.

The JOIN version reduces the number of records to be processed by the WHERE clause (if any) whereas the non-JOIN version has a Cartesian JOIN set for the WHERE to process.
JOIN and WHERE perform comparisons between field values. For these comparisons, data from the tables, determined by the complete FROM part, must be loaded.
When loading table data and before the comparison, you are IN ANY CASE dealing with an instantaneous CROSS JOIN.

The JOIN version reduces the number of records
We all know that a JOIN can also have a duplicative effect. That depends on the real table data.
 
Last edited:
The SQL optimizer determines the actual order using its own algorithms.
This is the key statement here.
It is the business of the database engine and the query optimizer to find the most efficient plan to execute a certain query. The strategy (algorithm) they use to find this plan is usually not publicly documented and might change with any update to the database engine.

For some database systems (e.g. MS SQL Server), the execution plans are quite expressive and would allow us in theory to "reverse engineer" the algorithm used to derive at the actual execution plan. Unless we have this info and go to the lengths of deducting the algorithm, we can only compare very specific queries with each other, but can hardly arrive at a general conclusion about query execution.
 
Didn't @isladogs huge timing database resolve this issue once and for all????
 
Further, I often pause to think about whether I want:

select
*
from
Table1 t1
inner join Table2 t2 on t1.column = t2.column and t2.othercolumn='something'

or


select
*
from
Table1 t1
inner join Table2 t2 on t1.column = t2.column
where
t2.othercolumn='something'

(in t-sql)

I think a lot of it comes down to readability. In the specific context of the tables and what seems to make more mental sense.
I do care about performance, but I care less about a very small performance difference than I do a small readability difference.

It all adds up performance wise, but it all adds up when you're training your replacement and either leaving a positive legacy or an awful one, too
 
The larger the tables, the more sorry you will be that you created a Cartesian product instead of a proper join.
 
in sql server at least, it's also something to think about the order of operations of a query ... would you really want to trick the optimizer into thinking there's no practical join happening and then start filtering out based on the where? Granted, since just after the dawn of time the optimizer is obviously smart enough to make this a moot point (I think), but still, going toward what Pat said a little bit, I'd hate to tempt it.

the reason i keep bringing up sql server instead of access is because i think the multiple ways of representing this data is much more likely something that people will try doing in sql server rather than Access, where 99.9 % of all people will simply join it and put the where in the where clause (regarding my add-on). (regarding the OP, i suppose either way might be tried a good part of the time, though I've no idea why)
 
The larger the tables, the more sorry you will be that you created a Cartesian product instead of a proper join.
I don't think, I know any current DBMS where this is actually the case.
BTW: Even with a proper join, applying the where condition first is the more efficient way to process many queries. - Query optimizers are usually aware of this.
 
It would appear that my original comments stoked up a firestorm of sorts. How silly of me to actually believe what I have read in print about the order in which SQL elements are processed!
 
It would appear that my original comments stoked up a firestorm of sorts. How silly of me to actually believe what I have read in print about the order in which SQL elements are processed!

because with modern query optimizers, that's largely theoretical. it's a caveat that ate a rule long ago
the optimizer does just about whatever the heck it wants to nowadays, of course i'm exaggerating to make a point for all the grammar police circling the wagon right now
 
the optimizer does just about whatever the heck it wants to nowadays
Of course not that. It can only be optimized within the framework of the formulated SQL statement.
Code:
WHERE Left(FieldX, 4) = "abcd"             ' a reason for an additional normalization step
There can be no index usage with a calculated field, and no optimizer will be able to bend that.
Assuming you want optimally executed SQL queries, or you even need them (existentially), then you should know how SQL works, even if you use assistants like QBE, because your own specifications and your own selection influence everything.
 
Exaggerating into nonsense is of course a sign that the world needs. Let's see who bows in gratitude.
 
Wow, you woke up grumpy today Ebs! Sorry to hear that, and I hope your day continues to get better from here. May you find peace
 
If I can call nonsense nonsense and I don't have to apologize for it, every day will be a good day.
 
How silly of me to actually believe what I have read in print about the order in which SQL elements are processed!
I think the discrepancy results from two different viewpoints.
There are quite a few sources on the web which support your "Order of SQL execution". However, these texts were written to give a human developer, trying to write or understand SQL, a conceptual guide on how to his query is going to be processed. They are not meant to accurately describe the actual order of internal processing by the database query engine.
 

Users who are viewing this thread

Back
Top Bottom