Joining nulls

jaryszek · Jun 21, 2020

o wow thank you Guys!!!! i understand!!!!

If you have a muItiple fields that may be null and you want to be able to match on them all together at once, you should not first UPDATE the Nulls to an out of range value, instead you should make a query and create a calculated field that creates a matching string. To match Table1 and Table2 like you want you should first make 2 queries:

this is very nice method. So plog you are not recommending to update table like i did just create matching keys, and the The_Doc_Man is recommending to do update to some specific values?

The difference is that using a JOIN, you MUST have an index on whichever table is the ONE side of a ONE/MANY relationship, and Access will use that fact to make a faster combination set. If you have no unique index on the ONE side of the relationship then you have to use the WHERE clause to do what is called a relation scan, which is terribly slow - and perhaps slightly more likely to introduce bloat in the FE file.

Ooo this is the tip. So while creating tables the most important thing is to create Index and if you have index you can use statement " FROM A JOIN B on A.X = B.X " instead of left join with WHERE clause? So where statememrnt and leftjoin will be much more slower then " FROM A JOIN B on A.X = B.X " with index unique fields? The_Doc_Man so if you have no index and will use this relation scan method you will update still the tables to avoid nulls?

Best Wishes,
Jacek

The_Doc_Man · Jun 21, 2020

The JOIN (on keys, which means you have indexes) is almost always faster than a WHERE (without keys or indexes).

Yes, if you are creating tables AND you wish to do multiple searches on various fields, you get the most benefit from having indexes on the most common fields that will be searched. This is one of the top "performance tips" that I can give you. Searching and indexes go together like meat and potatoes. (Or in south Louisiana, "red beans and rice").

The method plog suggested might be better than changing the values. Just remember that when you are comparing things, some of which involve nulls, you are looking at cases where you have no data AT ALL. The field concatenation method suggested by plog is probably better than anything else because it doesn't involve updating that table. You do the concatenation in the query (on the fly, so to speak) so (a) no previous update means no inherent database bloat (which is caused by massive updates), and (b) the fewer modifications you make to the table, the faster you will get around to your real problem. I.e. you import the raw data and then, with those queries in place, you can begin your analysis IMMEDIATELY because the queries will do what is needed at the time you need it.

jaryszek · Jun 21, 2020

Thank you very much!

The JOIN (on keys, which means you have indexes) is almost always faster than a WHERE (without keys or indexes).

So in my example from post #16 if i had not nulls and created indexes on field1 and field2 in both tables i would use join without WHERE clause?

And if i woud not have nulls there but also no indexes also i could use Inner join instead of left join with where clause?

Best,
Jacek

The_Doc_Man · Jun 22, 2020

You have two questions there.

Case: No nulls, but had indexes - yes, you could use a JOIN. It is a requirement of Access that if you want to do JOIN, you must have an index. If you have one, you could perhaps do an index. Normally, if you have an index and can build a formal relationship, you can do a JOIN.

Case: No nulls, no indexes - no, without an index, JOINs are not allowed. whether INNER, LEFT, or RIGHT. But the WHERE clause would work with or without indexes.

jaryszek · Jun 22, 2020

Ok,

thank you.
I added indexes to Field1 and Field2 in both tables (Table1 and Table2) and no have nulls:

This is working:

1. but when i try to use only JOIN (without INNER string) i am getting error, why?

2. When i do not have indexes on Field1 and Field2 in Table3 and Table4 i can use left join without clause WHERE:

So this is allowed? Or not ?

Please explain,

cheekybuddha · Jun 22, 2020

The_Doc_Man said:
It is a requirement of Access that if you want to do JOIN, you must have an index.

I don't think that's true.

You need to have an index to create a relationship (Access will create one for you if necessary), but you can join on unindexed fields in a query provided they are of the same datatype.

hth,

d

The_Doc_Man · Jun 22, 2020

@jaryszek - Your #2 screenshot in post #27 does not have the word INNER but it DOES have the word LEFT (in lower case).

A LEFT JOIN is one of the two valid OUTER joins. An INNER join only returns records where both sides of the JOIN match up. With either flavor of OUTER join (signified by either LEFT or RIGHT but NOT the actual word OUTER), the table named on the LEFT (or RIGHT) of the JOIN is the "independent" table and you see all of its records. And when the "dependent" table has no matching record, you get nulls in those slots that would have been filled by a matching dependent record. If you compare INNER and OUTER joins and they get different results, then you have unmatched records between the two tables.

You say when you leave out INNER you get an error. But what is the error? Are we talking about an actual error pop-up? Perhaps you get the wrong results? Please explain what you expected and what you actually got.

@cheekybuddha - I was taught that you needed the index for a JOIN, but if things have changed since that teaching occurred, I really cannot say that I am particularly surprised. We ARE talking about third-party instruction from many years ago.

jaryszek · Jun 24, 2020

thank you The_Doc_Man.

ok it is confusing a lot for me.

You can join on not indexes fields. and you do not have to use where clause then.
What is the difference then?

and error is:

so JOIN is not working but if i understand this correctly:
1. using indexes can fasten using all joins
2. JOIN will not work in access, only inner join but they are different
3. Where clause can make slower your queries but i do not catching how

Please help,
Jacek

The_Doc_Man · Jun 24, 2020

OK, it is confusing. Here is my shot at explaining things taking into account that cheekybuddha says you don't need an index to run a JOIN.

First, indexing vs. no indexing -

When a database has data that will be searched for any purpose, what Access does is it makes an internal list of the records matching the data to be presented, updated, deleted, or whatever the query was meant to do. It does the "location" part of the operation BEFORE it does any other action. This concept is incredibly important. Access finds records before acting on them.

The clunky way to do this is to read every record one at a time to see if what you wanted was there. This is called a relation scan (or a linear scan) and potentially takes a very long time. It takes a long time because typically the raw record being searched is fairly long, and Access reads things one disk buffer at a time, I believe in 4 KB gulps. The longer your record, the fewer records fit into the single buffer in use. This is because typically your system doesn't bring the entire back-end into memory at once. So it has to wade through the records one buffer-load at a time to build the list. It doesn't matter how much RAM you have because the operating system does its I/O buffer-oriented, just like every O/S I have ever seen since 1968. So like reading pages in a book, you read them one page at a time.

When you put an index on a field that you will search frequently, Access makes a hidden second table that lists two things - the field(s) in the index and the record number/location of the whole record containing that value. I said "field(s)" since there is such a thing as a single index with multiple fields. There are limits on the number of characters that can be indexed - 255, I believe - and you have a limited number of indexes on a single table - 32 if I remember that correctly. But here is where the advantage comes in. Index records (which you can never directly see) are VERY SHORT compared to the whole record. Therefore, when Access is running a search on a particular field, it can pull in more index records into that disk buffer in a single disk read operation. I.e. bigger data gulps but specific to the field being searched. Back to the analogy of reading pages in a book, if the book has a topics index, you can look through the index first to see what pages to read. Unlike Access, you can SEE a book's index.

Therefore, where indexing shines is that you take less time to find the records (pages) of interest, the places where the action of the query can then focus. For short tables, this phase is fast. For longer tables, this phase is slow. Whether it is a linear scan or an index scan, it depends on the size of the table, and that is pretty much all you need to know about index vs. linear scans.

I'm going to break here because a thunderstorm is overhead and I don't want a bunch of typing to get lost if I lose power. So this is the installment on the value of indexing. I'll do JOIN vs WHERE next.

The_Doc_Man · Jun 24, 2020

OK, next is JOIN vs WHERE

The difference between JOINs and using a WHERE clause is trickier in one way, but it is again related to finding records to be acted upon. BOTH methods ultimately relate to finding the correct records within the set of all records in your data sources.

One trivial difference is that a WHERE clause can be used in almost any case you can name including a single-table search. A JOIN is always used when you are looking at two sources of records (or more). So let's not worry about the single-table case, it leads to a boringly quick discussion.

I'll focus on the two-source case for simplicity. It can be two different tables or it can be two views of the same table (informally called a self-JOIN because the ANSI SQL standard doesn't call out anything special about that case.) You have to also remember this: Access and every other SQL engine in the world that follows the ANSI standard is based on set theory. Set theory is all about combinations and permutations.

Access cannot proceed with any other action until it makes a list of all of the records to be affected by that action, even including the simple case of building a datasheet for display purposes. This becomes the set of records of interest. The syntax that identifies tables is used to limit the scope of the set of interest. The more tables (or queries) you name as sources, the bigger the scope. Then you get to syntax related to reducing the size of that set so that it won't take forever.

In the JOIN case, you have SELECT bunch-of-fields FROM A INNER JOIN B on A.field = B.other-field WHERE.... and the idea is that you are taking two sets of records to create a single virtual record formed by combining those two sources. The JOIN clause tells Access how to quickly put together the two souces - in this case by finding equality between specific fields in the two different sources. The JOIN syntax tells Access to consider the joined virtual record as a unit for the purposes of what you are trying to do. It alleviates the need for you to write code that does OpenRecordset on the two sources and to make a programmatic double-loop scan to find matching records.

But there is another way to do this same thing. You can write SELECT bunch-of-fields FROM A, B WHERE ( A.field = B.other-field ) AND ... with the expressed purpose that again you will eventually get the same exact set of records as the JOIN. Nobody said that there was only one way to do a given job in Access. Sometimes there IS only one way. This isn't one of those times. But there IS a danger here. If you chose to use the WHERE method and forgot the (A.field = B.other-field) portion, you invite something called a Cartesian JOIN. This nomenclature reveals that in fact the two methods (JOIN vs. WHERE) are related.

A Cartesian JOIN (also called a permutation JOIN) occurs because this is done via SET theory. (Remember that comment from above?) The way that the syntax works, that FROM A,B clause says take EVERY record from set A and combine each record in A with EVERY record from set B. So you get records A1B1, A1B2, A1B3, A1B4, ... A2B1, A2B2, A2B3, A2B4, ... AnB1, AnB2, ... AnBn... EVERY COMBINATION in the sets of records A and B. Then the WHERE clause gets to filter out non-matching cases, which is what WHERE clauses do.

The power of the JOIN is that it allows you to tell Access how to more quickly reduce the set of records. And this is important because it turns out that order of operation matters - a LOT. This article might help.

SQLBolt - Learn SQL - SQL Lesson 12: Order of execution of a Query

SQLBolt provides a set of interactive lessons and exercises to help you learn SQL

sqlbolt.com

In it, you learn that SQL performs gathering actions first, and that specifically includes JOINs as a way of determining the records in question. The filtration implied by a WHERE clause comes SECOND in the list of internal SQL actions. Indeed, you CAN have both JOIN and WHERE clauses in a query. They just aren't implemented at the same time.

The WHERE clause in a JOIN (because you can have both elements) has less data to consider because the JOIN syntax allowed earlier elimination of record combinations that didn't match the JOIN criteria. And the key there is that if you have fewer records on which to work, everything is faster.

Now, getting back to the other part of the original question, if you have indexed the fields on which the JOIN is to occur, Access has the ability to make its JOIN evaluation based on index scans rather than linear table scans. Which is where indexes on fields to be JOINed become so valuable. We can easily say that either syntax (JOIN vs. WHERE) will eventually reduce the data set to whatever was intended. But how long will that take? The answer depends on (a) having indexes on the fields being used for selection criteria and (b) having JOINs in play to more rapidly reduce the size of the set of records to be considered for subsequent operations.

If you look at the linked article, you realize we still haven't touched on things like GROUP BY clauses, HAVING clauses, ORDER BY clauses, and the DISTINCT keyword.

In summary, what good is an indexed JOIN? It gets you more quickly to the part where your query actually starts working on the correct list of data.

The_Doc_Man · Jun 24, 2020

Now, you also commented in post #30 on INNER and OUTER joins and said you didn't think they worked. Different SQL engines use different syntax for this but I assure you that both kinds of JOINs work; you just have to know what you are asking them to do.

Remember in my post on JOIN vs. WHERE, I mentioned the importance of set theory. The INNER and OUTER joins look at sets differently. First, let's talk about the INNER JOIN. An INNER JOIN says "find all records from two sources where they have a particular VALUE in common between two fields, one from each source. Don't show men any records if the values don't match." In set theory, the set is the INTERSECTION of two recordsets.

This is the typical method of pulling parent/child records together. For instance, take a doctor's office case of a patient and visits. The syntax might be something like SELECT ... FROM patient INNER JOIN visits ON patient.PatID = visit.PatID... GROUP BY patient.PatID , and what you would get is records with patient information and visit information, gathered together according to the patient's internal ID number. This is also how you do inventories in a transaction-based setup. It is a work-horse way of merging data from two tables.

There is also the OUTER JOIN concept, but that depends on a very important concept. In an OUTER JOIN, one of the two tables is considered so important that you want to see every record even if there is no matching record in the other participating table. An INNER JOIN won't show you that because it requires matches from both tables. Access uses the syntax of LEFT JOIN or RIGHT JOIN to indicate which table is more important. You will also see these tables described as (rarely) primary and subordinate or (more commonly) independent and dependent. Let's look at the syntax.

You have SELECT .... FROM A LEFT JOIN B ON A.x = B.x or SELECT .... FROM A RIGHT JOIN B on A.x = B.x (but you should note that Access does not implement the third type of OUTER join called a FULL OUTER JOIN.) In the two cases above, the only difference is LEFT vs. RIGHT.

In the LEFT JOIN case you get all records from A and any records from B that match. In the RIGHT JOIN case you get all records from B and any records from A that match. Note that in some SQL implementations, the word JOIN in isolation is taken as INNER JOIN. However, Access SQL requires that you explicitly specify which of the three implemented JOINs you meant. So JOIN (by itself) doesn't work because you haven't qualified what KIND of JOIN you are requesting. And the set theory definition of LEFT or RIGHT JOIN is "the set of all records of the independent table, some of which might be matched from the dependent table." And here is where combinatorial concepts sneak in. If the dependent table has more than one matching record for a record in the independent table, you get TWO records, one for each case where a dependent record matched the same independent table's record.

What is the major effective difference between INNER and OUTER joins? Answer in brief: Nulls. The INNER JOIN will much less often return something as null and such returns are likely to be isolated cases. An OUTER JOIN will return nulls in many fields for every record in the independent table that is not matched by a record in the dependent table. From our doctor's office example, if you used a WHERE clause to show visits in a given year and one particular patient didn't visit that year, the INNER JOIN would omit the patient entirely. The OUTER JOIN with "patient" as the independent table would list the patient in a single record with nulls for any fields derived from the "visits" table.

The pitfall here is that those nulls are more difficult to use because Access makes a very clear and invariant rule: Nulls cannot be compared to nulls in order to find equality. Even in records where the meaning of that null is identical, null is always NOT EQUAL to another null. I said "where the meaning is identical" but let's be clear that the meaning of that null is the Access meaning - NO DATA AVAILABLE. You can make NO MATCHES between two null values because nulls never equal anything including themselves. PERIOD. FULL STOP. END OF SENTENCE.

People often try to use nulls in some tricky way as an extra "value" but the problem is that it cannot be compared as such. It can only be detected as having a value or having no value (i.e. null). That is why you have functions like ISNULL(x) or NZ(x,y) - to intercept nulls. That is why SQL has special syntax to detect nulls - to PREVENT you from attempting to use nulls for comparisons. And that is why LEFT JOIN and RIGHT JOIN will more often cause you headaches.

isladogs · Jun 24, 2020

@The_Doc_Man
In fact, from Access version 1.0 onwards, you have NEVER needed to have indexes on join fields in a query.
I was fairly certain of this but rechecked using my copy of Access 1.0 on an VM.

Of course, indexing will vastly improve performance as already stated but it has never been a requirement.

Incidentally, the screenshot shows a useful feature that existed in Access 1.0 but is no longer available.
Can others see what I am referring to here?

Deleted member 147267 · Jun 24, 2020

isladogs said:
you have NEVER needed to have indexes on join fields in a query.

You mean a SELECT query? IIRC, sometimes you cannot UPDATE without a unique indexed field?

isladogs · Jun 24, 2020

Whilst I was referring to a SELECT query, for UPDATE and DELETE queries involving two or more joined tables, the issue is to ensure unique records:
i.e. use UPDATE DISTINCTROW or DELETE DISTINCTROW.
Indexing is still not required though can at times be beneficial.
Having said that ,whilst indexes make select queries far faster, they actually significantly slow down action queries as the index itself needs to be updated.
See http://www.mendipdatasystems.co.uk/speed-comparison-tests-7/4594524997

Isaac · Jun 25, 2020

Pat Hartman said:
SQL Server is a little more picky and it also requires the lengths to be the same

Sorry, but that's incorrect. Different lengths of text type fields can be joined and, even different datatypes, as long as they can be implicitly converted.

jaryszek · Jun 25, 2020

Thank you very much for very nice explanations, especially thanks for The_Doc_Man!
I will sit on the weekend and learn from you more!

The_Doc_Man · Jun 25, 2020

@isladogs - as I alluded to, my training on that topic was years ago and and I can't say we spent a lot of time on it. So if you don't really need the index, fine. Regarding the "useful feature" ... Were you referring to the now-obsolete user access control option in that query? Or was it the odd syntax for that join that in modern syntax appeared to do both a Cartesian and a more normally regulated JOIN of the tables?

And I have to admit that my first version of Access was not 1.0, but 2.0, because my earlier DB work was based on Borland Paradox for DOS. I was a "convert" because Borland Paradox for Windows screwed the pooch with the way they did things.

isladogs · Jun 25, 2020

Hi Doc
I hated Paradox for Windows and it put me off database work for several years.

You're right that the syntax was odd but that wasn't what I was alluding to.
Nor did I mean WITH OWNERACCESS OPTION
In version 1.0 you could see both SQL view and query design view simultaneously.
Not sure when that feature was lost but it would be useful to have it back.
Having said that I didn't try altering the query design or SQL whilst both windows were open. No idea whether they would remain in sync.

Joining nulls

Registered User.

Immoderate Moderator

Registered User.

Immoderate Moderator

Registered User.

Attachments

AWF VIP

Immoderate Moderator

Registered User.

Immoderate Moderator

Immoderate Moderator

Immoderate Moderator

Access MVP / VIP

Deleted member 147267

Guest

Access MVP / VIP

Lifelong Learner

Registered User.

Immoderate Moderator

Access MVP / VIP

Similar threads

Users who are viewing this thread