Joining nulls (1 Viewer)

Pat Hartman

Super Moderator
Staff member
Local time
Yesterday, 22:10
Joined
Feb 19, 2002
Messages
42,981
Visually Null and ZLS (Zero Length String) look the same. They are both "blank" and in fact Excel treats them the same way. However, relational databases do not consider them to be equal.

Create a query like this:

Select FieldInQuestion, Count(*) As RecCount
From YourTable
Group by FieldInQuestion
Order By FieldInQuestion

If you end up with two "blank" rows, you have some null values and some ZLS.
 

plog

Banishment Pending
Local time
Yesterday, 21:10
Joined
May 11, 2011
Messages
11,613
Why this is not showing ID2 when at least one field has null in join?

You cannot JOIN null values, full stop.

Remember in math class where they said there's a difference between 0 and NULL? 0 is a value and NULL is the absence of a value. This is an application of that non-intuitive math condition. 0 matches with 0 because they are the same value. The absence of a value cannot match with anything (an actual value like 27 or another another absence of a value), so a match can never be made on Nulls.

If you have a muItiple fields that may be null and you want to be able to match on them all together at once, you should not first UPDATE the Nulls to an out of range value, instead you should make a query and create a calculated field that creates a matching string. To match Table1 and Table2 like you want you should first make 2 queries:

qryMatch1:
SELECT ID1, Field1, Field2, Field1 & " - " & Field2 AS MatchKey1 FROM Table1

qryMatch2:
SELECT ID2, Field1, Field2, Field1 & " - " & Field2 AS MatchKey2 FROM Table2

Then you make another query based on the above 2 queries and JOIN them on just their MatchKey fields. Even if a record's Field1 and Field2 values are Null its MatchKey will have a value (at least " - ") and because of that you can JOIN on it and guarantee the matches you expect are made.
 
Last edited:

jaryszek

Registered User.
Local time
Yesterday, 19:10
Joined
Aug 25, 2016
Messages
756
o wow thank you Guys!!!! i understand!!!!

If you have a muItiple fields that may be null and you want to be able to match on them all together at once, you should not first UPDATE the Nulls to an out of range value, instead you should make a query and create a calculated field that creates a matching string. To match Table1 and Table2 like you want you should first make 2 queries:

this is very nice method. So plog you are not recommending to update table like i did just create matching keys, and the The_Doc_Man is recommending to do update to some specific values?

The difference is that using a JOIN, you MUST have an index on whichever table is the ONE side of a ONE/MANY relationship, and Access will use that fact to make a faster combination set. If you have no unique index on the ONE side of the relationship then you have to use the WHERE clause to do what is called a relation scan, which is terribly slow - and perhaps slightly more likely to introduce bloat in the FE file.

Ooo this is the tip. So while creating tables the most important thing is to create Index and if you have index you can use statement " FROM A JOIN B on A.X = B.X " instead of left join with WHERE clause? So where statememrnt and leftjoin will be much more slower then " FROM A JOIN B on A.X = B.X " with index unique fields? The_Doc_Man so if you have no index and will use this relation scan method you will update still the tables to avoid nulls?

Best Wishes,
Jacek
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:10
Joined
Feb 28, 2001
Messages
27,001
The JOIN (on keys, which means you have indexes) is almost always faster than a WHERE (without keys or indexes).

Yes, if you are creating tables AND you wish to do multiple searches on various fields, you get the most benefit from having indexes on the most common fields that will be searched. This is one of the top "performance tips" that I can give you. Searching and indexes go together like meat and potatoes. (Or in south Louisiana, "red beans and rice").

The method plog suggested might be better than changing the values. Just remember that when you are comparing things, some of which involve nulls, you are looking at cases where you have no data AT ALL. The field concatenation method suggested by plog is probably better than anything else because it doesn't involve updating that table. You do the concatenation in the query (on the fly, so to speak) so (a) no previous update means no inherent database bloat (which is caused by massive updates), and (b) the fewer modifications you make to the table, the faster you will get around to your real problem. I.e. you import the raw data and then, with those queries in place, you can begin your analysis IMMEDIATELY because the queries will do what is needed at the time you need it.
 

jaryszek

Registered User.
Local time
Yesterday, 19:10
Joined
Aug 25, 2016
Messages
756
Thank you very much!

The JOIN (on keys, which means you have indexes) is almost always faster than a WHERE (without keys or indexes).
So in my example from post #16 if i had not nulls and created indexes on field1 and field2 in both tables i would use join without WHERE clause?

And if i woud not have nulls there but also no indexes also i could use Inner join instead of left join with where clause?

Best,
Jacek
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:10
Joined
Feb 28, 2001
Messages
27,001
You have two questions there.

Case: No nulls, but had indexes - yes, you could use a JOIN. It is a requirement of Access that if you want to do JOIN, you must have an index. If you have one, you could perhaps do an index. Normally, if you have an index and can build a formal relationship, you can do a JOIN.

Case: No nulls, no indexes - no, without an index, JOINs are not allowed. whether INNER, LEFT, or RIGHT. But the WHERE clause would work with or without indexes.
 

jaryszek

Registered User.
Local time
Yesterday, 19:10
Joined
Aug 25, 2016
Messages
756
Ok,

thank you.
I added indexes to Field1 and Field2 in both tables (Table1 and Table2) and no have nulls:

This is working:

1592804153819.png


1. but when i try to use only JOIN (without INNER string) i am getting error, why?

2. When i do not have indexes on Field1 and Field2 in Table3 and Table4 i can use left join without clause WHERE:
1592804418571.png


So this is allowed? Or not ?

Please explain,
 

Attachments

  • Database9 (1).accdb
    500 KB · Views: 90

cheekybuddha

AWF VIP
Local time
Today, 02:10
Joined
Jul 21, 2014
Messages
2,237
It is a requirement of Access that if you want to do JOIN, you must have an index.
I don't think that's true.

You need to have an index to create a relationship (Access will create one for you if necessary), but you can join on unindexed fields in a query provided they are of the same datatype.

hth,

d
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:10
Joined
Feb 28, 2001
Messages
27,001
@jaryszek - Your #2 screenshot in post #27 does not have the word INNER but it DOES have the word LEFT (in lower case).

A LEFT JOIN is one of the two valid OUTER joins. An INNER join only returns records where both sides of the JOIN match up. With either flavor of OUTER join (signified by either LEFT or RIGHT but NOT the actual word OUTER), the table named on the LEFT (or RIGHT) of the JOIN is the "independent" table and you see all of its records. And when the "dependent" table has no matching record, you get nulls in those slots that would have been filled by a matching dependent record. If you compare INNER and OUTER joins and they get different results, then you have unmatched records between the two tables.

You say when you leave out INNER you get an error. But what is the error? Are we talking about an actual error pop-up? Perhaps you get the wrong results? Please explain what you expected and what you actually got.

@cheekybuddha - I was taught that you needed the index for a JOIN, but if things have changed since that teaching occurred, I really cannot say that I am particularly surprised. We ARE talking about third-party instruction from many years ago.
 

jaryszek

Registered User.
Local time
Yesterday, 19:10
Joined
Aug 25, 2016
Messages
756
thank you The_Doc_Man.

ok it is confusing a lot for me.

You can join on not indexes fields. and you do not have to use where clause then.
What is the difference then?

and error is:
1592991536550.png


so JOIN is not working but if i understand this correctly:
1. using indexes can fasten using all joins
2. JOIN will not work in access, only inner join but they are different
3. Where clause can make slower your queries but i do not catching how

Please help,
Jacek
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:10
Joined
Feb 28, 2001
Messages
27,001
OK, it is confusing. Here is my shot at explaining things taking into account that cheekybuddha says you don't need an index to run a JOIN.

First, indexing vs. no indexing -

When a database has data that will be searched for any purpose, what Access does is it makes an internal list of the records matching the data to be presented, updated, deleted, or whatever the query was meant to do. It does the "location" part of the operation BEFORE it does any other action. This concept is incredibly important. Access finds records before acting on them.

The clunky way to do this is to read every record one at a time to see if what you wanted was there. This is called a relation scan (or a linear scan) and potentially takes a very long time. It takes a long time because typically the raw record being searched is fairly long, and Access reads things one disk buffer at a time, I believe in 4 KB gulps. The longer your record, the fewer records fit into the single buffer in use. This is because typically your system doesn't bring the entire back-end into memory at once. So it has to wade through the records one buffer-load at a time to build the list. It doesn't matter how much RAM you have because the operating system does its I/O buffer-oriented, just like every O/S I have ever seen since 1968. So like reading pages in a book, you read them one page at a time.

When you put an index on a field that you will search frequently, Access makes a hidden second table that lists two things - the field(s) in the index and the record number/location of the whole record containing that value. I said "field(s)" since there is such a thing as a single index with multiple fields. There are limits on the number of characters that can be indexed - 255, I believe - and you have a limited number of indexes on a single table - 32 if I remember that correctly. But here is where the advantage comes in. Index records (which you can never directly see) are VERY SHORT compared to the whole record. Therefore, when Access is running a search on a particular field, it can pull in more index records into that disk buffer in a single disk read operation. I.e. bigger data gulps but specific to the field being searched. Back to the analogy of reading pages in a book, if the book has a topics index, you can look through the index first to see what pages to read. Unlike Access, you can SEE a book's index.

Therefore, where indexing shines is that you take less time to find the records (pages) of interest, the places where the action of the query can then focus. For short tables, this phase is fast. For longer tables, this phase is slow. Whether it is a linear scan or an index scan, it depends on the size of the table, and that is pretty much all you need to know about index vs. linear scans.

I'm going to break here because a thunderstorm is overhead and I don't want a bunch of typing to get lost if I lose power. So this is the installment on the value of indexing. I'll do JOIN vs WHERE next.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:10
Joined
Feb 28, 2001
Messages
27,001
OK, next is JOIN vs WHERE

The difference between JOINs and using a WHERE clause is trickier in one way, but it is again related to finding records to be acted upon. BOTH methods ultimately relate to finding the correct records within the set of all records in your data sources.

One trivial difference is that a WHERE clause can be used in almost any case you can name including a single-table search. A JOIN is always used when you are looking at two sources of records (or more). So let's not worry about the single-table case, it leads to a boringly quick discussion.

I'll focus on the two-source case for simplicity. It can be two different tables or it can be two views of the same table (informally called a self-JOIN because the ANSI SQL standard doesn't call out anything special about that case.) You have to also remember this: Access and every other SQL engine in the world that follows the ANSI standard is based on set theory. Set theory is all about combinations and permutations.

Access cannot proceed with any other action until it makes a list of all of the records to be affected by that action, even including the simple case of building a datasheet for display purposes. This becomes the set of records of interest. The syntax that identifies tables is used to limit the scope of the set of interest. The more tables (or queries) you name as sources, the bigger the scope. Then you get to syntax related to reducing the size of that set so that it won't take forever.

In the JOIN case, you have SELECT bunch-of-fields FROM A INNER JOIN B on A.field = B.other-field WHERE.... and the idea is that you are taking two sets of records to create a single virtual record formed by combining those two sources. The JOIN clause tells Access how to quickly put together the two souces - in this case by finding equality between specific fields in the two different sources. The JOIN syntax tells Access to consider the joined virtual record as a unit for the purposes of what you are trying to do. It alleviates the need for you to write code that does OpenRecordset on the two sources and to make a programmatic double-loop scan to find matching records.

But there is another way to do this same thing. You can write SELECT bunch-of-fields FROM A, B WHERE ( A.field = B.other-field ) AND ... with the expressed purpose that again you will eventually get the same exact set of records as the JOIN. Nobody said that there was only one way to do a given job in Access. Sometimes there IS only one way. This isn't one of those times. But there IS a danger here. If you chose to use the WHERE method and forgot the (A.field = B.other-field) portion, you invite something called a Cartesian JOIN. This nomenclature reveals that in fact the two methods (JOIN vs. WHERE) are related.

A Cartesian JOIN (also called a permutation JOIN) occurs because this is done via SET theory. (Remember that comment from above?) The way that the syntax works, that FROM A,B clause says take EVERY record from set A and combine each record in A with EVERY record from set B. So you get records A1B1, A1B2, A1B3, A1B4, ... A2B1, A2B2, A2B3, A2B4, ... AnB1, AnB2, ... AnBn... EVERY COMBINATION in the sets of records A and B. Then the WHERE clause gets to filter out non-matching cases, which is what WHERE clauses do.

The power of the JOIN is that it allows you to tell Access how to more quickly reduce the set of records. And this is important because it turns out that order of operation matters - a LOT. This article might help.


In it, you learn that SQL performs gathering actions first, and that specifically includes JOINs as a way of determining the records in question. The filtration implied by a WHERE clause comes SECOND in the list of internal SQL actions. Indeed, you CAN have both JOIN and WHERE clauses in a query. They just aren't implemented at the same time.

The WHERE clause in a JOIN (because you can have both elements) has less data to consider because the JOIN syntax allowed earlier elimination of record combinations that didn't match the JOIN criteria. And the key there is that if you have fewer records on which to work, everything is faster.

Now, getting back to the other part of the original question, if you have indexed the fields on which the JOIN is to occur, Access has the ability to make its JOIN evaluation based on index scans rather than linear table scans. Which is where indexes on fields to be JOINed become so valuable. We can easily say that either syntax (JOIN vs. WHERE) will eventually reduce the data set to whatever was intended. But how long will that take? The answer depends on (a) having indexes on the fields being used for selection criteria and (b) having JOINs in play to more rapidly reduce the size of the set of records to be considered for subsequent operations.

If you look at the linked article, you realize we still haven't touched on things like GROUP BY clauses, HAVING clauses, ORDER BY clauses, and the DISTINCT keyword.

In summary, what good is an indexed JOIN? It gets you more quickly to the part where your query actually starts working on the correct list of data.
 
Last edited:

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:10
Joined
Feb 28, 2001
Messages
27,001
Now, you also commented in post #30 on INNER and OUTER joins and said you didn't think they worked. Different SQL engines use different syntax for this but I assure you that both kinds of JOINs work; you just have to know what you are asking them to do.

Remember in my post on JOIN vs. WHERE, I mentioned the importance of set theory. The INNER and OUTER joins look at sets differently. First, let's talk about the INNER JOIN. An INNER JOIN says "find all records from two sources where they have a particular VALUE in common between two fields, one from each source. Don't show men any records if the values don't match." In set theory, the set is the INTERSECTION of two recordsets.

This is the typical method of pulling parent/child records together. For instance, take a doctor's office case of a patient and visits. The syntax might be something like SELECT ... FROM patient INNER JOIN visits ON patient.PatID = visit.PatID... GROUP BY patient.PatID , and what you would get is records with patient information and visit information, gathered together according to the patient's internal ID number. This is also how you do inventories in a transaction-based setup. It is a work-horse way of merging data from two tables.

There is also the OUTER JOIN concept, but that depends on a very important concept. In an OUTER JOIN, one of the two tables is considered so important that you want to see every record even if there is no matching record in the other participating table. An INNER JOIN won't show you that because it requires matches from both tables. Access uses the syntax of LEFT JOIN or RIGHT JOIN to indicate which table is more important. You will also see these tables described as (rarely) primary and subordinate or (more commonly) independent and dependent. Let's look at the syntax.

You have SELECT .... FROM A LEFT JOIN B ON A.x = B.x or SELECT .... FROM A RIGHT JOIN B on A.x = B.x (but you should note that Access does not implement the third type of OUTER join called a FULL OUTER JOIN.) In the two cases above, the only difference is LEFT vs. RIGHT.

In the LEFT JOIN case you get all records from A and any records from B that match. In the RIGHT JOIN case you get all records from B and any records from A that match. Note that in some SQL implementations, the word JOIN in isolation is taken as INNER JOIN. However, Access SQL requires that you explicitly specify which of the three implemented JOINs you meant. So JOIN (by itself) doesn't work because you haven't qualified what KIND of JOIN you are requesting. And the set theory definition of LEFT or RIGHT JOIN is "the set of all records of the independent table, some of which might be matched from the dependent table." And here is where combinatorial concepts sneak in. If the dependent table has more than one matching record for a record in the independent table, you get TWO records, one for each case where a dependent record matched the same independent table's record.

What is the major effective difference between INNER and OUTER joins? Answer in brief: Nulls. The INNER JOIN will much less often return something as null and such returns are likely to be isolated cases. An OUTER JOIN will return nulls in many fields for every record in the independent table that is not matched by a record in the dependent table. From our doctor's office example, if you used a WHERE clause to show visits in a given year and one particular patient didn't visit that year, the INNER JOIN would omit the patient entirely. The OUTER JOIN with "patient" as the independent table would list the patient in a single record with nulls for any fields derived from the "visits" table.

The pitfall here is that those nulls are more difficult to use because Access makes a very clear and invariant rule: Nulls cannot be compared to nulls in order to find equality. Even in records where the meaning of that null is identical, null is always NOT EQUAL to another null. I said "where the meaning is identical" but let's be clear that the meaning of that null is the Access meaning - NO DATA AVAILABLE. You can make NO MATCHES between two null values because nulls never equal anything including themselves. PERIOD. FULL STOP. END OF SENTENCE.

People often try to use nulls in some tricky way as an extra "value" but the problem is that it cannot be compared as such. It can only be detected as having a value or having no value (i.e. null). That is why you have functions like ISNULL(x) or NZ(x,y) - to intercept nulls. That is why SQL has special syntax to detect nulls - to PREVENT you from attempting to use nulls for comparisons. And that is why LEFT JOIN and RIGHT JOIN will more often cause you headaches.
 

isladogs

MVP / VIP
Local time
Today, 02:10
Joined
Jan 14, 2017
Messages
18,186
@The_Doc_Man
In fact, from Access version 1.0 onwards, you have NEVER needed to have indexes on join fields in a query.
I was fairly certain of this but rechecked using my copy of Access 1.0 on an VM.

1593017962606.png


Of course, indexing will vastly improve performance as already stated but it has never been a requirement.

Incidentally, the screenshot shows a useful feature that existed in Access 1.0 but is no longer available.
Can others see what I am referring to here?
 

isladogs

MVP / VIP
Local time
Today, 02:10
Joined
Jan 14, 2017
Messages
18,186
Whilst I was referring to a SELECT query, for UPDATE and DELETE queries involving two or more joined tables, the issue is to ensure unique records:
i.e. use UPDATE DISTINCTROW or DELETE DISTINCTROW.
Indexing is still not required though can at times be beneficial.
Having said that ,whilst indexes make select queries far faster, they actually significantly slow down action queries as the index itself needs to be updated.
See http://www.mendipdatasystems.co.uk/speed-comparison-tests-7/4594524997
 
Last edited:

Pat Hartman

Super Moderator
Staff member
Local time
Yesterday, 22:10
Joined
Feb 19, 2002
Messages
42,981
You can join on any fields of like data type. That is the only requirement. So joining Addr1 in tblA to CompanyName in tbl2 is perfectly valid as long as the data types are the same. SQL Server is a little more picky and it also requires the lengths to be the same. It makes no sense but you can do it

However, if you create relationships, Access creates hidden indexes on the FK fields.
 

Isaac

Lifelong Learner
Local time
Yesterday, 19:10
Joined
Mar 14, 2017
Messages
8,738
SQL Server is a little more picky and it also requires the lengths to be the same
Sorry, but that's incorrect. Different lengths of text type fields can be joined and, even different datatypes, as long as they can be implicitly converted.
 

jaryszek

Registered User.
Local time
Yesterday, 19:10
Joined
Aug 25, 2016
Messages
756
Thank you very much for very nice explanations, especially thanks for The_Doc_Man!
I will sit on the weekend and learn from you more!
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 21:10
Joined
Feb 28, 2001
Messages
27,001
@isladogs - as I alluded to, my training on that topic was years ago and and I can't say we spent a lot of time on it. So if you don't really need the index, fine. Regarding the "useful feature" ... Were you referring to the now-obsolete user access control option in that query? Or was it the odd syntax for that join that in modern syntax appeared to do both a Cartesian and a more normally regulated JOIN of the tables?

And I have to admit that my first version of Access was not 1.0, but 2.0, because my earlier DB work was based on Borland Paradox for DOS. I was a "convert" because Borland Paradox for Windows screwed the pooch with the way they did things.
 

Users who are viewing this thread

Top Bottom