More than 15 left joins for matching rows (1 Viewer)

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 02:47
Joined
Feb 28, 2001
Messages
27,186
Always glad to help.
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 03:47
Joined
May 21, 2018
Messages
8,529
I think I have a very easy way to do this. It will work with as many fields as you want to compare, and compares nulls. It is very easy to set up.

1) Take your second tbl and build a query. For each field you want to compare give it the name of the field in the first table preceeded by an underscore. The purpose of the underscore is in case field names in both tables are the same. However the names could be different.
Code:
SELECT tbltwodemo.uniqueid,
       tbltwodemo.userid,
       tbltwodemo.firstname,
       tbltwodemo.lastname,
       tbltwodemo.[full name],
       tbltwodemo.email,
       tbltwodemo.phonenumber,
       tbltwodemo.street,
       tbltwodemo.city,
       tbltwodemo.state,
       tbltwodemo.country,
       tbltwodemo.zipcode,
       tbltwodemo.domain,
       tbltwodemo.username,
       tbltwodemo.ip_address,
       tbltwodemo.[created at],
       tbltwodemo.parkingspot
FROM   tbltwodemo
WHERE  ( ( ( tbltwodemo.userid ) = [_userid] )
         AND ( ( tbltwodemo.firstname ) = [_firstname] )
         AND ( ( tbltwodemo.lastname ) = [_lastname] )
         AND ( ( tbltwodemo.[full name] ) = [_full name] )
         AND ( ( tbltwodemo.email ) = [_email] )
         AND ( ( tbltwodemo.phonenumber ) = [_phonenumber] )
         AND ( ( tbltwodemo.street ) = [_street] )
         AND ( ( tbltwodemo.city ) = [_city] )
         AND ( ( tbltwodemo.state ) = [_state] )
         AND ( ( tbltwodemo.country ) = [_country] )
         AND ( ( tbltwodemo.zipcode ) = [_zipcode] )
         AND ( ( tbltwodemo.domain ) = [_domain] )
         AND ( ( tbltwodemo.username ) = [_username] )
         AND ( ( tbltwodemo.ip_address ) = [_ip_address] )
         AND ( ( tbltwodemo.[created at] ) = [_created at] )
         AND ( ( tbltwodemo.parkingspot ) = [_parkingspot] ) );
2) Loop the records in the first table

Code:
Public Sub MatchFields()
  Dim rsSearch As DAO.Recordset
  Dim strSql As String
  Dim db As DAO.Database
  Set db = CurrentDb
  
  Set rsSearch = db.OpenRecordset("Select * from tblOneDemo")
  Do While Not rsSearch.EOF
    IsMatch rsSearch ' this is a function so you can return the value if it is a match
    rsSearch.MoveNext
    'Exit Do
  Loop
End Sub

3) It reads the current record and uses that to fill in the values of the parameters. Since the paramter name is the name of the field (with underscore) it knows which value from the current record to fill into the parameter.
4) The big trick is that if the value being searched for is null in the first table it modifies the qdf query string
from "lastName = [_lastName]" to "lastName is NULL"
5) Then it resets the query def back to original

Code:
Public Function IsMatch(rsSearch As DAO.Recordset) As Boolean
  Dim rsFind As DAO.Recordset
  Dim strSql As String
  Dim qdf As QueryDef
  Dim db As DAO.Database
  Dim prms As Parameters
  Dim prm As DAO.Parameter
  Dim fld As DAO.Field
  Dim prmField As String
  Dim newSql As String
  Dim OldSql As String
  Set db = CurrentDb
  Set qdf = db.QueryDefs("qryIsMatch")
  Set prms = qdf.Parameters
  
  newSql = qdf.SQL
  OldSql = qdf.SQL
  'replace the null parameters with Is null
  For Each prm In prms
    'Unfortunately need to run this twice
    'all values lost after resetting qdf.sql
    prmField = Replace(prm.Name, "[", "")
    prmField = Replace(prmField, "]", "")
    prmField = Mid(prmField, 2)
    prm.Value = rsSearch.Fields(prmField).Value
    'change the sql string
    If IsNull(prm.Value) Then
      newSql = Replace(newSql, "=" & prm.Name, "is Null")
    End If
  Next prm
  qdf.SQL = newSql
  For Each prm In prms
    prmField = Replace(prm.Name, "[", "")
    prmField = Replace(prmField, "]", "")
    prmField = Mid(prmField, 2)
    prm.Value = rsSearch.Fields(prmField).Value
  Next prm
  Set db = CurrentDb
  'Debug.Print qdf.SQL
  'For Each prm In prms
  '  Debug.Print prm.Name & prm.Value
  'Next prm
  Set rsFind = qdf.OpenRecordset
    If Not rsFind.EOF And Not rsFind.BOF Then
      rsFind.MoveLast
      InsertMatch rsSearch!uniqueID, rsFind!uniqueID
    End If
  qdf.SQL = OldSql
End Function

So I autonumbered both tables to give it a unique value. Then I wrote the matches to the junction table. When done joined the two tables to show matches, and could be used to delete records.

Code:
Public Sub InsertMatch(table1ID As Long, table2ID As Long)
  Dim strSql As String
  strSql = "Insert into tblMatches (IDTableOne, IDTableTwo) values (" & table1ID & ", " & table2ID & ")"
  CurrentDb.Execute strSql
End Sub

Demo attached. Actually works better than I thought.
 

Attachments

  • MatchTables.accdb
    988 KB · Views: 133
Last edited:

MajP

You've got your good things, and you've got mine.
Local time
Today, 03:47
Joined
May 21, 2018
Messages
8,529
If the data is not proprietary could you post some? I would like to run this code against it.
 

jaryszek

Registered User.
Local time
Today, 00:47
Joined
Aug 25, 2016
Messages
756
O wow!

Awesome! Thank you!
I can not share with my data but i will figure out data for you tomorrow.
I do not understand exactly how your solution is working so i will have also few questions but if i would provide example for you - maybe the questions will be not necessary.

Best,
Jacek
 

jaryszek

Registered User.
Local time
Today, 00:47
Joined
Aug 25, 2016
Messages
756
a sorry you provided data, wow, i have to review this, sorry i can not manage this today, i will let you know MajP

Best,
Jacek
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 03:47
Joined
May 21, 2018
Messages
8,529
If you have questions need to read the following first
To understand how this works you should understand the following.

https://docs.microsoft.com/en-us/of...sktop-database-reference/parameter-object-dao
https://docs.microsoft.com/en-us/of...-database-reference/parameters-collection-dao

So a query definition with Parameters has a collection of parmeters. Parmeters can be passed values. You can set these values before opening the query. So I read the first record in table 1. Then I get the values from each field in that record. I pass the value to the correct parameter. I know the correct parameter since its name matches the field name where it is coming from in table 1. So before opening the query all the parmeters are set. For the first record, these are the parmeters and their values prior to opening the a recordset based on the query definition.

Code:
[_UserID]     0
[_LastName]   Emmerich
[_Full Name]  Keon Wiegand
[_Email]      [email]Merritt@emily.us[/email]
[_PhoneNumber](568)232-8693 x581
[_Street]     655 Herman Radial
[_City]       East Thurman
[_State]      Montana
[_Country]    Fiji
[_zipCode]    14539-1448
[_Domain]     annie.me
[_UserName]   Emie_King
[_IP_Address] 203.209.94.235
[_Created At] 4/3/2019 6:25:14 PM
[_ParkingSpot]986
All these values fill in the parmeters. To demo I filled in 2 and 3 to show what is passed
Code:
AND ( ( tbltwodemo.firstname ) = [_firstname] )
         AND ( ( tbltwodemo.lastname ) = "Emmerich" )
         AND ( ( tbltwodemo.[full name] ) = "Keon Wiegan" )
         AND ( ( tbltwodemo.email ) = [_email] )
         AND ( ( tbltwodemo.phonenumber ) = [_phonenumber] )
         AND ( ( tbltwodemo.street ) = [_street] )
         AND ( ( tbltwodemo.city ) = [_city] )
         AND ( ( tbltwodemo.state ) = [_state] )
         AND ( ( tbltwodemo.country ) = [_country] )
         AND ( ( tbltwodemo.zipcode ) = [_zipcode] )
         AND ( ( tbltwodemo.domain ) = [_domain] )
         AND ( ( tbltwodemo.username ) = [_username] )
         AND ( ( tbltwodemo.ip_address ) = [_ip_address] )
         AND ( ( tbltwodemo.[created at] ) = [_created at] )
         AND ( ( tbltwodemo.parkingspot ) = [_parkingspot]

As I said the trick is if there is a null in the record you have to modify the where clause the where. Assume first name and last name are null for the record I am looking for. Then the sql gets modified to

Code:
AND ( ( tbltwodemo.firstname ) is Null )
         AND ( ( tbltwodemo.lastname ) is Null )
         AND ( ( tbltwodemo.[full name] ) = [_Full Name] )
         AND ( ( tbltwodemo.email ) = [_email] )
The other trick is that you cannot set the values of the parameters and then modify the Sql. You have to modify the SQL first and then set the parameters. So this causes you to have to loop the fields twice. First time to determine which are null and then update the SQL string. Then read them again and set the values of the parameters.
Once you are done you have to reset the sql for the query definition since you have modified it.
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,221
MajP
Have tried this using your example data & its very impressive.

Its also much faster than I would have expected,
Will try with a larger dataset of my own later & if it helps will upload that for your use.
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 03:47
Joined
May 21, 2018
Messages
8,529
I was surprised it worked so well. I had reservations. However, until you run it on real data you never know what you forget to think about.
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,221
Here as promised is an updated version of MajP's matching records example database.
The main code is exactly as MajP provided.
I've added some peripheral items including a timer and progress bar to guide users when checking large datasets

The attached database has 1483 fictitious student records and 22 fields to be checked
I had made changes to 16 records in one or both tables including 3 with nulls in the same field in both tables

It took around 110 seconds to check all those records (or just under 100s with no progress bar) & identify matches / non-matches
I think that is VERY IMPRESSIVE
I'm sure this code will be extremely useful to other developers Thanks again MajP.



Just one important issue however.
The 3 records with nulls in the same field in both tables were marked as MATCHED



Its late here and I haven't looked into the reasons for this as yet.
 

Attachments

  • MatchTables_v2_CR.zip
    298.7 KB · Views: 125
  • TestResults.PNG
    TestResults.PNG
    48.6 KB · Views: 335
  • NonMatchingRecords.PNG
    NonMatchingRecords.PNG
    50.2 KB · Views: 363

MajP

You've got your good things, and you've got mine.
Local time
Today, 03:47
Joined
May 21, 2018
Messages
8,529
Just one important issue however.
The 3 records with nulls in the same field in both tables were marked as MATCHED
Can you point out which records? The main point of this code was to try to find records with the equivalent null records. If the code it is failing do that then not much has been gained from other approaches. I thought from my test it was matching like nulls.
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,221
Can you point out which records? The main point of this code was to try to find records with the equivalent null records. If the code it is failing do that then not much has been gained from other approaches. I thought from my test it was matching like nulls.

Apologies. I was responding late at night and with hindsight not thinking clearly.:banghead:
I've just scrapped a couple of earlier responses as I've belatedly understood the idea.

You are deliberately identifying two records as matched where these are identical except for fields with matching nulls.
That does make sense.

I've got one further test I want to make but I'll wait until my brain is in gear.
 

jaryszek

Registered User.
Local time
Today, 00:47
Joined
Aug 25, 2016
Messages
756
Hi MajP,

awesome, i understood this!
Great approach love you way!

Best wishes and thank you once again, you are smart!

Best,
Jacek
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,221
@MajP

I've done some more work on this. Hope that's OK
I've added a 2nd subform to show unmatched records in the 2nd (IMPORT) table

Now that I'm properly awake, I've run some more tests - all passed with FLYING COLOURS!:

  • Null vs Empty String - MATCH
  • Null vs String with a single space - MATCH
  • Leading space in field name value - NO MATCH
  • Single number datatype in one table and decimal in other with scale/precision so values looked identical -
    MATCH where actual values identical ; NO MATCH where actual values stored are different
  • Extra record in 2nd (IMPORT) table - displayed in 2nd subform

Also tested on another table with over 3100 records & 16 fields - total time approx 190s - This is MUCH FASTER than any other method I've ever tried on data with no matching PK field and with proper handling of null values (now I understand its intention - sorry once again for my previous misunderstanding)

I'm now thinking about logging fields with differences as the test progresses & displaying those for information
It will obviously make the test slower - possibly much slower.
Not yet looked into the best way of doing this and until I do, I'm not sure how much work this will be.
Have you already looked into doing this.

BTW - so it can be found easily by others in the future, I really think you should post your code to the repository or to sample databases
 

Attachments

  • TestResults-Students.PNG
    TestResults-Students.PNG
    69.4 KB · Views: 112
  • TestResults-PostcodeDistricts.PNG
    TestResults-PostcodeDistricts.PNG
    61.7 KB · Views: 111
Last edited:

jaryszek

Registered User.
Local time
Today, 00:47
Joined
Aug 25, 2016
Messages
756
Hi Colin,

thank you for testing.
What is interesting for me is that code in VBA is faster than joins.
Looping one record by record is faster than joins, why is that?

Best,
Jacek
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,221
What is interesting for me is that code in VBA is faster than joins.
Looping one record by record is faster than joins, why is that?

It isn't!

Looping through a recordset is always much slower than using queries to do the same thing. It took around 100 seconds with my students data to identify matching and non matching records. That info would then allow me to run append, update and delete queries on selected records.
However in reality, those tables both contained a unique admission number field (Adno) which was originally the PK. So normally I would have run all those 3 queries using Adno as the link field. Total time for 1500 records? Probably about a second at most.

Whilst I am very impressed by this solution, in the real world, I would only use it where no other solution were possible.

Over ten years ago, I remember having a situation where it would have been perfect. A government agency supplied student data which was very useful in setting target grades.

However, the agency omitted the Adno and UPN fields both of which were unique and used their own random proprietary ID values. They refused to use nationally recognised data despite our pleas.

In the absence of a usable PK, we had to match the data using a combination of fields, first/middle/last names (not always consistent), DOB, gender and several others. Comparisons were done field by field, usually in Excel, and it took hours to do with mistakes very easy to make.

This would have been a considerably better solution for that situatioj
 

jaryszek

Registered User.
Local time
Today, 00:47
Joined
Aug 25, 2016
Messages
756
Hi Colin,

thank you for explanation!

Have a nice day,
Best,
Jacek
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,221
I'm now thinking about logging fields with differences as the test progresses & displaying those for information ....
Have you already looked into doing this.

@MajP
Following on from my previous comments, I've now added code to identify fields in both tables with matching nulls
However its obviously not possible to log fields with data changes using this approach as these result in no match!

Anyway, I've attached my latest version - this time using your dataset.



I'd be interested in hearing your views on this.

P.S. I'm just about to send you a PM on this topic

UPDATE 14/02/2019:
The attached file has a couple of errors - please use the version in post #61 instead
 

Attachments

  • MatchRecords_v3.PNG
    MatchRecords_v3.PNG
    59.9 KB · Views: 269
  • MatchTables_v3C_MajP.zip
    182.8 KB · Views: 126
Last edited:

MajP

You've got your good things, and you've got mine.
Local time
Today, 03:47
Joined
May 21, 2018
Messages
8,529
The demo has a linked table in there that needs to be made local. It is the tblStudents.
 

jaryszek

Registered User.
Local time
Today, 00:47
Joined
Aug 25, 2016
Messages
756
Hi Colin,

If strNullMatch = "" Then
strNullMatch = prmField
Else
strNullMatch = strNullMatch & "; " & prmField
End If

why did you implement this? And ow this is working?
Nulls already have been found by MajP code?

Best,
Jacek
 

Users who are viewing this thread

Top Bottom