finding similar records across two tables (1 Viewer)

Cowboy_BeBa

Registered User.
Local time
Tomorrow, 03:55
Joined
Nov 30, 2010
Messages
188
trying to fix up a massive db (over 40k records) that is a real mess (duplicates, blank records, mis-spelled stuff, you name it and im dealing with it)

right now trying to update data ive got from one list (which ive imported as a table) into the main list
there is a primary key in the main list but no matching key in the new list, so im joining them based on two fields, FirstName and LastName. Each table has em and ive managed to find quite a few matches this way (created a new DBID field and have stored the ID's in that). However there are quite a few records that dont match (created a query to identify em), this is because either the name isnt in the DB at all but is on the list (which is a possibility but i only expect a small number of records to fit into this category) or because a lot of names have been mis spelled on the second list (ie Apelbaum spelled as Applebaum or Mr. Sam in one table and simply Sam in another)

Is there any way i can query the DB to join the tables on records with similar names?

(Edit: also no need to point out that First Name and Last Name probably have a few duplicates, ie several people named John Smith, im aware and am filtering those out manually when they come up, its just the best i can do with the data ive been given atm)
 
Last edited:

sneuberg

AWF VIP
Local time
Today, 12:55
Joined
Oct 17, 2014
Messages
3,506
You can join two tables based on a function. In the attached database I join two tables with a function I created name Match. The query looks like
Code:
SELECT Table1.ID, Table1.F1, Table2.ID, Table2.F1
FROM Table1 INNER JOIN Table2 ON Match( Table1.F1,Table2.F1);

Note that because of the function in the join this can only be viewed in the SQL view. And the Match function is:


Code:
Public Function Match(f1 As Variant, f2 As Variant) As Boolean

Match = f1 = f2

End Function

Which seems pretty stupid as this just implements a simple natural join but the point is that you could make that function do anything you want. The question is how to define if two things are similar enough. I suggest trying the Levenshtein distance. This algorithm has been implemented in VBA and I had it on my computer but can't seem to find it right now. With enough Googling you should be able to find it on the Internet.

Depending on how long the strings are this algorithm can gobble up a lot computer cycles so I suggest testing this on a small number of records first to see if it feasible to do it this way.
 

Attachments

  • JoinFunction.accdb
    492 KB · Views: 143

Cowboy_BeBa

Registered User.
Local time
Tomorrow, 03:55
Joined
Nov 30, 2010
Messages
188
DUDE!!!!!

That is some next level shit right there
thank you so much for your help!!! it is very much appreciated
 

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 05:55
Joined
Jan 20, 2009
Messages
12,852
See the function in post 10 of this thread for Damerau-Levenshtein Distance function implemented in VBA.

The return is a number that is lower for closer results. It is optimised for detecting typographical errors.
 

Users who are viewing this thread

Top Bottom