Fuzzy Match to compare two tables

ryan.gillies

Registered User.
Local time
Tomorrow, 06:14
Joined
Apr 8, 2011
Messages
53
Hi all

I'm attempting Fuzzy Matching for the first time and have two tables of data to match - a master file of company names (tMaster) and a list of user-inputted company names (tUser) from a survey. As you can imagine the user-inputted table has suffered greatly in the data integrity department.

I want to try and match as many of the fields from tUser against tMaster as possible. Some of the worst case examples of the data I'm facing are as follows:

  • Investor Compensation (Claims) LLP match to
    • Inveator Compensation
    • Investor Compensation Claims LLP
    • Investor Compensation LLP
  • i-Sm@rt Consumer Services Ltd match to
    • I-Sm@art Consumer Solutions
    • Isamrt Consumer Solutions
    • I-Sm@rt Consumer Services
I've stumbled across Allen Browne's Soundex function, and I'm wondering if this is something I can use in this kind of a scenario. I've fiddled around with it, but I'm not sure I'm using it correctly.

Currently I have both tables in an unjoined query, in which I set the criteria for soundex([user name]) to soundex([master name]). It seems to be working reasonably well and has matched the above examples I gave. However it has had some unwanted results, such as matching PPI-Claimback.co.uk to PPI Claimline, and Kevin McGee to Key Financial Claims Ltd.

Has anyone used Soundex successfully on something like this before? Any tips on how to go about fine tuning my results?

Thanks in advance!
 
This looks like successful fuzzy matching to me:
However it has had some unwanted results, such as matching PPI-Claimback.co.uk to PPI Claimline, and Kevin McGee to Key Financial Claims Ltd.

Because soundex is somewhat costly to calculate, it's common to store the soundex code with your data. Then, the algorithm to compare soundex codes is considerably more efficient, but it's fuzzy!!! You didn't want exact matches, and you didn't get them. Sounds like a success to me.
 

Users who are viewing this thread

Back
Top Bottom