Query to find a match when words in a different order

AndrewS

Registered User.
Local time
Today, 04:22
Joined
Feb 21, 2017
Messages
30
I'm trying to implement duplicate checking for importing data from an external system: basically querying the imported data against existing data using Allen Browne's Soundex and (well, or) simple like, but there's one thing that avoids detection by either: if the words are there but in a different order.

E.g. London University and University of London
E.g. Ambridge Organics and Organics of Ambridge

Anyone got any ideas how to set criteria to match London University with University of London in a query?

The one thing that's occurred to me (as I started typing this post!) is to spit the data into separate words (ignoring ofs, thes, ands), and then build the query, so I'd end up with something like (pseudo code)
Code:
SELECT tbl1.field1, tbl1.field2, tbl2.field1, tbl2.field2
FROM tbl1, tbl2
WHERE soundex(tbl1.field2) = soundex(tble2.field2) 
OR tbl1.field2 LIKE tbl2.field2
OR (tbl1.field2 LIKE *word1*
      AND
      tbl1.field2 LIKE *word2*
      AND
      tbl1.field2 LIKE *word3*

I've not tried that yet, but I'll still post this now, in case anyone has a more elegant solution.

Thanks!
 
Yes, your approach(while typing) is akin to general concept.
You will see articles involving keywords, synonyms, stems and noise words.

If words are in a different order, you can capture the "keywords and sort them".
Noise words are the "and, of, the, Ltd, in, for,..etc..". These are very common terms.

You may get more focused responses/links if you describe your situation with volumes, final usage(s) .
Good luck with your project.
 
Thanks both,

I'll progress down this route.

Pat, Ha! Yes, you're right of course. This is only a relatively small system though - essentially a basic CRM system, to allow tracking of interaction with businesses and their employees. This particular process I'm working on is for importing attendance lists on workshops from Eventbrite - typically no more than 20 per event, and will display possible matches for the user to identify as matches or not.
 

Users who are viewing this thread

Back
Top Bottom