I'm trying to implement duplicate checking for importing data from an external system: basically querying the imported data against existing data using Allen Browne's Soundex and (well, or) simple like, but there's one thing that avoids detection by either: if the words are there but in a different order.
E.g. London University and University of London
E.g. Ambridge Organics and Organics of Ambridge
Anyone got any ideas how to set criteria to match London University with University of London in a query?
The one thing that's occurred to me (as I started typing this post!) is to spit the data into separate words (ignoring ofs, thes, ands), and then build the query, so I'd end up with something like (pseudo code)
I've not tried that yet, but I'll still post this now, in case anyone has a more elegant solution.
Thanks!
E.g. London University and University of London
E.g. Ambridge Organics and Organics of Ambridge
Anyone got any ideas how to set criteria to match London University with University of London in a query?
The one thing that's occurred to me (as I started typing this post!) is to spit the data into separate words (ignoring ofs, thes, ands), and then build the query, so I'd end up with something like (pseudo code)
Code:
SELECT tbl1.field1, tbl1.field2, tbl2.field1, tbl2.field2
FROM tbl1, tbl2
WHERE soundex(tbl1.field2) = soundex(tble2.field2)
OR tbl1.field2 LIKE tbl2.field2
OR (tbl1.field2 LIKE *word1*
AND
tbl1.field2 LIKE *word2*
AND
tbl1.field2 LIKE *word3*
I've not tried that yet, but I'll still post this now, in case anyone has a more elegant solution.
Thanks!