Find duplicates based in a few characters

mkelly

Registered User.
Local time
Today, 11:18
Joined
Apr 10, 2002
Messages
213
I have two tables I deleted the duplicates from each then did a union query based on name.
However the two tables named things different like vermont college vs vermont college inc.
Can I do a find duplicates based on sat the first 5 characters?
If so how.

Thanks in advance
 
This is a difficult problem. There are a couple of tools that can help but there are few perfect solutions.

I personally use a Jaro-Winkler algorithm for exactly your problem. It doesn't give a black/white answer...more shades of grey. It takes a human to evaluate the answer but it definitely culls the results down to a managable level.

You can also look at Soundex(). Same problem, you won't get an exact answer and will have a lot of mis-matches.

You can also compare using left(), which is in the spirit of your proposed solution. It can still give you an inaccurate reading, however.
 
Where would I use the left() command??

SELECT First([union duplicates].name) AS [name Field], Count([union duplicates].name) AS NumberOfDups
FROM [union duplicates]
GROUP BY [union duplicates].name
HAVING (((Count([union duplicates].name))>1));

Thanks
 
You're not going to be able to do it that way. How would SQL know which of the partial matches to use as [union duplicates].name?
 
If I don't have Jaro or soundex is there any way to do it?
 
I can provide you with both (just PM me with your personal e-mail address). You'll have to use a different approach, though. They don't work with aggregate functions. Same with left(). This is an insanely difficult problem to overcome using a single standard query.
 

Users who are viewing this thread

Back
Top Bottom