Duplicate winnowing

marthacasting

Registered User.
Local time
Yesterday, 23:35
Joined
Oct 17, 2011
Messages
67
I have about 20K names in my DB and I want to search for duplicates.
If I use first name and last name to search for duplicates, it does not pick up if the first name is entered once as Pat and a second time as Patricia.
Is there something I can enter in the first name field to recognize the first two letters of the name as being the same--without having to list every letter in the alphabet?

Thank you!
 
I would make matching keys. Essentially you create a query and then construct a key using parts of every data piece you want to compare in your definition of 'duplicate':

TableKey1: Mid([FirstName], 1, 4) & "~" & Mid([LastName], 1, 4)

The above makes a key using the first 4 characters of the first and last name, seperated by a ~. I would do that for both tables, then make a 3rd query, joining those 2 queries by their TableKey1 values. Whatever the query returns is a possible duplicate.

Usually my keys are more than just names--I use zip, state, address etc to make multiple keys. Then I match my data in multiple ways to find all the possibilities.
 
Thank you so much for your prompt response!
I am sorry to say that this is over my head though!
With appreciation . . .
 
If that is over your head, then maybe you can explain how you are doing this search...
... I use first name and last name to search for duplicates ...
There are a million ways to search. For best customization of our responses to your process, describe your process in greater detail.
hth
Mark
 
I simply use the Query Wizard >Duplicates Query function and choose state, first name, last name as the fields that might have duplicate information. (And then I bring over other fields for further information.)
 
You're going to have to edit the SQL that the query wizard writes, because it does not appear that there is an option to perform a function on a field value, and then feed that into the wizard, which is what you need to do if you want to compare the field on its first two letters only. The wizard only appears to take full fields.
So you can post your SQL if you want, and we can see how we can change it to make it work. Then you would have to start using that query, rather than the query wizard, to find you dupes, because the customization we would introduce would not be available using the wizard.
hth
Mark
 
Have a look at Damareau-Levensthein Distance. It will provide a score for the number of character changes required to get from one string to another. It is fantastic on differences due to typographical errors anywhere in the strings.

A sample database is provided in post 4 on this thread.
 

Users who are viewing this thread

Back
Top Bottom