Need to find Partial Matches in Address Field (1 Viewer)

rkporsley

New member
Local time
Today, 06:31
Joined
Jan 7, 2016
Messages
5
Hello, I'm fairly new to Access, I was hired at a magazine as a publishing assistant about 4 months ago and our subscriber database is all in Microsoft Access 2007, BUT in Access 2000 format, if that helps at all.

I also have to clean data from a local government agency to create a list of people to send free magazines to. This is where I thought that there has to be a faster and more efficient way of "cleaning names" as we call it.

I have six fields: FirstName, LastName, Address, City, State, Zip.

My main objective is to make sure that no address receives more than one magazine, therefore I need to remove all duplicate addresses. I have already utilized Find/Replace to clean up things like "St" and "Street," etc. to more easily remove duplicates, but now I want to go further.

I want to be able to find all addresses that are similar in the beginning so I can easily find messy data and remove it. For example, I want to be able to run a query of some sort that would show these addresses due to their similarity:

1035 Fairview Dr
1035 Fair View Dr
1035 Fair View Apt 5

or

127 Sprucewood Dr Apt #4
127 Sprucewood Dr Apt 4

If nothing else, I would like to be able to find all addresses that have the same first 3-5 numbers/characters.

I apologize in advance for my lack of Access knowledge, but I really do appreciate any advice available.
 

sneuberg

AWF VIP
Local time
Today, 07:31
Joined
Oct 17, 2014
Messages
3,506
Here something to get you started. What you can do is put a Dcount in the query as an expression that counts the duplicates and the criteria for this expression would then be >0. Duplicate here meaning the first five characters of the address match. You could do this with a expression something like:

Code:
NumberOfDups: DCount("*", "[tblAddress]", "Left([Address],5) = '" & Left([Address],5) & "' AND [ID] <> " & [ID])

In this example tblAddress would be the table with the addresses, Address is the name of the address field, and ID is the primary key of the table. If your primary key is text, then you would have to add single quotes in this.

The attached database has a query that demonstrates this.

I think you need more than this though. For these case where the first part of the addresses match you would also want matching city and zip. I'll sleep on that one and let know if I come up with a complete solution for you.
 

Attachments

  • PartDuplicate.zip
    18.3 KB · Views: 172

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 00:31
Joined
Jan 20, 2009
Messages
12,852
Take a look at the Damerau-Levenshtein function in post 10 of this thread.

It will provide a rating for the level of similarity between two strings based on the number of character changes it takes to get from one string to the other. If the difference is the reversal of two characters it is considered to be closer than changing two characters so it is great for finding typographical errors too.

Join the table to itself without an ON condition.
Now feed the function with the address field from each alias of the table and sort ascending by the output of the function. The closest matches will be at the top.

You can add a condition that one of the strings is greater than the other so it will skip perfect matches and only test in one direction. This may save some time.
 

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 00:31
Joined
Jan 20, 2009
Messages
12,852
Attached a rough example of how to use the Damerau-Levenshtein function in Access.

BTW it can be used in Excel too. The Maxof and MinOf functions should be omitted for Excel because they are already built in to Excel.
 

Attachments

  • DLMatching.zip
    16.5 KB · Views: 216

sneuberg

AWF VIP
Local time
Today, 07:31
Joined
Oct 17, 2014
Messages
3,506
You'll probably want to go with Galaxiom's solution but I though I'd post my extended solution anyway. To catch the partially matching addresses with matching city, state, and zip it's just a matter of extending the criteria of the Dcount as is done below and in the attached database. In this revision I left out the part that excluded records that matched themselves so this doesn't required a primary key in the Dcount criteria; however; the criteria for the expression is now >1

Code:
NumberOfDups: DCount("*","[tblAddress]","Left([Address],5) = '" & Left([Address],5) & "' AND [City] = '" & [City] & "' AND [State] = '" & [State] & "' AND [ZIP] = '" & [ZIP] & "'")

I wouldn't surprise me if this doesn't work with a large number of records in that it would just take to long, so I hope Galaxiom's solution works for you.
 

Attachments

  • PartDuplicate.zip
    18.7 KB · Views: 165

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 00:31
Joined
Jan 20, 2009
Messages
12,852
Of course the D-L function would find people living on the same street as close matches if you just compared address.

To compare multiple fields for similarities, apply the function to each field then get the product of all the results.
 

Users who are viewing this thread

Top Bottom