Finding Duplicates Upon Entry

Hey Lucy

Registered User.
Local time
Today, 07:27
Joined
Jan 20, 2012
Messages
124
#pulllingmyhairout#

I have a client for which I am developing a database in Access 2010.

The data for the DB will be imported in from an Excel spreadsheet. This spreadsheet is created by my client based on information HE imports from several different organizations, appending data into one spreadsheet. The spreadsheet contains over 500 records and more will be added daily.

The spreadsheet contains an Address field for street number and name. He would like an alert when a duplicate address is being manually entered into the database or imported in from his spreadsheet. The problem is the inconsistencies in spelling and abbreviations and such. For instance, he may have 123 Main Street or 123 Main St or even 123 Mn St. These would all actually be the same address (record), but you couldn't possibly find it with a Find Duplicates Query.

His data sources is where the problem lies with inconsistency, but there is nothing that can be done there because he may download comma-delimited files from several places, created by 100's of different users. It would be impossible to set entry rules for the source.

The only thing I could think to do would be to somehow find duplicates AFTER the import into Access, but again, a Find Duplicates query won't work.

I also tried setting up a criteria field in a query that requires the user to enter specified data. I was thinking you might could ask the user to enter a "1" followed by an asterisk, for example, to find all properties that's address number began with 1. I know it would find everything from 101 to 1193486 in this case, but would be a way to pull out at least similar records then just plain eyeball for duplicates.

Does anybody have any suggestions for me? Again, I don't know SQL, so any solution would have to be within Access itself.

Help! And thanks!
 
What fields are being imported?
You mention Address with street number and street name.
Do you have City State and Zip? Do you have phone number?

What is the purpose of the data (the end result)?

There is no magic bullet. It requires edit/validation, but requires some work. I would NOT CONSIDER looking at this as a manual effort. There are some things that can be done to focus "potential duplicates" and that should reduce the manual "eyeballing" of records.
 
Thanks for your response. This one is a doozy! I have just learned from my client that he actually retrieves his data from three different sources and imports into Excel spreadsheets. He wants all this data merged into one SS that can be appended daily to an Access table. Each data set, while all related to a single record, is formatted differently. Some of the data is duplicated, some not. Some of the SS's have blank field spaces, some don't.

One of my collaborator thinks he has a method of merging the data and finding duplicates by concatenating. We are going to meet next week and see what we can do.
 
What I would tend to do, when confronted with an address migration task, is to start by concatenate everything together (Street address, Town, County, State, Zip Code and Country) and then reduce it to an alias, a lower case string of pure letters and numbers.

For example the address below ...

StreetAddress, Town, County, State, ZipCode, Country, addressId
"123 Main Street", "Abrams","Oconto","Wisconsin","Z123 2B","USA",0

would become "123mainstreetabramsocontowisconsinz1232busa". This would get rid of spaces, carriage returns, commas, full stops (periods) etc.

I would then build up a table of such aliases which would be linked back to a proper address.

addressID, alias
1, "123mnstabramswisconsin"
1, "123mainstabramswisconsinz123usa"
1, "123mainstreetabramsocontowisconsinz1232busa"

2, "1600pennsylvaniaavenuenorthwestwashingtondc20500usa"

It would a bit of a pain to begin with but as the table grew fewer aliases would, in theory, need to be added.

You would match the value generated from the address to the alias table and assign the proper addressId to the record.

StreetAddress, Town, County, State, ZipCode, Country, addressId
"123 Main Street", "Abrams","Oconto","Wisconsin","Z123 2B","USA",1
 
I would start by questioning the need for the Excel spreadsheet(s). If you're moving to a database, and you have data coming "into your business" from multiple sources, but in digital format, I would do the pre-processing etc all within Access.

My feeling and experience suggests that the more software packages involved, the more maintenance, training and cost and there's always the updates that require adjustment to each package and your interfaces between packages.

I would look at the target format required; take a look at the incoming sources and create a mapping of that input to the "desired target". I would bring the incoming data into a temporary table and work to create edit validation to accomplish the mapping to the target. I would suggest looking at user defined functions to take some input and return some aspect of the target. Those functions could be used with the 3 data sources (or more if your boss decides to expand his inputs).

One of the things I have found in dealing with addresses is to get the standards from the postal service regarding address formats. You'll find (as you have noted) a number of Road Rd Rd., Street, St ; ... etc, - I'd look at a look up for most combinations to the Postal standard. You will need a level of detail to differentiate "duplicates".

If you have Postal Code/Zip, that will do a lot for consistent City and State spellings.
If you have phone numbers and zip/Postal, that combination can help in duplicate resolution.

I don't think you will ever get rid of some "eyeballing" of the data, but reducing it would be the goal.

Good luck with your project.
 
nanscombe and jdraw - thanks to both of you for great suggestions!! As far as importing directly into Access rather than Excel, I was going to suggest that to my client. They have only been using Excel up until now (well, up until I get the DB build finished).
 

Users who are viewing this thread

Back
Top Bottom