Need to Find Entries with same first numbers/characters

rkporsley · Jan 8, 2016

Hello, I've recently started a job where I need to clean up messy data for work. I posted a thread a bit ago, but I feel as if I should ask it in a better way that is easier to understand.

I have a DB for hunting/fishing license holders. Fields include: FirstName, LastName, Address, City, State, Zip.

I use this list to send out free magazines- because multiple persons at the same address get hunting/fishing licenses, I need to clean up the data so there is only one listing for each address.

I have used Find/Replace to clean up words and abbreviations and then removed duplicates, but I want to be able to see only addresses where the first 3-5 characters match. Here's an example:

I currently show this:
1025 Dewberry Ln
1025 Lakeside
103 E Danview Ave
104 E Danview Ave #4
104 Danview Ave Apt 4
104 E Danview Avenue

I've been scrolling through tens of thousands of these and deleting records that match to me, but no exact matches.

I would like to run a query that would turn the above into this:
104 E Danview Ave
104 E Danview Ave #4
104 Danview Ave Apt 4

This would be assuming I found records whose first 3 characters match.

I'm very new to this, but after seeing what Access can do, I had to find out if this was something I could do. Thank you in advance for your help.

JHB · Jan 8, 2016

rkporsley said:
..

I have used Find/Replace to clean up words and abbreviations and then removed duplicates, but I want to be able to see only addresses where the first 3-5 characters match. Here's an example:

The above criteria is so far I can see, not what you show in the result you want.

rkporsley said:
1025 Dewberry Ln
1025 Lakeside
103 E Danview Ave
104 E Danview Ave #4
104 Danview Ave Apt 4
104 E Danview Avenue

I've been scrolling through tens of thousands of these and deleting records that match to me, but no exact matches.

I would like to run a query that would turn the above into this:
104 E Danview Ave
104 E Danview Ave #4
104 Danview Ave Apt 4

This would be assuming I found records whose first 3 characters match.

plog · Jan 8, 2016

I used to do this a lot--clean up mailing lists. I think you are on the right track with this method and here's how to make it better.

1. Backup your membership list. Possibly just copy the table to a blank database to do this work in. Let's call this list WorkingList

2. Add an autonumber primary key to WorkingList called ID. Now you can uniquely identify each row.

3. Create a table with one numeric field called ToDelete.

4. Make a query based on WorkingList and bring in every field. Call this query MatchQuery.

5. In MatchQuery, for every field of real data (not the id) make a calculated field that gets the first 4 characters of every field:

FirstName_4: Mid(FirstName, 1,4)
Address1_4: Mid(Address1, 1,4)
....

6. Build another query and bring in 2 instances of MatchQuery. The second instance will come in as MatchQuery_1.

7. Make a calculated field like so:

DifferentRecord: Iif(MatchQuery.ID=MatchQuery_1.ID, 0, 1)

Then under that, in the criteria section, put 1. This will make it so a record will never match itself.

8. Link every _4 field in MatchQuery with its counterpart in MatchQuery_1.

9. Bring down all the actual fields (ones without _4 sufix) from MatchQuery and then all the actual fields from MatchQuery_1.

10. Sort the query by every field Ascending (this will make the possible duplicates show next to another).

11. Run the query and you have your possible duplicates.

12. Put the ID values of the duplicates you want to delete into the ToDelete table. This will show both duplicates, so be careful not to eliminate both, just 1 of them.

13. Build a query linking ToDelete to WorkingList and delete matching results.

14. Go to Step 8 and loosen your criteria and rerun process until all duplicates are gone.

A bunch of notes:

-The first pass I made this as restrictive as possible (see Step 8). I made it so every _4 field of one table matched the same _4 character field of the other. You can loosen this criteria by eliminating some of those linked fields. For example, if you didn't care about names at the same address you could eliminate the links between FirstName_4 and LastName_4. That way, Susan Smith and Dan Jones living at the same address would only receive one magazine.

-you can make it more restrictive by not matching on the _4 field but the actual field. You could not match on Zip_4 but instead match on Zip. Or you could do the same for LastName instead of LastName_4. I would normally make it less restrictive in one section and more restrictive in another after the first pass. For example, eliminate Street Address match, but match on full first/last name and full zip. You play with it to find out what works for you.

-The above method ultimate puts the decision on you to manually mark each as a duplicate. If you run the query and they all are duplicates and you want to speed up the process, you would remove all the fields but the DifferentRecord one and bring down the ID of just MatchQuery and all the fields of the two tables are linked on (the _4 fields). You would change that query to an aggregate query (click the Sigma/Summation) and then change the Group By under the ID to MIN. Run that query and paste the ID value into ToDelete and delete those. That way you would only delete 1 of the duplicates in each duplicate set (be leary there could be multiple duplicates).

sneuberg · Jan 8, 2016

rkporsley said:
I posted a thread a bit ago, .

16 minutes is pretty small bit. It's at http://www.access-programmers.co.uk/forums/showthread.php?p=1469424#post1469424 where answers have been posted

rkporsley · Jan 8, 2016

Thanks plog, this is still a bit over my head, but I'll continue to try to get the hang of it. At this point I'm trying to understand the language so I can attempt these solutions.

Need to Find Entries with same first numbers/characters

rkporsley

New member

JHB

Have been here a while

plog

Banishment Pending

sneuberg

AWF VIP

rkporsley

New member

Similar threads

Users who are viewing this thread