Cowboy_BeBa
Registered User.
- Local time
- Tomorrow, 04:18
- Joined
- Nov 30, 2010
- Messages
- 188
Hi all
Im working with a large list, over 30k records and (frankly) its driving me a lil nuts
i had to create a cartesian join on a single table, so i could get a list of two similar records on one row and make it easier to compare the two (looking for similar records that may be duplicates)
of course the obvious issue is that this doubles up on rows (ie ID numbers 10001 and 10005 are joined, im looking at data from both records on one row, somewhere down the list im going to find another row thats exactly the same just with the ID's reversed)
Ive created a field (Amalgam) that basically combines the two ID Numbers based on which number is larger, giving me an identifier for these duplicate records ("amalgam: IIf([ID1]>[ID2],[ID1] & " & " & [ID2],[ID2] & " & " & [ID1])")
Now what id like to do is run a Delete query that will identify all records where Amalgam isnt unique (ive deleted a couple hundred manually so not all of the fields will be duplicates) and simply delete one of them
does anyone know of a way i can write a delete query that will only delete one of each instance of amalgam (and only if there are duplicates?)
EDIT: okay so while i was typing this i actually came up with a way to solve my own problem. Simply created a "Find Duplicates" query using the wizard, then created another query that summarizes the Find Duplicates query, it groups by the amalgam field, gives a count of another field and gives me the First ID field it finds, meaning i now have a list of approx 17k records that can be safely deleted with no loss of data
However this is a very convoluted way to go about doing this, so if anyone has a way to do it in one delete query i would still be very interested to hear it
Thanks again
Im working with a large list, over 30k records and (frankly) its driving me a lil nuts
i had to create a cartesian join on a single table, so i could get a list of two similar records on one row and make it easier to compare the two (looking for similar records that may be duplicates)
of course the obvious issue is that this doubles up on rows (ie ID numbers 10001 and 10005 are joined, im looking at data from both records on one row, somewhere down the list im going to find another row thats exactly the same just with the ID's reversed)
Ive created a field (Amalgam) that basically combines the two ID Numbers based on which number is larger, giving me an identifier for these duplicate records ("amalgam: IIf([ID1]>[ID2],[ID1] & " & " & [ID2],[ID2] & " & " & [ID1])")
Now what id like to do is run a Delete query that will identify all records where Amalgam isnt unique (ive deleted a couple hundred manually so not all of the fields will be duplicates) and simply delete one of them
does anyone know of a way i can write a delete query that will only delete one of each instance of amalgam (and only if there are duplicates?)
EDIT: okay so while i was typing this i actually came up with a way to solve my own problem. Simply created a "Find Duplicates" query using the wizard, then created another query that summarizes the Find Duplicates query, it groups by the amalgam field, gives a count of another field and gives me the First ID field it finds, meaning i now have a list of approx 17k records that can be safely deleted with no loss of data
However this is a very convoluted way to go about doing this, so if anyone has a way to do it in one delete query i would still be very interested to hear it
Thanks again
Last edited: