Remove Duplicates, leave one record

Gismo · Jan 11, 2023

Hi all,

Please could you assist

I have a table with plenty of duplicates which I need to remove and leave 1 record

I tried copying to a new table with no duplicates set but wont work

I have 5 criteria's to keep in mind when removing records
1. Category - Airframe and Engine
2 Type - could be 1 of 20
3. Series - Could be one of 56
4. Document No - possibility of about 3000
5. Release date - endless

Sample of a few records attached from my duplicates Query

Minty · Jan 11, 2023

Have you tried using the Duplicates Wizard?

Alternatives are to dump data into excel and use the remove duplicates option, and then paste it back into Access.

Gismo · Jan 11, 2023

Minty said:
Have you tried using the Duplicates Wizard?

Alternatives are to dump data into excel and use the remove duplicates option, and then paste it back into Access.

The sample is from my duplicates query
I now need to identify one record of the duplicates to keep and delete the balance

Gasman · Jan 11, 2023

No PK?

Gismo · Jan 11, 2023

Gasman said:
No PK?

I have a PK in the master file
If I add it to the duplicate query it wont show the duplicates, true

arnelgp · Jan 11, 2023

add autonumber and the rest is easy.

or create a Temp table (with same structure) and create a Compound index on those 4 fields (No duplicate).
then import the data from your table to the New Temp table.
examine the content of New Temp table when done importing.
if you are satisfied that no duplicate exists, delete the old table and rename
your temp table same as the original table.

Gismo · Jan 11, 2023

arnelgp said:
add autonumber and the rest is easy.

or create a Temp table (with same structure) and create a Compound index on those 4 fields (No duplicate).
then import the data from your table to the New Temp table.
examine the content of New Temp table when done importing.
if you are satisfied that no duplicate exists, delete the old table and rename
your temp table same as the original table.

Hi have multiple records with airframe and engine as description, if i make the new table category as no duplicates, I will only have 2 records in the table, one for airframe and one for engine, the other 29580 records wont be copied

arnelgp · Jan 11, 2023

if that is a problem, then define what is duplicate to your records?

ebs17 · Jan 11, 2023

I have a table with plenty of duplicates

I would primarily prevent the creation of unwanted duplicates in a table, e.g. via a unique composite index over the combination of fields under consideration. Whret the beginnings! You don't have to dispose of waste that you don't create.

Remove means delete.
When deleting, you have to define exactly what is to be deleted or what is to be kept. A feature for this must be present in the table and be able to be evaluated. A running ID or a varying timestamp could be such a feature.

You don't see anything like that with you. Everything always looks the same there. The same can be grouped together. You could create a new table from this and then throw the existing one in the bin.

SQL:

SELECT DISTINCT
   FieldA,
   FieldB,
   FieldC,
   FieldD,
   FieldE
INTO
   NewTable
FROM
   YourTable

' ------------------------------

DROP TABLE YourTable

Gismo · Jan 11, 2023

arnelgp said:
if that is a problem, then define what is duplicate to your records?

That is what I am trying to do, just time consuming to go through 30 000 odd record

Was hoping to find a way to display the first record of a grouped duplications to either mark as keep or delete the balance

below is 5 records which are duplicated, 4 needs to be deleted
This is from the duplicate query

Below is the actual records from the main table

Minty · Jan 11, 2023

As others have said - You need to determine what constitutes a duplicate, and if found which of the duplicated records you keep.

The simple answers have already been provided, create a query that groups on the desired fields, then join that back to a query that brings in the Min or Max Index for that grouping by joining the query to the original table.

That will then give you a unique record for those duplicates.

Use that to create a new table with an appropriate compound index to prevent further duplicates from being entered.

jjdebenedictis · Sep 13, 2023

Not saying this is an elegant solution, but I finally got something working today that does this. (Which I needed, because I'm using linked Sharepoint Lists, and data integrity rules don't get enforced well on Sharepoint.)

WARNING: I haven't tested this yet for removing multiple duplicates, e.g. two different records that each have duplicates. I suspect it would completely delete one of those records, which is bad! It works fine when there's only one record that has duplicates.

Also note this explanation is for the benefit of weaker users of Access, like myself. I'm sure the pros will have many ways to improve this!

The basic idea is:
1) Create 1st query to find duplicates in my table
2) Create 2nd query to find the highest ID out of all those duplicates (i.e. 2nd query acts on the 1st query's results)
3) Create 3rd query to find all the rows from the 1st query that do NOT appear in the 2nd query
4) Create a query to delete, from the original table, all the results of the 3rd query.

So here is roughly how I did that:

1) 1st Query, to find duplicates in my table.

I used Query Wizard > Find Duplicates and went through the steps to create a query to find duplicates in my table (tbl) based on four fields (StudentNumber, CourseID, SemesterID, and YearID). By right-clicking on the query, you can switch to the SQL view, which looks something like this:

SELECT [tbl].*
FROM [tbl]
WHERE ((([tbl].StudentNumber)
In
(SELECT [StudentNumber]
FROM [tbl] As Tmp
GROUP BY [StudentNumber],[CourseID],[SemesterID],[YearID]
HAVING Count(*)>1
And [CourseID] = [tbl].[CourseID]
And [SemesterID] = [tbl].[SemesterID]
And [YearID] = [tbl].[YearID])));

2) 2nd Query, to find the highest ID out of all the duplicates in the list. NOTE: This is where I think bad things could happen if you have two or more records that have duplicates, because one of them will have a higher ID than the others, which means you'd be deleting all but one duplicate of that record, but every single instance of every other record that has a duplicate.

I used Query Wizard > Select Query, and then I moved the ID to the first field and, in the properties window, I set Top Values to 1 to only return the record with the biggest ID. (Fellow newbies, in the Query Wizard, when it prompts you to choose a table, look for the option to choose a query instead.)

SELECT TOP 1 [qry1].[ID], [qry1].(all the other fields were included here), ...
FROM [qry1];

3) 3rd Query, to find all the queries that appear in the 1st query but not the 2nd query.

I used Query Wizard > Find Unmatched, and went through the steps to check which records in query 1 do not appear in query 2 to get something like this:

SELECT [qry1].*
FROM [qry1] LEFT JOIN [qry2] ON [qry1].[ID] = [qry2].[ID]
WHERE ((([qry2].ID) Is Null));

4) 4th Query, to delete, from the original table, all the records in the 3rd query.

This is where things got painful for me for a while, because Access doesn't like to delete things when there's an inner join. The following work-around seemed to do what I needed it to. I used Query Design to create a blank query, then I right-clicked and switched to the SQL view, and I pasted in SQL of this form:

DELETE [tbl].*
FROM [tbl]
WHERE [tbl].[ID] IN (SELECT [qry3].[ID] FROM [qry3] );

NOTE: You can replace DELETE with SELECT in that code to check, non-destructively, whether your query is going to delete the correct records.

theDBguy · Sep 13, 2023

Hi @jjdebenedictis. Welcome to AWF!

isladogs · Sep 13, 2023

I have often used/recommended the process recommended in post #12.
Another approach which takes fewer steps is to create an empty copy of the same table with no data then make a composite index of the fields that together form a unique record.
Next import all records into the new table. Due to the index created, not all records will be imported. There will be no duplicates.

EDIT Sorry but my answer appears to be a duplicate of the answer in post #6 which I hadn't read earlier!

Pat Hartman · Sep 13, 2023

@jjdebenedictis Welcome

To prevent duplicates, you need a unique index. Even SharePoint should allow you to create a unique index.

To get rid of existing duplicates, create a new table/list with the proper index defined.
Run an append query that appends all rows. The duplicates will be dropped.
Once the data is cleaned up. delete the old table and rename the new table to the name of the old table.

Pat Hartman · Sep 19, 2023

@jjdebenedictis you are new here. While "likes" are always appreciated, they are not helpful in making these threads useful to people with similar problems. Do not feel it necessary to always thank everyone who posted. Keep your thanks for when they actually mean something. No one here is looking for likes. This isn't a social media platform. It is always best to actually mention the post that gave you the actual solution or tell people what you ended up doing.

Looks like @isladogs and I posted the same solution at the same time. Our solution is different than what you used. And @arnelgp I'm sorry but somehow I missed your post also. So, we actually have three people who posted this particular method

which you apparently are not using.

The best solution uses the compound indes to keep the bad data from being added initially and that is why so many experts suggested it. Allowing duplicates to be added is poor practice. Do you need additional help with how to add code to the BeforeUpdate event of the form to prevent the bad data from being added in the first place?

isladogs · Sep 19, 2023

Pat
Just for info @jjdebenedictis wasn't the OP in this thread. That was @Gismo who I'm alerting to the various updates since she/he last replied to this thread in post #10

Remove Duplicates, leave one record

Gismo

Registered User.

Minty

AWF VIP

Gismo

Registered User.

Gasman

Enthusiastic Amateur

Gismo

Registered User.

arnelgp

..forever waiting... waiting for jellybean!

Gismo

Registered User.

arnelgp

..forever waiting... waiting for jellybean!

ebs17

Well-known member

Gismo

Registered User.

Minty

AWF VIP

jjdebenedictis

New member

theDBguy

I’m here to help

isladogs

MVP / VIP

Pat Hartman

Super Moderator

Pat Hartman

Super Moderator

isladogs

MVP / VIP

Similar threads

Users who are viewing this thread