Remove Duplicates, leave one record (1 Viewer)

Gismo

Registered User.
Local time
Today, 13:20
Joined
Jun 12, 2017
Messages
1,298
Hi all,

Please could you assist

I have a table with plenty of duplicates which I need to remove and leave 1 record

I tried copying to a new table with no duplicates set but wont work

I have 5 criteria's to keep in mind when removing records
1. Category - Airframe and Engine
2 Type - could be 1 of 20
3. Series - Could be one of 56
4. Document No - possibility of about 3000
5. Release date - endless


Sample of a few records attached from my duplicates Query

1673436747120.png
 

Minty

AWF VIP
Local time
Today, 11:20
Joined
Jul 26, 2013
Messages
10,371
Have you tried using the Duplicates Wizard?

Alternatives are to dump data into excel and use the remove duplicates option, and then paste it back into Access.
 

Gismo

Registered User.
Local time
Today, 13:20
Joined
Jun 12, 2017
Messages
1,298
Have you tried using the Duplicates Wizard?

Alternatives are to dump data into excel and use the remove duplicates option, and then paste it back into Access.
The sample is from my duplicates query
I now need to identify one record of the duplicates to keep and delete the balance
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:20
Joined
May 7, 2009
Messages
19,243
add autonumber and the rest is easy.

or create a Temp table (with same structure) and create a Compound index on those 4 fields (No duplicate).
then import the data from your table to the New Temp table.
examine the content of New Temp table when done importing.
if you are satisfied that no duplicate exists, delete the old table and rename
your temp table same as the original table.
 

Gismo

Registered User.
Local time
Today, 13:20
Joined
Jun 12, 2017
Messages
1,298
add autonumber and the rest is easy.

or create a Temp table (with same structure) and create a Compound index on those 4 fields (No duplicate).
then import the data from your table to the New Temp table.
examine the content of New Temp table when done importing.
if you are satisfied that no duplicate exists, delete the old table and rename
your temp table same as the original table.
Hi have multiple records with airframe and engine as description, if i make the new table category as no duplicates, I will only have 2 records in the table, one for airframe and one for engine, the other 29580 records wont be copied
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:20
Joined
May 7, 2009
Messages
19,243
if that is a problem, then define what is duplicate to your records?
 

ebs17

Well-known member
Local time
Today, 12:20
Joined
Feb 7, 2020
Messages
1,946
I have a table with plenty of duplicates
I would primarily prevent the creation of unwanted duplicates in a table, e.g. via a unique composite index over the combination of fields under consideration. Whret the beginnings! You don't have to dispose of waste that you don't create.

Remove means delete.
When deleting, you have to define exactly what is to be deleted or what is to be kept. A feature for this must be present in the table and be able to be evaluated. A running ID or a varying timestamp could be such a feature.

You don't see anything like that with you. Everything always looks the same there. The same can be grouped together. You could create a new table from this and then throw the existing one in the bin.
SQL:
SELECT DISTINCT
   FieldA,
   FieldB,
   FieldC,
   FieldD,
   FieldE
INTO
   NewTable
FROM
   YourTable

' ------------------------------

DROP TABLE YourTable
 

Gismo

Registered User.
Local time
Today, 13:20
Joined
Jun 12, 2017
Messages
1,298
if that is a problem, then define what is duplicate to your records?
That is what I am trying to do, just time consuming to go through 30 000 odd record

Was hoping to find a way to display the first record of a grouped duplications to either mark as keep or delete the balance

below is 5 records which are duplicated, 4 needs to be deleted
This is from the duplicate query

1673441910068.png


Below is the actual records from the main table

1673442066991.png
 

Minty

AWF VIP
Local time
Today, 11:20
Joined
Jul 26, 2013
Messages
10,371
As others have said - You need to determine what constitutes a duplicate, and if found which of the duplicated records you keep.

The simple answers have already been provided, create a query that groups on the desired fields, then join that back to a query that brings in the Min or Max Index for that grouping by joining the query to the original table.

That will then give you a unique record for those duplicates.

Use that to create a new table with an appropriate compound index to prevent further duplicates from being entered.
 

jjdebenedictis

New member
Local time
Today, 03:20
Joined
Sep 13, 2023
Messages
1
Not saying this is an elegant solution, but I finally got something working today that does this. (Which I needed, because I'm using linked Sharepoint Lists, and data integrity rules don't get enforced well on Sharepoint.)

WARNING: I haven't tested this yet for removing multiple duplicates, e.g. two different records that each have duplicates. I suspect it would completely delete one of those records, which is bad! It works fine when there's only one record that has duplicates.

Also note this explanation is for the benefit of weaker users of Access, like myself. I'm sure the pros will have many ways to improve this! 😁

The basic idea is:
1) Create 1st query to find duplicates in my table
2) Create 2nd query to find the highest ID out of all those duplicates (i.e. 2nd query acts on the 1st query's results)
3) Create 3rd query to find all the rows from the 1st query that do NOT appear in the 2nd query
4) Create a query to delete, from the original table, all the results of the 3rd query.

So here is roughly how I did that:

1) 1st Query, to find duplicates in my table.

I used Query Wizard > Find Duplicates and went through the steps to create a query to find duplicates in my table (tbl) based on four fields (StudentNumber, CourseID, SemesterID, and YearID). By right-clicking on the query, you can switch to the SQL view, which looks something like this:

SELECT [tbl].*
FROM [tbl]
WHERE ((([tbl].StudentNumber)
In
(SELECT [StudentNumber]
FROM [tbl] As Tmp
GROUP BY [StudentNumber],[CourseID],[SemesterID],[YearID]
HAVING Count(*)>1
And [CourseID] = [tbl].[CourseID]
And [SemesterID] = [tbl].[SemesterID]
And [YearID] = [tbl].[YearID])));

2) 2nd Query, to find the highest ID out of all the duplicates in the list. NOTE: This is where I think bad things could happen if you have two or more records that have duplicates, because one of them will have a higher ID than the others, which means you'd be deleting all but one duplicate of that record, but every single instance of every other record that has a duplicate.

I used Query Wizard > Select Query, and then I moved the ID to the first field and, in the properties window, I set Top Values to 1 to only return the record with the biggest ID. (Fellow newbies, in the Query Wizard, when it prompts you to choose a table, look for the option to choose a query instead.)

SELECT TOP 1 [qry1].[ID], [qry1].(all the other fields were included here), ...
FROM [qry1];

3) 3rd Query, to find all the queries that appear in the 1st query but not the 2nd query.

I used Query Wizard > Find Unmatched, and went through the steps to check which records in query 1 do not appear in query 2 to get something like this:

SELECT [qry1].*
FROM [qry1] LEFT JOIN [qry2] ON [qry1].[ID] = [qry2].[ID]
WHERE ((([qry2].ID) Is Null));

4) 4th Query, to delete, from the original table, all the records in the 3rd query.

This is where things got painful for me for a while, because Access doesn't like to delete things when there's an inner join. The following work-around seemed to do what I needed it to. I used Query Design to create a blank query, then I right-clicked and switched to the SQL view, and I pasted in SQL of this form:

DELETE [tbl].*
FROM [tbl]
WHERE [tbl].[ID] IN (SELECT [qry3].[ID] FROM [qry3] );

NOTE: You can replace DELETE with SELECT in that code to check, non-destructively, whether your query is going to delete the correct records.
 
Last edited:

isladogs

MVP / VIP
Local time
Today, 11:20
Joined
Jan 14, 2017
Messages
18,225
I have often used/recommended the process recommended in post #12.
Another approach which takes fewer steps is to create an empty copy of the same table with no data then make a composite index of the fields that together form a unique record.
Next import all records into the new table. Due to the index created, not all records will be imported. There will be no duplicates.

EDIT Sorry but my answer appears to be a duplicate of the answer in post #6 which I hadn't read earlier!
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 06:20
Joined
Feb 19, 2002
Messages
43,275
@jjdebenedictis Welcome:)
To prevent duplicates, you need a unique index. Even SharePoint should allow you to create a unique index.

To get rid of existing duplicates, create a new table/list with the proper index defined.
Run an append query that appends all rows. The duplicates will be dropped.
Once the data is cleaned up. delete the old table and rename the new table to the name of the old table.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 06:20
Joined
Feb 19, 2002
Messages
43,275
@jjdebenedictis you are new here. While "likes" are always appreciated, they are not helpful in making these threads useful to people with similar problems. Do not feel it necessary to always thank everyone who posted. Keep your thanks for when they actually mean something. No one here is looking for likes. This isn't a social media platform. It is always best to actually mention the post that gave you the actual solution or tell people what you ended up doing.

Looks like @isladogs and I posted the same solution at the same time. Our solution is different than what you used. And @arnelgp I'm sorry but somehow I missed your post also. So, we actually have three people who posted this particular method;) which you apparently are not using.

The best solution uses the compound indes to keep the bad data from being added initially and that is why so many experts suggested it. Allowing duplicates to be added is poor practice. Do you need additional help with how to add code to the BeforeUpdate event of the form to prevent the bad data from being added in the first place?
 

isladogs

MVP / VIP
Local time
Today, 11:20
Joined
Jan 14, 2017
Messages
18,225
Pat
Just for info @jjdebenedictis wasn't the OP in this thread. That was @Gismo who I'm alerting to the various updates since she/he last replied to this thread in post #10
 

Users who are viewing this thread

Top Bottom