delete query, delete duplicates but keep one

Cowboy_BeBa · Jun 7, 2017

Hi all

Im working with a large list, over 30k records and (frankly) its driving me a lil nuts

i had to create a cartesian join on a single table, so i could get a list of two similar records on one row and make it easier to compare the two (looking for similar records that may be duplicates)
of course the obvious issue is that this doubles up on rows (ie ID numbers 10001 and 10005 are joined, im looking at data from both records on one row, somewhere down the list im going to find another row thats exactly the same just with the ID's reversed)

Ive created a field (Amalgam) that basically combines the two ID Numbers based on which number is larger, giving me an identifier for these duplicate records ("amalgam: IIf([ID1]>[ID2],[ID1] & " & " & [ID2],[ID2] & " & " & [ID1])")

Now what id like to do is run a Delete query that will identify all records where Amalgam isnt unique (ive deleted a couple hundred manually so not all of the fields will be duplicates) and simply delete one of them

does anyone know of a way i can write a delete query that will only delete one of each instance of amalgam (and only if there are duplicates?)

EDIT: okay so while i was typing this i actually came up with a way to solve my own problem. Simply created a "Find Duplicates" query using the wizard, then created another query that summarizes the Find Duplicates query, it groups by the amalgam field, gives a count of another field and gives me the First ID field it finds, meaning i now have a list of approx 17k records that can be safely deleted with no loss of data
However this is a very convoluted way to go about doing this, so if anyone has a way to do it in one delete query i would still be very interested to hear it

Thanks again

NauticalGent · Jun 7, 2017

This is an oversite on Micro Soft that I simply cannot believe they overlooked. As it stands now, there is no (to the best of my knowledge) to do this interactively.

However, there are a few workarounds.

One of the easiest is to make a select query, and in the properties sheet, set Unique Values to "Yes". If everything looks right, change the query to a Make Table, delete your old table and re-name your new table to the old one.

Not the cleanest approach, but it is the most simple.

The method I use is from this web-site:

http://www.databasejournal.com/feat...lete-Duplicate-Records-From-Access-Tables.htm

It involves using VBA but if you dare, Fortune Favors the Bold.

And finally, I found this one just last week from Allen Browne's site involving sub-queries. Not as in depth as the VBA solution but it does involve dabbling with SQL which I avoid like the clap. Not that they are a bad choice, but I can be dyslectic when it comes to SQL syntax!

http://allenbrowne.com/subquery-01.html

Whichever way you chose to go, remember to back-up your table first!

Good luck and please share your solution...

isladogs · Jun 7, 2017

I agree totally with the comments by Nautical Gent about how unnecessarily difficult this process is.

Also on the need to backup the table or db before deleting multiple records

Personally I would not use the make table & delete original table method as
1. It can cause database 'bloat'
2. It can be difficult to do if you are using linked tables

The way I have done this is to first add a bit field (Yes/No) to the table with default value=0. Lets call it Tag

Next:
1. Create a query to find duplicate records.
Make sure you include the table PK field in this
2. Create an aggregate query to find one of each duplicate record.
Use one of First / Last / Min / Max according to preference or whichever works best in your case
CHECK you are happy with these results before proceeding

3. Next run an update query to set Tag = True (-1) for those records
4. Then run a delete query on all duplicate records where Tag=False
5. Finally run an update query to re-set Tag =False on all remaining records that were not deleted

I use a routine which does each query in turn and it is normally very quick.
Obviously its still convoluted but it works.

If there are multiple duplicates, the code can still handle it as in step 3 only one record is 'tagged' so all remaining 'untagged' duplicates are deleted in step 4

BTW - once you have got rid of all duplicates, I would then re-design the table so that duplicates cannot be created in the future

NauticalGent · Jun 7, 2017

Interesting work-around Colin, I will add that to my list!

S_Preston · May 1, 2018

NauticalGent (06-07-2017, 04:00 AM, Re: delete query, delete duplicates but keep one) I reviewed the article you posted (Posted Sep 19, 2003 Delete Duplicate Records From Access Tables - Alternate more flexible solution By Danny Lesandrini) and I have Q's. I'm new to Modules. I created a module by the name of DeleteDuplicateRecords, my macro is named Dups. I selected to runcode in the macro called DeleteDuplicateRecords. I receive message access cannot find DeleteDuplicateRecords in the expression. Any suggestions?

Mark_ · May 1, 2018

Two items...

First, are you trying to run a macro or trying to run a function you wrote in VBA in a global module? How you go about doing this can lead to different advice on fixing your issue. Please post the code and the error message. Also post the code you are trying to reference so we can help identify where the error originated.

Second, I normally create a table that holds the unique information that multiple records have, similar to what NG suggested. Rather than deleting the original table I allow the end user to see the two where the "Unique" values are in a parent and all original records are in a child. This allows the user to review and remove duplicate entries while also allowing for the option to code what happens to dependent records of the duplicate you are about to delete (merging sales from multiple "Customer accounts" to one and the like).

S_Preston · May 1, 2018

I created a general module named DeleteDuplicateRecords. Tbl1 contains my duplicate records (it's a copy of the original table). The field to be evaluated is labeled license. I copied the code (listed below) that was initial posted by Danny Lesandrini.

Sub DeleteDuplicateRecords(Tbl1 As String)
' Deletes exact duplicates from the specified table.
' No user confirmation is required. Use with caution.
Dim rst As DAO.Recordset
Dim rst2 As DAO.Recordset
Dim tdf As DAO.TableDef
Dim fld As DAO.Field
Dim strSQL As String
Dim varBookmark As Variant
Set tdf = DBEngine(0)(0).TableDefs(Tbl - 1)
strSQL = "SELECT * FROM " & Tbl - 1 & " ORDER BY "
' Build a sort string to make sure duplicate records are
' adjacent. Can't sort on OLE or Memo fields,though.
For Each fld In tdf.Fields
If (fld.Type <> dbMemo) And _
(fld.Type
<> dbLongBinary) Then
strSQL = strSQL & fld.Name & ", "
End If
Next fld
' Remove the extra comma and space from the SQL
strSQL = Left(strSQL, Len(strSQL) - 2)
Set tdf = Nothing
Set rst = CurrentDb.OpenRecordset(strSQL)
Set rst2 = rst.Clone
rst.MoveNext
Do Until rst.EOF
varBookmark = rst.Bookmark
For Each fld In rst.Fields
If fld.Value <> rst2.Fields(fld.Name).Value Then
GoTo NextRecord
End If
Next fld
rst.Delete
GoTo SkipBookmark
NextRecord:
rst2.Bookmark = varBookmark
SkipBookmark:
rst.MoveNext
Loop
End Sub

When I run this code this is the error I receive.

"Microsoft Access cannot find the name "DeleteDuplicateRecords' you entered in the expression. You may have specified a control that wasn't on the current object without specifying the correct form or report context. To refer to a control on another form or report, precede the control name with the name of a collection. This is usually either forms or Reports, and the name of the form or report to which the control belongs. For example, Forms![Products]!{Units In Stock]."

arnelgp · May 2, 2018

Can you post the table structure and tell us which fields are having duplicate.

jdraw · May 2, 2018

I agree with arnelgp we need to see some data structure and details of the "duplicates".

I don't understand this Tbl - 1 in your posted SQL.

Can you show a few examples of your "duplicates"?

S_Preston · May 2, 2018

"Tbl-1" has been replaced with Tbl1 in the code, that was a typing error that has been corrected. When I read the posting for this code https://www.databasejournal.com/fea...lete-Duplicate-Records-From-Access-Tables.htm it appeared that I could insert my table name in for "strTableName"?

Tbl1 structure:
Field name, License, short text, 13 characters, indexed allow duplicates, duplicate records do exist from original source data imported into Access.
Field name, Name, short text, 72
Field name, Address, short text, 20
Field name, City, short text, 19
Field name, State, short text, 2
Field name, Zip, short text, 5
Field name, Reg, Date/time, 10

isladogs · May 2, 2018

Have a look at my approach from post 3.
It doesn't use recordsets and you may find it easier to understand

S_Preston · May 2, 2018

ridders, thanks. I'm looking for the quickest fix to clean up my data for reports. I'm dealing with over 3 million records at a time, VBA has been good for me in the past

isladogs · May 2, 2018

For your setup I strongly recommend you try my approach

Using queries for checking/updating/deleting large numbers of records will almost certainly be far faster than looping through each record one at a time using recordsets.

I've used both methods. Large recordsets are SLOW

arnelgp · May 2, 2018

are all thise fiel you want to chk for duplicate?

the template to delete duplicate and leave 1:

DELETE (SELECT COUNT(*) FROM TABLE1 AS T1 WHERE (T1.field1 & T1.field2 = TABLE1.field1& TABLE1.field2) AND T1.ID <= TABLE1.ID) AS Expr1, *
FROM Table1
WHERE ((((SELECT COUNT(*) FROM TABLE1 AS T1 WHERE (T1.field1 & T1.field2 = TABLE1.field1& TABLE1.field2) AND T1.ID <= TABLE1.ID))>1));

You need autonumber (ID)

S_Preston · May 2, 2018

ridders, thanks! I was not aware of that fact. Like I said, I'm searching for the quickest solution.

arnelgp, I cannot create a autonumber ID field because I'm dealing with over 2 million records at a time. When I attempt to add autonumber field to the table I receive error message "file sharing lock count exceeded. Increase maxlocksperfile registry entry."

arnelgp · May 2, 2018

On immediate window in vba, type this to temporarily increase the maxlocksperfile:

Dbengine.SetOption dbMaxLocksPerFile, 1000000

Then edit the table add the auto. Increase the 1milion if access still complain

Or copy the tabke using ctrl-c and ctrl-v. Structure only. Edit the new table to add autonumber and append the orig table ti the copy.

isladogs · May 2, 2018

You can also increase the max locks per file permanently by editing it's registry entry/entries. There are several forum threads on this topic including
https://www.access-programmers.co.uk/forums/showthread.php?t=296647&highlight=Max+lock+per+file

Whether you increase it temporarily or permanently from the default value of 9500, I would suggest increasing in small amounts.
I regularly work with tables containing almost 3 million records and a setting of 15000 works fine for me.
In fact I recently added an autonumber field to a table of 2.6 million records without hitting a max locks issue

If you make it larger than necessary, you may 'unbalance' overall use of system resources and create other problems.

Mark_ · May 2, 2018

general module named DeleteDuplicateRecords

and

Sub DeleteDuplicateRecords

While it won't address some of your existing issues, having multiple items with the same name can confuse ACCESS. Best to have a general module with a name like "GlobalProcedures" or such.

Personally, I'd go with Colin's approach. Much quicker, especially if are dealing with an SQL server that can do all the work for you.

jdraw · May 2, 2018

S_Preston,

An approach I have used to remove duplicates is as follows:

-bring your incoming data from external source to a temp table.
-identify the unique fields or combinations that "identify your duplicates"
-create a new table and ensure you have a primary key and unique composite index(es) to identify those combinations you need to be unique
-use append query to populate this new table
-first unique record will be inserted
-the database software (DBMS) will reject duplicate values for these index(es) [typically 3022 error]

If you have any concerns or anomalies because you haven't identified what must be unique correctly, you still have the temp table and you can fine tune the process.

The database management system will reject duplicates according to your specification of what must be unique.
You can accomplish this "purging of duplicates" by means vba code; but the DBMS is designed to do this.

Perhaps your conditions are sufficiently different, but this approach is quite common.

Quite often it is the definition of "duplicate" that is the crux of the issue.

arnelgp · May 2, 2018

jdraw method is the simpliest.

When i post vba against sql some wise guy react. now sonebidy post vba, approved without thinking?! Who are you?

delete query, delete duplicates but keep one

Registered User.

Ignore List Poster Boy

MVP / VIP

Ignore List Poster Boy

Registered User.

Longboard on the internet

Registered User.

..forever waiting... waiting for jellybean!

Super Moderator

Registered User.

MVP / VIP

Registered User.

MVP / VIP

..forever waiting... waiting for jellybean!

Registered User.

..forever waiting... waiting for jellybean!

MVP / VIP

Longboard on the internet

Super Moderator

..forever waiting... waiting for jellybean!

Similar threads

Users who are viewing this thread