SQL Subquery to delete duplicate records.

eddix99 · Jul 24, 2017

Tearing my hair out here!

I have a table with over 100k records. Many of these records have multiple duplicates, but all have a unique primary key.
I'm trying to write a query in Access SQL, where only the record with the lowest primary key value for each record will display. From there I can do an update query to add a 'true' attribute to a checkbox and then delete all the records with a 'false' attribute; thus clearing out all the duplicates
To do this I need to write a query and include a sub query:

SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name],[Business Data_TEST].[Keep?]
FROM [Business Data_TEST]
WHERE Min(id) AND
(SELECT Business Data_TEST].[Company Name]
FROM [Business Data_TEST] AS alt
WHERE alt.company name = [Business Data_TEST].[Company Name])
;

I cant get the query to work and I'm sure that it would take seconds for someone with MS SQL skills to identify where i'm going wrong. [I'm pretty sure i'm going wrong at the first WHERE statement]
I include a demo DB
Many thx in advance.

eddix99 · Jul 24, 2017

I had a look at another thread on this forum and made some changes, this is my latest attempt:

SELECT [Business Data_TEST].[ID], [Business Data_TEST].[Company Name], [Business Data_TEST].[zip/postal code],[Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE ID <>
(SELECT Min(ID)
FROM [Business Data_TEST] AS alt
WHERE alt.company name = [Business Data_TEST].[Company Name]
)
;

MarkK · Jul 24, 2017

You may want to GROUP BY the fields that are duplicated, then find the Min(ID), something like...

Code:

SELECT Min(ID) As MinID, Count(*) As DupeCount, CompanyName, ZipCode
FROM Table
GROUP BY CompanyName, ZipCode

...as a way to select the rows you are talking about.
hth
Mark

eddix99 · Jul 24, 2017

Thanks for the reply. I've had a go at this but get a 'syntax error in query expression' error message, specifically at the sub query part. Anyway, heres my latest attempt:

Code:

SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name], [Business Data_TEST].[zip/postal code], [Business Data_TEST].Telephone1 AS Expr1, [Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE [Business Data_TEST].ID <>

(SELECT Min(ID) AS MinID, Count (*) AS DupeCount,Company Name
FROM [Business Data_TEST]
GROUP BY CompanyName
)
;

MarkK · Jul 24, 2017

Does that sub query work as a stand-alone query? Make sure that works first.

Then, in your WHERE clause, you can't use '<>' to compare a single field value in the main query against a sub query that returns multiple rows. You'll need the IN operator, like...

Code:

SELECT *
FROM YourTable
WHERE ID [I]<NOT>[/I] IN
   ( 
   SELECT Min(ID)
   FROM OtherTable
   GROUP BY Company, Zip
   )

hth
Mark

eddix99 · Jul 25, 2017

I got it working. Thank you so much for your assistance. I took your advice and got the subquery working before transplanting it back into the main query.

my code was:

Code:

SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name], [Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE ID IN
 (SELECT  min(ID) AS lowid 
FROM [Business Data_TEST]
GROUP BY [Company Name]
)

ORDER BY [Business Data_TEST].[Company Name]

;

MarkK · Jul 25, 2017

Nicely done!

eddix99 · Jul 25, 2017

This query gives the results that i'm looking for, but takes upwards of 20mins to run.

My original table with duplicates had over 140,000 records. Now I have whittled it down to just over 30,000.

SQL Subquery to delete duplicate records.

eddix99

Registered User.

Attachments

eddix99

Registered User.

MarkK

bit cruncher

eddix99

Registered User.

MarkK

bit cruncher

eddix99

Registered User.

MarkK

bit cruncher

eddix99

Registered User.

Similar threads

Users who are viewing this thread