SQL Subquery to delete duplicate records. (1 Viewer)

eddix99

Registered User.
Local time
Today, 22:43
Joined
Jul 7, 2016
Messages
32
Tearing my hair out here!

I have a table with over 100k records. Many of these records have multiple duplicates, but all have a unique primary key.
I'm trying to write a query in Access SQL, where only the record with the lowest primary key value for each record will display. From there I can do an update query to add a 'true' attribute to a checkbox and then delete all the records with a 'false' attribute; thus clearing out all the duplicates
To do this I need to write a query and include a sub query:

SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name],[Business Data_TEST].[Keep?]
FROM [Business Data_TEST]
WHERE Min(id) AND
(SELECT Business Data_TEST].[Company Name]
FROM [Business Data_TEST] AS alt
WHERE alt.company name = [Business Data_TEST].[Company Name])
;

I cant get the query to work and I'm sure that it would take seconds for someone with MS SQL skills to identify where i'm going wrong. [I'm pretty sure i'm going wrong at the first WHERE statement]
I include a demo DB
Many thx in advance.
 

Attachments

I had a look at another thread on this forum and made some changes, this is my latest attempt:

SELECT [Business Data_TEST].[ID], [Business Data_TEST].[Company Name], [Business Data_TEST].[zip/postal code],[Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE ID <>
(SELECT Min(ID)
FROM [Business Data_TEST] AS alt
WHERE alt.company name = [Business Data_TEST].[Company Name]
)
;
 
You may want to GROUP BY the fields that are duplicated, then find the Min(ID), something like...
Code:
SELECT Min(ID) As MinID, Count(*) As DupeCount, CompanyName, ZipCode
FROM Table
GROUP BY CompanyName, ZipCode
...as a way to select the rows you are talking about.
hth
Mark
 
Thanks for the reply. I've had a go at this but get a 'syntax error in query expression' error message, specifically at the sub query part. Anyway, heres my latest attempt:
Code:
SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name], [Business Data_TEST].[zip/postal code], [Business Data_TEST].Telephone1 AS Expr1, [Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE [Business Data_TEST].ID <>

(SELECT Min(ID) AS MinID, Count (*) AS DupeCount,Company Name
FROM [Business Data_TEST]
GROUP BY CompanyName
)
;
 
Does that sub query work as a stand-alone query? Make sure that works first.

Then, in your WHERE clause, you can't use '<>' to compare a single field value in the main query against a sub query that returns multiple rows. You'll need the IN operator, like...
Code:
SELECT *
FROM YourTable
WHERE ID [I]<NOT>[/I] IN
   ( 
   SELECT Min(ID)
   FROM OtherTable
   GROUP BY Company, Zip
   )
hth
Mark
 
I got it working. Thank you so much for your assistance. I took your advice and got the subquery working before transplanting it back into the main query.

my code was:
Code:
SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name], [Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE ID IN
 (SELECT  min(ID) AS lowid 
FROM [Business Data_TEST]
GROUP BY [Company Name]
)

ORDER BY [Business Data_TEST].[Company Name]

;
 
This query gives the results that i'm looking for, but takes upwards of 20mins to run.

My original table with duplicates had over 140,000 records. Now I have whittled it down to just over 30,000.

:)
 

Users who are viewing this thread

Back
Top Bottom