Query to delete duplicates?

MarcieFess

Registered User.
Local time
Today, 08:01
Joined
Oct 25, 2012
Messages
107
I have a problem that was caused by bad database architecture. I didn't know I could set a unique key that was a combination of 2 non-unique fields. I know that now.

Now I have a table with 1.3 million records. The true size of the table should be about one-tenth of that.

I believe the query can be run using 2 fields: ProductKey and StoreKey

I should have only 1 entry for each unique combination of ProductKey and StoreKey

Because I didn't have my attributes set properly when I imported files into the existing database, I have anywhere from 4 to 24 or more of each possible combination.

I have a query:

Code:
SELECT DISTINCT [Copy of tblStoreProducts3].ProductKey, 
[Copy of tblStoreProducts3].StoreKey, [Copy of tblStoreProducts3].UPC

FROM  [Copy of tblStoreProducts3]

WHERE ((([Copy of tblStoreProducts3].ProductKey) In (SELECT[ProductKey]
 FROM 
[Copy of tblStoreProducts3] As Tmp GROUP BY [ProductKey],
[StoreKey] HAVING Count(*)>1, And [StoreKey] = 
[Copy of tblStoreProducts3.[StoreKey])))

ORDER BY [Copy of tblStoreProducts3].ProductKey, [Copy of tblStoreProducts3].StoreKey;

First of all, this is taking FOREVER to run.

Secondly, I deleted about 1,000 duplicates by hand. I don't want the unique records to get deleted.

What I want: retain one record for each unique ProductKey/StoreKey combination, and delete the rest of the duplicates.

I appreciate your help!

Marcie Fessler
 
You might try something a little more performance-friendly, like an INNER JOIN to an inline subquery, something like the following:
Code:
SELECT DISTINCT T1.ProductKey, 
T1.StoreKey, T1.UPC
FROM  [Copy of tblStoreProducts3] AS T1
INNER JOIN (
    SELECT ProductKey, StoreKey
    FROM [Copy of tblStoreProducts3]
    GROUP BY ProductKey, StoreKey
    HAVING Count(*) > 1
    ) AS T2 ON T1.ProductKey = T2.ProductKey
           AND T1.StoreKey = T2.StoreKey
ORDER BY T1.ProductKey, T1.StoreKey;
 
Another approach is to create a totals query and group on every field in your table. That way you get just one instance of each fully duplicated record.

Turn the query into a make table query and create a new table with just one instance of each record.

This of course assumes that every field in your multiple occurrence of ProductKey, StoreKey has the same data.
 
Does using the

HAVING Count(*) > 1

line mean that the products that already have only 1 entry per store will be left out?

What is the next step after running this query, to delete the extra records so I'm left with one of each ProductKey/StoreKey combination?
 
Another approach is to create a totals query and group on every field in your table. That way you get just one instance of each fully duplicated record.

When I created a Totals query last night, it ran for 2 hours before I finally left for the night and came in this morning with the results up. I have no idea how long it actually ran. Is there a way to make it more time-effective?

Turn the query into a make table query and create a new table with just one instance of each record.

Thanks for confirming a thought that I had...just make a new table from the results of the query, verify it has everything I need it to have, delete the old table and rename this one. After creating the appropriate relationships again, of course.

This of course assumes that every field in your multiple occurrence of ProductKey, StoreKey has the same data.

The assumption is correct...every field in the multiple occurrence has the same data. I believe it was from Update queries that kept adding the same data every time it was run.

Marcie Fessler
 
You might get better performance if the key fields are indexed individually.

You did say you had a lot of records, so there's going to be a lot of processing. Maybe you could speed things up by dividing the total data into say quarters, first one makes new temporary table, other 3 append and then group on the temp table.

But if this is a one off procedure, I wouldn't care if it took overnight or longer - you don't have to sit and watch.
 

Users who are viewing this thread

Back
Top Bottom