delete 4 out of 5 records where name field and address field repeat more than 5x

johnmerlino · Oct 15, 2010

Hey all,

I am trying to delete 4 out of 5 records from table where name field repeats same value more than 5 times and the address field repeats more than five times for a table. So if there are 5 records with a name field and address field that are the same for all 5, then I would like to delete 4 out of 5. An example:
id name address
1 john 6440
2 john 6440
3 john 6440
4 john 6440
5 john 6440

I would only want to return 1 record from the 5 records above.

Thanks for response.

johnmerlino · Oct 15, 2010

This gives syntax error message:

Code:

CREATE TABLE isolate_duplicates AS 
    SELECT DISTINCT names_1, addresses, COUNT(*) 
    FROM print_ready GROUP BY names_1, addresses 
    HAVING COUNT(*) > 4;

DELETE print_ready FROM print_ready INNER JOIN isolate_duplicates 
ON print_ready.names_1 = isolate_duplicates.names_1 AND 
   print_ready.addresses = isolate_duplicates.addresses;

INSERT INTO print_ready SELECT names_1, addresses from isolate_duplicates;

I'm also not 100 percent sure this is going to do what I intend because I not only want the single record out of the five duplicates, but I also want to retain the records if they didn't have 5 duplicates to begin with.

jal · Oct 15, 2010

A standard query for deleting dups is this:

Delete from table1 where ID NOT IN
(
select min(id) from table1
Group by Name, Address
HAVING Count(*) > 1

)

Assuming it's the Name, Address combo that constitutes a dup. There is a faster way if you have, say, over 10,000 records in the table, but it involves a temp table.

johnmerlino · Oct 15, 2010

That gives me error "could not delete from specified table"

jal · Oct 15, 2010

johnmerlino said:
That gives me error "could not delete from specified table"

Actually I'm glad you got an error because I think I was supposed to omit the Having clause. I'll try it on my own and report back.

jal · Oct 15, 2010

Yes, I was supposed to omit the HAVING clause (otherwise it deletes the singular rows as well as those with dups). But I don't know why you are getting that error -it works find for me. Maybe your table is bound to a form such that the form has a lock on the table?

To find out, try running the query in SQL view (with all forms closed).

johnmerlino · Oct 16, 2010

I do want to keep duplicates. I want to delete records where the duplicates exceed 4. So if there are five names and addresses that are the same, then I want to delete 4 of the 5. However, if there's 4, then I want to keep all 4.

jal · Oct 16, 2010

However, if there's 4, then I want to keep all 4.

Ok, that's a strange request. You'll need something more sophisticated than what I gave you.

jal · Oct 16, 2010

Create a temp table call KeepThese with column ID as primary key (assuming these values are unique in the original table). These are the ID numbers you wish to keep. You MUST designate ID as the primary key in KeepThese (otherwise the DELETE query won't work). The following is untested, however.

INSERT INTO KeepThese
SELECT ID FROM
(
SELECT Min(ID) AS ID
FROM Print_Ready
GROUP BY names_1, addresses
HAVING COUNT(*) >=5

UNION ALL

SELECT ID FROM Print_Ready as P
INNER JOIN
(SELECT Names_1, addresses
FROM Print_ready
GROUP BY Names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = P.Names_1
AND ThoseLessThan5.addresses = P.addresses
)

Then use KeepThese to delete the other records:

DELETE P.* FROM Print_Ready as P
LEFT JOIN KeepThese as K
ON K.ID = P.ID
WHERE K.ID IS NULL

jal · Oct 16, 2010

Later I was thinking you might need to add the word ID to the first line:

INSERT INTO KeepThese (ID)

johnmerlino · Oct 17, 2010

Thanks for response but I'm still having problems with this.

1) I create a table called KeepThese and give it a primary key id.
2) I create a query called delete_1 and copy this into it:
INSERT INTO KeepThese
SELECT ID FROM
(
SELECT Min(ID) AS ID
FROM Print_Ready
GROUP BY names_1, addresses
HAVING COUNT(*) >=5

UNION ALL

SELECT ID FROM Print_Ready as P
INNER JOIN
(SELECT Names_1, addresses
FROM Print_ready
GROUP BY Names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = P.Names_1
AND ThoseLessThan5.addresses = P.addresses
)INSERT INTO KeepThese
SELECT ID FROM
(
SELECT Min(ID) AS ID
FROM Print_Ready
GROUP BY names_1, addresses
HAVING COUNT(*) >=5

UNION ALL

SELECT ID FROM Print_Ready as P
INNER JOIN
(SELECT Names_1, addresses
FROM Print_ready
GROUP BY Names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = P.Names_1
AND ThoseLessThan5.addresses = P.addresses
)
3) I create a query called delete_2 and copy this into it:
DELETE P.* FROM Print_Ready as P
LEFT JOIN KeepThese as K
ON K.ID = P.ID
WHERE K.ID IS NULL
4) Then I run delete_1. I get a message that says "circular reference caused by alias ID" So I change this piece:
FROM (SELECT Min(ID) AS ID
to say this:
FROM (SELECT Min(ID) AS ID2
Then I double click again and a popup displays saying Enter Parameter Value for ID.This indicates that it doesn't know what ID is. But print_ready is only a query and while it has an id, it is in reality the id of another table that got filtered into this query.

Not sure what to do at this point.

Thanks for response.

jal · Oct 18, 2010

I know what's causing the circular reference. Try this (see changes in red). But I still haven't tested it so there may be more errors.
INSERT INTO KeepThese (ID)
SELECT ID FROM
(
SELECT Min(P.ID) AS ID
FROM Print_Ready as P
GROUP BY names_1, addresses
HAVING COUNT(*) >=5

UNION ALL

SELECT ID FROM Print_Ready as P
INNER JOIN
(SELECT Names_1, addresses
FROM Print_ready
GROUP BY Names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = P.Names_1
AND ThoseLessThan5.addresses = P.addresses
)

johnmerlino · Oct 20, 2010

Thanks for reply. This didn't seem to insert anything in the KeepThese field:

Code:

INSERT INTO KeepThese ( ID )
SELECT ID
FROM (SELECT Min(P.ID) AS ID
FROM [The Final Query] as P
GROUP BY names_1, addresses
HAVING COUNT(*) >=5

UNION ALL

SELECT ID FROM [The Final Query] as P
INNER JOIN
(SELECT names_1, addresses
FROM [The Final Query]
GROUP BY names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = P.Names_1
AND ThoseLessThan5.addresses = P.addresses
)  AS [%$##@_Alias];

Perhaps I'm overlooking something? What it did was just change the query that I put this query in and basically just left it with an id column with id values.

jal · Oct 20, 2010

The code I gave you works fine. Attached is a sample. When you run the code (qryFindDups) it should insert two records into KeepThese. It does.

johnmerlino · Oct 20, 2010

Thanks for response. I noticed it only outputs the id into the new table but not the corresponding name and address. Is there a way to output the corresponding name and address with the id into the new table? I tried changing it myself but I get error like "The specified field names_1 could refer to more than one table listed in the FROM clause of your SQL statement" with this:

Code:

INSERT INTO KeepThese ( ID )
SELECT ID 
FROM (SELECT Min(P.ID) AS ID 
FROM [The Final Query] as P
GROUP BY names_1, addresses
HAVING COUNT(*) >=5
UNION ALL
SELECT ID, names_1, addresses FROM [The Final Query] as P
INNER JOIN
(SELECT Names_1, addresses
FROM [The Final Query]
GROUP BY names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = P.Names_1
AND ThoseLessThan5.addresses = P.addresses
)  AS [%$##@_Alias];

jal · Oct 20, 2010

I gave you two queries. The second query was for deleting the records you wish to eliminate. I guess that approach isn't what you want. I can do what you ask, maybe later on tonight.

johnmerlino · Oct 20, 2010

Thanks for response. Any idea why it says "The specified field names_1 could refer to more than one table listed in the FROM clause of your SQL statement" when I added "names_1" and "addresses" to the line directly below UNION ALL. While I understand what UNION ALL is supposed to do, this part here "FROM (SELECT Min(P.ID) AS IDFROM [The Final Query] as P" confuses me. You have to FROM statements there, which I don't understand. Maybe it's a level deeper than my knowledge but I'm curious as to why two are there.

jal · Oct 20, 2010

I tested this (on the sample already uploaded) so it should work ok.

SELECT F.*
INTO KeepThese
FROM [The Final Query] as F
INNER JOIN
(
SELECT Min(F.ID) AS ID
FROM [The Final Query] as F
GROUP BY names_1, addresses
HAVING COUNT(*) >=5

UNION ALL

SELECT ID FROM [The Final Query] as F
INNER JOIN
(SELECT Names_1, addresses
FROM [The Final Query]
GROUP BY Names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = F.Names_1
AND ThoseLessThan5.addresses =F.addresses
) as IDs
ON IDs.ID = F.ID

johnmerlino · Oct 21, 2010

Thanks for response. That worked but there's a problem that I have duplicate ids. Is there a simple query to remove duplicate ids - where you just check if the id has dupluicates and nothing else?

I tried this:

Code:

SELECT KeepThese.* INTO ready_for_print
FROM KeepThese
WHERE id IN (SELECT id
                   FROM KeepThese
                   GROUP BY fullname, addresses
                   HAVING COUNT(*) < 2);

But this gives me error:
"You tried to execute a query that does not include the specified expression 'id' as part of an aggregate function.

jal · Oct 21, 2010

Every table should have a primary key, meaning a unique ID distinguishing each row from the other rows. Therefore I'm not sure why you have duplicate IDs. Sounds like you did not set up your tables correctly. You should run the following code on your tables to fix this problem.

ALTER TABLE table1 ADD COLUMN AutoNum COUNTER

This will add a column called "AutoNum" to a table such as Table1 and helps to insure that each row has a unique ID (it's also a good idea to put an index on this column with 'No duplicates' as to guarantee uniqueness). Make sure that [The Final Query] pulls the AutoNum column from its source table. Then use the same code I gave you, but substitute "AutoNum" in all those places where I used the "ID" column in the code. Or if you prefer, delete the ID column (as to get rid of the duplicate IDs) and then use the "AutoNum" column as your new ID column (you can change the column name to ID).

delete 4 out of 5 records where name field and address field repeat more than 5x

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Attachments

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Similar threads

Users who are viewing this thread