delete 4 out of 5 records where name field and address field repeat more than 5x

johnmerlino · Oct 21, 2010

The table did start with unique ids. However, I had to create a new table to bring a column from one table with a column from another table. This query caused duplicate ids:

Code:

SELECT possibles.fullname AS fullname, PrepareForDuplicateCheck.addresses, PrepareForDuplicateCheck.id INTO final_output
FROM possibles, PrepareForDuplicateCheck
WHERE (INSTR(PrepareForDuplicateCheck.names_1, possibles.fullname) > 0)
Or
(INSTR(PrepareForDuplicateCheck.names_2, possibles.fullname) > 0);

Rather than bring over this "PrepareForDuplicateCheck.id" from the other table, is there a way to create unique id as in your alter table statement? Keep in mind what's above is a select statement.

johnmerlino · Oct 21, 2010

I try this:

Code:

SELECT KeepThese.* INTO ready_for_print
FROM KeepThese
WHERE ID <>  (SELECT Min(ID) AS MinOfID FROM KeepThese AS Dupe      
WHERE (Dupe.id  = KeepThese.id));And it appends nothing in new table.

And it appends nothing in new table.

johnmerlino · Oct 21, 2010

By the way, I appreciate all your responses. It has been invaluable as I leap into the world that is Access.

johnmerlino · Oct 21, 2010

Perhaps I should clear up the issue. I have this:
333 Marge, John State Road
333 Marge, John State Road
444 Marge, John Cap Road
444 Marge, John Cap Road
515 Marge, John 3rd Street

I would want to keep only the following:
333 Marge, John State Road
444 Marge, John Cap Road
515 Marge, John 3rd Street

jal · Oct 21, 2010

Now I'm even more confused. Earlier you said
(1) If there are five dups, I only want to keep one of the five records.
(2) If there are four dups (or less), keep all the records in that dup set.

NOW you are asking me to eliminate ALL dups ???

johnmerlino said:
Perhaps I should clear up the issue. I have this:
333 Marge, John State Road
333 Marge, John State Road
444 Marge, John Cap Road
444 Marge, John Cap Road
515 Marge, John 3rd Street

I would want to keep only the following:
333 Marge, John State Road
444 Marge, John Cap Road
515 Marge, John 3rd Street

johnmerlino · Oct 22, 2010

No, I think I didn't explain well. Here's the issue. If a person let's say John Smith has more than 5 different addresses, then that's too many addresses to keep him in my original list. So I want to put him in a separate table called extras. For example:

Code:

John Smith First Street
John Smith Second Street
John Smith Third Street
John Smith Fourth Street
John Smith Fifth Street
John Smith Sixth Street

What I mentioned above about deleting all duplicates can be ignored. That was only being caused by another variable that I completely removed.

But going back to what I stated originally, I misrepresented. I don't want to delete if there's more than 5 of the same, I want to remove from one table and place in another if there is more than 5 that are different addresses. I know in my initial example, I gave an example of duplicate addresses, but that was a mistake. There are absolutely no duplicates at all here. It's just 6 different addresses is too much to keep for the one name. I hope that clears up confusion.

jal · Oct 22, 2010

johnmerlino said:
The table did start with unique ids. However, I had to create a new table to bring a column from one table with a column from another table. This query caused duplicate ids:

Code:

SELECT possibles.fullname AS fullname, PrepareForDuplicateCheck.addresses, PrepareForDuplicateCheck.id INTO final_output FROM possibles, PrepareForDuplicateCheck WHERE (INSTR(PrepareForDuplicateCheck.names_1, possibles.fullname) > 0) Or (INSTR(PrepareForDuplicateCheck.names_2, possibles.fullname) > 0);

Rather than bring over this "PrepareForDuplicateCheck.id" from the other table, is there a way to create unique id as in your alter table statement? Keep in mind what's above is a select statement.

If the above query is producing dups, try rewriting it like this:

SELECT DISTINCT fullName, addresses, ID
INTO final_output
FROM
(
SELECT possibles.fullname AS fullname, PrepareForDuplicateCheck.addresses, PrepareForDuplicateCheck.id
FROM possibles, PrepareForDuplicateCheck
WHERE INSTR(PrepareForDuplicateCheck.names_1, possibles.fullname) > 0
Or INSTR(PrepareForDuplicateCheck.names_2, possibles.fullname) > 0
)

jal · Oct 22, 2010

johnmerlino said:
No, I think I didn't explain well. Here's the issue. If a person let's say John Smith has more than 5 different addresses, then that's too many addresses to keep him in my original list. So I want to put him in a separate table called extras. For example:

Code:

John Smith First Street John Smith Second Street John Smith Third Street John Smith Fourth Street John Smith Fifth Street John Smith Sixth Street

What I mentioned above about deleting all duplicates can be ignored. That was only being caused by another variable that I completely removed.

But going back to what I stated originally, I misrepresented. I don't want to delete if there's more than 5 of the same, I want to remove from one table and place in another if there is more than 5 that are different addresses. I know in my initial example, I gave an example of duplicate addresses, but that was a mistake. There are absolutely no duplicates at all here. It's just 6 different addresses is too much to keep for the one name. I hope that clears up confusion.

There's no SQL command for "moving" a record from one table to another. Instead you have to (Method 1).
(1) Copy the record to the second table.
(2) Delete the original record.

Alternatively you can (Method II)
(1) Copy the record to the second table.
(2) Copy all remaining records to a third table.
(3) Delete (or henceforth ignore) the original table.

So far you seem to be more of a Method II person.

jal · Oct 22, 2010

Code:

Going with Method II, it's going to take at least 2 queries.  I'm going to ignore addresses because you now say that only the names are repeating.  We'll create 2 tables, KeepThese1 and KeepThese2. First query copies one record into KeepThese1 if the customer has 6 or more addresses. 

SELECT F.*
INTO KeepThese1
FROM [The Final Query] as F
INNER JOIN
(
SELECT Min(F.ID) AS ID
FROM [The Final Query] as F
GROUP BY names_1
HAVING COUNT(*) >=6
) as IDsToKeep
ON IDsToKeep.ID = F.ID


Second query copies into KeepThese2 all remaining records, that is, any records NOT copied into KeepThese1 by the first query . 

SELECT F.*
INTO KeepThese2
FROM [The Final Query] as F
INNER JOIN
(
    SELECT F.ID
    FROM [The Final Query] as F
    LEFT JOIN
    (
        SELECT Min(F.ID) AS ID
        FROM [The Final Query] as F
        GROUP BY names_1
        HAVING COUNT(*) >=6
    ) as IDsToReject
    ON IDsToReject.ID = F.ID
    WHERE IDsToReject.ID IS NULL
) as IDsToKeep
ON IDsToKeep.ID = F.ID

johnmerlino · Oct 24, 2010

jal said:
If the above query is producing dups, try rewriting it like this:

SELECT DISTINCT fullName, addresses, ID
INTO final_output
FROM
(
SELECT possibles.fullname AS fullname, PrepareForDuplicateCheck.addresses, PrepareForDuplicateCheck.id
FROM possibles, PrepareForDuplicateCheck
WHERE INSTR(PrepareForDuplicateCheck.names_1, possibles.fullname) > 0
Or INSTR(PrepareForDuplicateCheck.names_2, possibles.fullname) > 0
)

Thanks for response. Now you have "SELECT DISTINCT fullName". However, I don't necessarily want to keep distinct names if the number of different addresses is less than five. Hence, if a person has four addresses, then there will be duplicate names, but since they only have 4 different addresses and not more than 5, I want to keep their name, even if it's repeated. When I remove the ID from SELECT DISTINCT, it performs what I wanted, but of course now there's no id column in newly created table. So I needed to create new query: ALTER TABLE final_output ADD COLUMN Id AUTOINCREMENT;

johnmerlino · Oct 24, 2010

jal said:

Code:

Going with Method II, it's going to take at least 2 queries.  I'm going to ignore addresses because you now say that only the names are repeating.  We'll create 2 tables, KeepThese1 and KeepThese2. First query copies one record into KeepThese1 if the customer has 6 or more addresses. 
 
SELECT F.*
INTO KeepThese1
FROM [The Final Query] as F
INNER JOIN
(
SELECT Min(F.ID) AS ID
FROM [The Final Query] as F
GROUP BY names_1
HAVING COUNT(*) >=6
) as IDsToKeep
ON IDsToKeep.ID = F.ID
 
 
Second query copies into KeepThese2 all remaining records, that is, any records NOT copied into KeepThese1 by the first query . 
 
SELECT F.*
INTO KeepThese2
FROM [The Final Query] as F
INNER JOIN
(
    SELECT F.ID
    FROM [The Final Query] as F
    LEFT JOIN
    (
        SELECT Min(F.ID) AS ID
        FROM [The Final Query] as F
        GROUP BY names_1
        HAVING COUNT(*) >=6
    ) as IDsToReject
    ON IDsToReject.ID = F.ID
    WHERE IDsToReject.ID IS NULL
) as IDsToKeep
ON IDsToKeep.ID = F.ID

Thanks for response. The first query worked and put all the repeated references more than 6 into their own table. However, the second query did not work. It didn't put all the repeated name references less than 6 into their own table.

I have records that look like this that are still left over:
id fullName address
1 Johnson Margaret 1st Street
2 Johnson Margaret 2st Street
3 Johnson Margaret 3st Street
4 Johnson Margaret 4st Street
5 Johnson Margaret 5st Street
6 Johnson Margaret 6st Street

I did change some names around due to differences in your names and mine but it still should have worked if the sql was correct:

Code:

SELECT F.* INTO KeepThese
FROM final_output AS F INNER JOIN (SELECT F.ID FROM final_output AS F LEFT JOIN (SELECT Min(F.ID) AS ID FROM final_output AS F GROUP BY fullName HAVING COUNT(*) >=6)  AS IDsToReject ON IDsToReject.ID = F.ID WHERE IDsToReject.ID IS NULL)  AS IDsToKeep ON IDsToKeep.ID = F.ID;

jal · Oct 25, 2010

As far as I can see, the SQL I wrote is just fine. The problem is that you keep changing your request. Here's what you said last time:

I want to remove from one table and place in another if there is more than 5 that are different addresses.

That's precisely what I did, as far as I can see. Now you're changing things again - this time it's "more than 6".

The first query worked and put all the repeated references more than 6 into their own table. However, the second query did not work. It didn't put all the repeated name references less than 6 into their own table.

Make up your mind.

johnmerlino · Oct 25, 2010

What I'm ultimately trying to do is if the names repeated is more than 6, they get put in their own column. If they are less than 6 repeated names, they get put in another column, but the names more than 6 don't get put in this table.

I can visually show better than I can explain:
extras table:

Code:

jen baker 1 road
jen baker 2 road
jen baker 3 road
jen baker 4 road
jen baker 5 road
jen baker 6 road

KeepThese table:

Code:

harry baker 1 street
harry baker 2 street
harry baker 3 street
harry baker 4 street
donald ross 11th street

Harry and Donald get put in KeepThese table because he has less than 6 different addresses. Jen gets put in extras table because she has 6 or more different addresses.

jal · Oct 25, 2010

You want all 6 of Jen in one table, then, right? KeepThese1

And you want all the Harrys and Donalds in another table, right? KeepThese2

Or do I still have it wrong?

johnmerlino · Oct 25, 2010

Yeah, the reason why is because Harry and Donald have 5 or less different addresses, whereas jen has 6 or more different addresses. Because of this, Harry and Donald go in one table and Jen goes in another, along with anyone else who has 6 or more different addresses. Thanks for all your responses.

jal · Oct 25, 2010

Unless I'm not thinking clearly, I suppose all we need to do, then, is to remove the MIN(ID) restriction (change it to SELECT ID). First query:

SELECT F.*
INTO KeepThese1
FROM [The Final Query] as F
INNER JOIN
(
SELECT ID
FROM [The Final Query] as F
GROUP BY names_1
HAVING COUNT(*) >=6
) as IDsToKeep
ON IDsToKeep.ID = F.ID

2nd query

SELECT F.*
INTO KeepThese2
FROM [The Final Query] as F
INNER JOIN
(
SELECT F.ID
FROM [The Final Query] as F
LEFT JOIN
(
SELECT ID
FROM [The Final Query] as F
GROUP BY names_1
HAVING COUNT(*) >=6
) as IDsToReject
ON IDsToReject.ID = F.ID
WHERE IDsToReject.ID IS NULL
) as IDsToKeep
ON IDsToKeep.ID = F.ID

johnmerlino · Oct 26, 2010

jal said:
Unless I'm not thinking clearly, I suppose all we need to do, then, is to remove the MIN(ID) restriction (change it to SELECT ID). First query:

SELECT F.*
INTO KeepThese1
FROM [The Final Query] as F
INNER JOIN
(
SELECT ID
FROM [The Final Query] as F
GROUP BY names_1
HAVING COUNT(*) >=6
) as IDsToKeep
ON IDsToKeep.ID = F.ID

2nd query

Thanks for reply. Unfortunately, when I try to run the query, specifically the second one, because for the first one, the one you posted earlier worked. But when I try to run the second one to pull all records that have less than 6 different addresses in a new table, I get the following error:
"You tried to execute a query that does not include the specified expression "ID" as part of an aggregate function."

jal · Oct 26, 2010

johnmerlino said:
Thanks for reply. Unfortunately, when I try to run the query, specifically the second one, because for the first one, the one you posted earlier worked. But when I try to run the second one to pull all records that have less than 6 different addresses in a new table, I get the following error:
"You tried to execute a query that does not include the specified expression "ID" as part of an aggregate function."

Yeah, that was a typical dumb oversight on my part. I'll have to fix that.

jal · Oct 26, 2010

It shouldn't be SELECT ID - it should be SELECT names_1. Sorry. Try this:

Code:

SELECT F.*
INTO KeepThese1
FROM [The Final Query] as F
INNER JOIN
(
    SELECT names_1
    FROM [The Final Query] as F
    GROUP BY names_1
    HAVING COUNT(*) >=6
) as NamesToKeep
ON NamesToKeep.names_1 = F.names_1

second query:

Code:

SELECT F.*
INTO KeepThese2
FROM [The Final Query] as F
INNER JOIN
(
    SELECT F.names_1
    FROM [The Final Query] as F
    LEFT JOIN
    (
        SELECT names_1
        FROM [The Final Query] as F
        GROUP BY names_1
        HAVING COUNT(*) >=6
    ) as NamesToReject
    ON NamesToReject.names_1 = F.names_1
    WHERE  NamesToReject.ID IS NULL
) as NamesToKeep
ON NamesToKeep.names_1 = F.names_1

I'm falling asleep at my desk, can't think straight, - I hope i got it right. I hope I don't produce any dups, maybe it should have been SELECT DISTINCT names_1 to avoid dups....can't think right now.

johnmerlino · Oct 27, 2010

jal said:

second query:

Code:

SELECT F.*
INTO KeepThese2
FROM [The Final Query] as F
INNER JOIN
(
    SELECT F.names_1
    FROM [The Final Query] as F
    LEFT JOIN
    (
        SELECT names_1
        FROM [The Final Query] as F
        GROUP BY names_1
        HAVING COUNT(*) >=6
    ) as NamesToReject
    ON NamesToReject.names_1 = F.names_1
    WHERE  NamesToReject.ID IS NULL
) as NamesToKeep
ON NamesToKeep.names_1 = F.names_1

Thanks for reply. When I run second query, I get a popup that says this:
Enter Parameter Value
NamesToReject.ID

delete 4 out of 5 records where name field and address field repeat more than 5x

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Registered User.

Similar threads

Users who are viewing this thread