Find Duplicates different than Duplicates deleted

poet_will · Dec 8, 2005

Hey, all! Thanks for helping, here is my situation.

I have a table with about 70,000 records that have duplicate Address field values. The rest of the field values for those records are different. When I do a find duplicate querry I get the result that 17,000 records have the same address. However, when I do the append qurrey as instructed here: http://support.microsoft.com/?kbid=209183 I get a total of only 600 records in the new table. I have tried deleting all of the indexes for both the new and old table, with no luck.

I'm using Access 2000 on XP Pro.

If anyone could help with this I would greatly appreciate it!

Thanks

Will

ScottGem · Dec 8, 2005

Could it be that there are 600 unique addresses? But that 17000 records use one of those 600?

Lets look at it this way. You have a table of names. There are 10 records. All the last names are different, but there are 3 Toms, 4 Dicks, 2 Harrys and 1 Jane. So you have 4 unique first names, but 9 records that have duplicate first names.

poet_will · Dec 8, 2005

Nope. There are records missing that had no duplicates in the orginal table and some that had duplicates. They should be in the new table, but are not. When you do the find duplicate querry you are basically seeing all the records that have a duplicate in the address field. Therefore, there are 17000 unique addresses that are duplicated in my table. However, only 600 of those addresses show up in my new table. That is why it is strange, and I dont' understand it.

Will

ScottGem · Dec 8, 2005

Without seeing the queries that are giving you those numbers I can't tell what's happening. Can you post the SQL?

poet_will · Dec 9, 2005

Well the find duplicates querry is the one from the "New Querry" menu. Here is the SQL for that:

SELECT First([ADDRESS]) AS [ADDRESS Field], Count([ADDRESS]) AS NumberOfDups
FROM AllSourcesfromCDM
GROUP BY [ADDRESS]
HAVING Count([ADDRESS])>1;

As for the append querry it is from the Microsoft website: http://support.microsoft.com/?kbid=209183. Here is the SQL for that:

INSERT INTO AllSourcesfromCDM2
SELECT AllSourcesfromCDM.*
FROM AllSourcesfromCDM;

Any ideas?

Thanks

Will

ScottGem · Dec 9, 2005

In the first query, you are getting a count, by address, of all records that that have more than one of the same address. You are using AllSourcesfromCDM as your source. The number of records returned by this query would be the number of unique addresses that are duplicated within that table. If you summed the NumberofDups column you would get the total number of records (a larger number) that have dups. So what is the recordcount and the sum? What is the name of this query?

The second query is apending to another table (AllSourcesfromCDM2) all records in AllSourcesfromCDM. This isn't filtering anything. This should create a duplicate of AllSourcesfromCDM. If your goal is create a table with just the unique addresses, this won't do it. You should be doing a SELECT DISTINCT of just the address from the first query.

poet_will · Dec 9, 2005

Ok, the recordcount is 17196 and I dont know how to figure the sum of all of those (and of course i'm not doing it by hand).

For the second querry I'm doing this:

How to delete duplicate records from a table in Access 2000
View products that this article applies to.
Article ID : 209183
Last Review : July 29, 2004
Revision : 1.0
This article was previously published under Q209183
For a Microsoft Access 97 version of this article, see 109329 (http://support.microsoft.com/kb/109329/).
Novice: Requires knowledge of the user interface on single-user computers.

SUMMARY
This article describes how you can remove duplicate records from a table by using primary keys. By using this method, you can check for duplicate values in up to 10 fields in the table.
MORE INFORMATION
To remove duplicate records from a table, follow these steps:
1. Make a copy of the structure of the table from which you want to remove the duplicate records.

To copy a table:
a. Select the table in the Database window
b. On the Edit menu, click Copy.
c. On the Edit menu, click Paste.
d. Enter a name for the new table.
e. Select Structure Only
f. Click OK.
2. Open the new table in Design view.
3. Select the field(s) that contain the duplicate values.
4. To make your selection the primary key for the table, click the Primary Key button on the toolbar.
5. Save and close the table.
6. Create an append query based on the original table containing duplicates.
7. In the query Design View, on the Query menu, click Append Query.
8. In the Append dialog box, click the name of the new table from the Table Name list, and then click OK.
9. Include all the fields from the original table by dragging the asterick (*) to the query design grid.
10. On the Query menu, click Run.
11. Click Yes in the dialog box advising you that you are about to append records.
12. Because the Primary Key field(s) in the new table will not accept duplicate values, the following error message will be displayed:
Microsoft Access can't append all the records in the append query.

Microsoft Access set 0 field(s) to Null due to a type conversion failure, and it didn't add <number> record(s) to the table due to key violations, 0 record(s) due to lock violations, and 0 record(s) due to validation rule violations.
Do you want to run the action query anyway?
To ignore the error(s) and run the query, click Yes. For an explanation of the causes of the violations, click Help.
13. Click Yes.
14. View the contents of the new table. When you're sure the new table has the correct unique records, you can delete the original table, and then rename the new table using the name of the original table.
REFERENCES
For more information about deleting duplicate records, click Microsoft Access Help on the Help menu, type Automatically delete duplicate records from a table. in the Office Assistant or the Answer Wizard, and then click Search to view the topic.

This is the text from the website that I linked to in a previous post. Notice that in the new table I coppied only the structure from the original table and made the "Address" field primary (therefore no duplicates, correct?) so when I wrote all of the records from the first table to the new table it takes out the duplicates (correct again?) or does it take ALL records that have a duplicate. I.E. if two records had the same address it would take them both out. This would answer why I only get about 600 records in the new table.

So how do i delete duplicates without deleting ALL of the duplicated records?

Thanks for all your help!!

Will

ScottGem · Dec 9, 2005

First, you didn't answer all my questions. To get the sum of the counts you could use run a DSUM in the immediate window. But that's not that important. The name of the first query is.

There is no way to delete only the duplicates. What the article is telling you to do is create a new table and set the field you want to be unique as a PK or simply set the Index to no Duplicates. When you Append the data from your original table to that new table. Only unique instances of the one value will be appended. All other records will be left out. You should be getting at least 17196 records in the new table (probably more since there may have been addresses that weren't dupped). If you aren't they something you are telling me isn't correct. What I would suggest you do is post a copy of your database with just the tables and queries in question.

poet_will · Dec 9, 2005

Thanks for the help youa re giving me Scott. I'm a very basic user of Acces, so I don't know how to do a DSUM. Could you explain how to do that?

Secondly, I can't post my DB because it has confidential data in it. Perhaps I could just post the Address column for you to look out. But, I would have to ask my boss if it is ok to do that.

From the best of my knowledge I have told you everything about the problem that I can think of. I did a duplciates querry from the "new" menu in the querry window and come up with the 17000 records that have duplicates in the Address field. I then followed the instructions from that website to the letter and come up with the wrong number in the new table. I don't know what else to tell ya.

Thanks

Will

ScottGem · Dec 12, 2005

First, just look up DSUM in Access Help. Second, you could try removing name or other identification info from the table. But without being able to work with the data I'm not sure how much I can help further.

You still haven't given me the name of the first query which I requested twice.

Maybe you can post a screen shot of the copied table with the Address field selected. That might help, I suspect the problem is in how that strucutre only copy is configured.

poet_will · Dec 12, 2005

There isn't a name for the find duplicate querry. If you click on querry on the left menu then click "New" you get a list of 5 querries that you can make: Design View, Simple Querry Wizard, Crosstab Querry Wizard, Find Duplciates Querry Wizard, and Find Unmatched Querry Wizard. I chose "Find Duplicates Querry Wizard", picked "address" for the field that I want to find the duplicates in and hit finish. That is where i got the 17000+ duplicates from.

UPDATE: Ok, I coppied the table to another database and deleted all of the fields except Address and ObjectID and then did the delete duplicates process and it worked. So, there must be a field that is messing the process up. I will go through the table field by field to find out which one is doing it.

Thanks

Will

ScottGem · Dec 12, 2005

First, when you use the find Duplicates Wizard it automatically saves the query and assigns it a name. Usually "Find Duplicates for..." and the field name.

Second, it sounds like you have set no duplicates on more than one field. Check the properties of each field in the copied table.

poet_will · Dec 12, 2005

The name for the duplicate querry is "Find duplicates for AllSourcesfromCDM".

I made sure the I removed indexing from all of the fields except the "Address" field and went through the procedure again and came up with the same number of records in the new table.

When i do the append queery this si what it says:
"Microsoft Access set 0 field(s) to Null due to a type of conversion failure, and didn't add 2080 record(s) to the table due to key violations, 0 records due to lock violations, and 73968 record(s) due to validation rule violations."

What can the validation rule violations mean? I shouldn't be getting validation rule violations because I coppied the table structure, correct? Or is this because the Address field is now the primary key? I dont' understand how it can work with just the "address" field and not with all of the fields.

Thanks

Will

ScottGem · Dec 12, 2005

Could be a data type mismatch or conversion of some sort

bentnail · Jun 9, 2006

I know this thread is old, but perhaps the following will help someone else with a similar problem.

Not knowing all the details of poet_wills destination table and query, I can't help but suspect that his problem is caused by having several records containing fields with a zero length string value in his source table. By default Access tables set text fields to disallow "Zero Length" fields. In the table's design view, one can change the value of the "Allow Zero Length" criteria in the applicable text field from "No" to "Yes" to solve the problem.

Find Duplicates different than Duplicates deleted

poet_will

Registered User.

ScottGem

Registered User.

poet_will

Registered User.

ScottGem

Registered User.

poet_will

Registered User.

ScottGem

Registered User.

poet_will

Registered User.

ScottGem

Registered User.

poet_will

Registered User.

ScottGem

Registered User.

poet_will

Registered User.

ScottGem

Registered User.

poet_will

Registered User.

ScottGem

Registered User.

bentnail

New member

Similar threads

Users who are viewing this thread