Remove Duplicate Entries

ed333

I have been searching the archives on this forum for a couple of days now, but I am unable to find a solution to my problem.

I have a table that has been imported from FoxPro which was part of a very poorly designed database. There are now 98000 records in this table, many of which are duplicate entries. Some of the records are exact duplicates, and some of them differ only in the contents of a memo field. I need a way to loop through the recordset and delete duplicate entries. If two entries are the same, except for the memo field, then I want to keep the one with the longer memo entry.

The main method that I have seen posted for removing dupes involves making a copy of the table's structure and setting the PKs so that no dupes are allowed when the data is pasted in. This will not work for me because every field in this table contains repeating values; only the combination of those values needs to be unique.

I have also tried to eliminate some of the dupes by using a SELECT DISTINCTROW query, but for some reason this returns the exact number of records that are in the original table.

Let me add that there are no more than one or two duplicates per "good" record, and that currently the only unique field is an autonumber field.

Is it possible to convert a find duplicates query to a delete query which will then remove any exact dupes? Is it possible to use a query to compare memo fields? Is there a better way of accomplishing this?

Thanks very much for any advice,

Ed
 
I am assuming (I know that's a bad idea) that your records involve some sort of personal demographic information.

What I would suggest is to SELECT DISTINCT FirstName and LastName
and put those records into a new table. Then make another table that refers to the "username" table and import the remaining data based on the person's name or some other uniquely identifying criterion. I'm aware that people can have the same name, so you may want to use a combination of criteria, or ideally some sort of IDENTIFICATION number assigned to them, like a social security number or something similar.

Then create a relationship between them. In any case, get the name of the person(s) into one table and import the remaining data into another, effectively creating the one-to-many design that should have been there in the first place.
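Something like this, as a rough sketch. The table and field names here (tblImport, tblPeople, FirstName, LastName) are made up, so substitute your own:

Code:
SELECT DISTINCT FirstName, LastName
INTO tblPeople
FROM tblImport;

That gives you one row per person to hang the rest of the data off.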
 
Well, if only it were that simple. The table is actually my firm's internal docket. The fields consist of RecordNumber (autonumber field), DueDate, FileNo, ClientNo, Atty, Action, and Comment (a memo field). Often the exact same action is required, just on different dates, so the only thing unique about these records (other than the autonum) is the particular combination of their values. I tried SELECT DISTINCTROW DueDate, FileNo, ClientNo, Atty, Action, Comment INTO tblNewDocket FROM tblDocket, but the table returned was identical to the original table :(
Is there a problem with my SQL?

Thanks,
Ed
 
No... it's just that if ANY of the fields EVER have different data, then the records are unique in and of themselves. That's why you got the same table.

For instance, the due date and the comments... well, the comments are most likely almost unique.
 
Kodo said:
No... it's just that if ANY of the fields EVER have different data, then the records are unique in and of themselves. That's why you got the same table.

For instance, the due date and the comments... well, the comments are most likely almost unique.

Right....the ones that are unique I want to keep. There ARE several thousand complete duplicates, however. A find duplicates query shows this. I don't understand why the SELECT DISTINCTROW isn't working.

Thanks,
Ed
 
Don't use DISTINCTROW. It removes duplicates based on entire underlying records rather than just the columns you select, and since your autonumber makes every record unique, it returns them all.

Use SELECT DISTINCT and then specify the column names that you want to use for your criteria.
 
OK, thanks. I'm at home now, but I'll try that in the morning.

Ed
 
I tried the following SQL statement:

Code:
SELECT DISTINCT tblDocket.DUEDATE, tblDocket.CLNTNO, tblDocket.FILENO, tblDocket.ACTREQD, tblDocket.PRIORITY, tblDocket.MRKD_OFF, tblDocket.RATTY, tblDocket.COMMENT
INTO d2
FROM tblDocket;

This produced the following error message:

"The field is too small to accept the amount of data you attempted to add. Try inserting or pasting less data."

What exactly is going on here?

Thanks,
Ed
 
That error should only happen when you're inserting something, probably memo-sized data going into a text field. Make sure you're not inserting from a Memo field into a Text field; check your field data types. (Access is also known to treat a Memo as a 255-character Text field when you run it through DISTINCT, GROUP BY, or UNION, which would explain a truncated COMMENT column in the new table.)
 
Well, I tried making a copy of the original table's structure and then modifying my query to be an Append query... the exact same error message shows up. The field "COMMENT" is a memo field, however...

Is there a way to set the destination field type from within the SQL?

Thanks,
Ed
 
Try doing this:

Code:
INSERT INTO d2 (DUEDATE, CLNTNO, FILENO, ACTREQD, PRIORITY, MRKD_OFF, RATTY, COMMENT)
SELECT DISTINCT tblDocket.DUEDATE, tblDocket.CLNTNO, tblDocket.FILENO, tblDocket.ACTREQD, tblDocket.PRIORITY, tblDocket.MRKD_OFF, tblDocket.RATTY, tblDocket.COMMENT
FROM tblDocket;
 
Kodo,

That gives me the same error as posted before.

Is there a good way to accomplish this programmatically? The only method I can think of seems very inefficient, but here it is:

1. Open recordset
2. Compare all the records to the first record
3. If a match is found, delete the first record
4. Move on to the next record and compare
5. Repeat until all matches are eliminated


Is there a better way? I've got over 98000 records in this table.

Thanks,
Ed
 
Hi

As this thread is old you may have solved your problem by now; if not, how about the following.

I'll call your original table tblOriginal, the autonumber ID, the memo field Comment, and the other fields FldA, FldB, etc., just to simplify my typing.

OK, you can create a table of the duplicates (rows where FldA, FldB, etc. are equal); call it tblDupes.

Run an unmatched query of tblOriginal against tblDupes, creating tblUniques.

A query qryCLength calculates the length of the comment field in an additional field: CLength: Len([Comment]).

qryGroupDupes runs against qryCLength to output MaxOfCLength, FldA, FldB, etc., grouped by those fields.

An append query, qryAppendDupWithMaxComment, takes in qryCLength and qryGroupDupes, joins on CLength = MaxOfCLength and FldA, FldB, etc., and appends the duplicate with the longest comment to tblUniques. (Rough SQL for these steps is sketched below.)

Hope the above makes sense

Brian
 
O.K. I've worked up some code, but my inexperience with VB is frustrating me. Can anyone tell me why I am getting error 13, "type mismatch" on the line that begins "Set rs = ...."?

Here's the code:

Code:
'Find dupes in tblDocket, remove dupes with shortest comment length

Public Sub DelDupes()
    'Initialize Recordset
    Dim rs As Recordset
    Dim db As Database
    Dim RecNum, pointer As Long
    Dim DueDate As Date
    Dim Client, File, Act, Pri, Mark, Atty As String
    Dim Comment As Integer
        
    Set db = CurrentDb
    Set rs = db.OpenRecordset("tblDOCKET", dbOpenTable)
    rs.Index = "RecordNumber"
    rs.MoveFirst
    Do While Not rs.EOF
        ' Set initial conditions to be compared
        RecNum = rs!RecordNumber
        DueDate = rs!DueDate
        Client = rs!CLIENTNO
        File = rs!FILENO
        Act = rs!ACTREQD
        Pri = rs!PRIORITY
        Mark = rs!MRKD_OFF
        Atty = rs!RATTY
        Comment = Len(rs!Comment)
        
        Do While Not rs.EOF
            ' Loop through until a dupe is found, and then go to original and delete it
            If rs!DueDate = DueDate And rs!CLIENTNO = Client And rs!FILENO = File And rs!ACTREQD = Act And rs!PRIORITY = Pri And rs!MRKD_OFF = Mark And rs!RATTY = Atty And Comment <= Len(rs!Comment) Then
                rs.Seek "=", RecNum
                rs.Delete
                rs.MoveNext
            Else
                rs.MoveNext
            End If
        Loop
        ' Then move to the next record
        rs.MoveNext
    Loop
End Sub

Thanks,

Ed
 
It turned out to be a reference error. I've got the code running right now on a test copy of the data. It's slow. Very slow. However, after a bit of tweaking, it seems to be doing the job!
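For anyone who finds this thread later, here is a sketch of a corrected routine. The DAO. qualifier on the declarations is what clears the type mismatch (VB was otherwise picking up the ADO Recordset type); the restructured inner loop and the tie-break are an assumption about the "tweaking", not the exact code, and Null fields would need Nz() around the comparisons:

Code:
'Sketch only: delete every duplicate except the one with the longest
'comment, ties broken by keeping the highest RecordNumber.
Public Sub DelDupes()
    Dim db As DAO.Database      'DAO. prefix avoids the type mismatch
    Dim rs As DAO.Recordset     'with the ADO Recordset type
    Dim rsScan As DAO.Recordset
    Dim RecNum As Long, CommentLen As Long
    Dim KillIt As Boolean

    Set db = CurrentDb
    Set rs = db.OpenRecordset("tblDOCKET", dbOpenTable)
    Set rsScan = db.OpenRecordset("tblDOCKET", dbOpenTable)
    rs.MoveFirst
    Do While Not rs.EOF
        RecNum = rs!RecordNumber
        CommentLen = Len(rs!Comment & "")   'Null-safe length

        'Scan the whole table for a "better" twin of the current row
        KillIt = False
        rsScan.MoveFirst
        Do While Not rsScan.EOF
            If rsScan!RecordNumber <> RecNum _
                    And rsScan!DueDate = rs!DueDate _
                    And rsScan!CLIENTNO = rs!CLIENTNO _
                    And rsScan!FILENO = rs!FILENO _
                    And rsScan!ACTREQD = rs!ACTREQD _
                    And rsScan!PRIORITY = rs!PRIORITY _
                    And rsScan!MRKD_OFF = rs!MRKD_OFF _
                    And rsScan!RATTY = rs!RATTY Then
                'Duplicate: the current row loses if the twin's comment
                'is longer, or equal but with a higher autonumber
                If Len(rsScan!Comment & "") > CommentLen _
                        Or (Len(rsScan!Comment & "") = CommentLen _
                        And rsScan!RecordNumber > RecNum) Then
                    KillIt = True
                    Exit Do
                End If
            End If
            rsScan.MoveNext
        Loop
        If KillIt Then rs.Delete    'deleted row stays current, so...
        rs.MoveNext                 '...MoveNext afterwards is still valid
    Loop
    rsScan.Close
    rs.Close
End Sub

It still compares every record against every other one, hence the speed.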

Thanks for all your help, everyone.

Ed
 
