Deleting Duplicates

paulreed · Feb 7, 2004

I have a table, linked to another application whose format I cannot modify, which contains 5 fields containing data. Occasionally, data is recorded which is identical in fields 2, 3 & 4 (not bothered about 1 & 5).
I would like to create a process where the duplicates are deleted, whilst leaving 1 of the entries intact (doesn't matter which one).

I have searched the forum, and this has been asked a number of times - but the replies have been messy - such as creating another table & deleting the original, but surely there must be a better solution as this will need to be automated, such as through a macro etc. to facilitate user's abilities.

- easy for me to say!

WayneRyan · Feb 7, 2004

Paul,

You don't specify by what means you receive the data. In
order to know how to handle it, one must know where it
comes from and in what context you receive it.

Need more info ...

Wayne

paulreed · Feb 8, 2004

Wayne

We have a visual basic application which stores its data in a mdb file on a server, the mdb file has 5 tables within.
As an administrator, I access that data file via an admin mdb file, which has 5 tables 'linked' to the data file above, so in effect, I am working directly on the data held on the server and can delete or amend entries accordingly.
The table where I wish to remove duplicate entries, is not in a direct relationship with the other 4 tables, therefore deleting data is not a problem.
My first attempt was to create a 'find duplicates' query, which worked fine, but I then had to manually go through the query and delete the entries, ensuring that I leave at least one of them intact, which is time consuming (and leads to errors!)

Hope you can assist

Lanason · Feb 8, 2004

I have the same probelm

Yes - I have the same problem and used the same approach I.e. duplicates query and manual delete. To make it worse I have multiple duplicates - reason being it is based on a download from another source where I cannot control the data.

I tried to number the records so I could say keep all the number 1's deleteing > than 1, but could get this numbering to work. Thus I'm interestred to know the answer as well.

Maybe thats a solution - put a field in the curent table with say "1", import the data and then using a duplicates delete all those without the 1. Having done that all new records that arnt duplicate need the 1 added. Might even be a better idea to do it the other way round. sound possible ???

paulreed · Feb 8, 2004

Unfortunatley, Adrian. I cannot add any fields to the table, or change any of it's characteristics, just analyse, amend or delete the entries. I can of course add a query, macro, etc to act upon the data held in that table.
You and I are not alone in trying to find a way of dealing with this issue, just check previous entries, but none of the posts have been answered with a good solution.

I am suprised that the expertise within this forum have not found a solution, but perhaps one day...

Pat Hartman · Feb 8, 2004

The real solution is to prevent duplications to begin with. All tables should be defined with primary keys or a unique index that prevents logical duplicates from being added to the table. Rather than trying to create a bandaid, why not truely fix the problem?

As to the proposed solutions, correcting data after the fact is always a problem. What happens when people need to use the data between the time it gets added and when it gets fixed? During that period they are seeing invalid data. Doesn't this bother anyone? How can users trust the system when at least some of the time the data it provides is incorrect?

paulreed · Feb 8, 2004

Pat, I'm sure that you are right, this should have been addressed at design stage, and no doubt at some stage will be, but by someone with more confidence in their programming abilities than me (I don't do visual basic!)

But at this moment in time, I am stuck with the bandaid, and if the expertise in this forum can rise to the challenge, I would be endebted.

Pat Hartman · Feb 8, 2004

The problem has no stock solution because relational database theory says that duplicates should not happen when the tables are properly defined. So why would there be a built-in method to overcome a non-problem? Fixing the problem actually requires no programming whereas the bandaid will require programming.

You have already discarded the two best solutions - creating a new table with the correct primary key and appending the old data to the new table. This method drops the duplicate records. The other viable method is to add a column to the table and create a VBA function that updates the table by assigning a generated sequence number to each record in a set. In this solution you would follow the update with a delete query that deletes all entries with sequence numbers > 1. The first solution is a no-code solution and is the one I use when converting data that contains duplicates as long as it doesn't matter which record is kept or if by sorting I can get the record I want to keep to be first in a list. I understand that this doesn't work in your situation since you are not retirine the source table. So for you, the second solution should be your choice even though it requires programming. Adding a new column to the existing table should not create a problem for the existing application provided you define it as not required with a default of null (empty value in the default field rather than 0 as the table builder's default).

No other solution is worth consideration unless you have time to analyze the data manually. Queries will always return all duplicates in a set. There is no way to get them to return all but one.

paulreed · Feb 8, 2004

Pat, forgive my ignorance, but I don't know how to acheive the second solution.
I have attached the table, and as you will see, the field Incidentnum contains 2 records where fields 2, 3 & 4 are the same (inc 1224), how will you solution work?

Pat Hartman · Feb 8, 2004

Paul, I'm sorry but I don't have any code samples to post for you. I cannot create custom code for you because I would never have time to do any of my own work if I coded solutions for everyone who posted here and besides as I have said many times, I've written my million lines of code and don't need the practice. Sounds like it is time for you to learn VBA. May I suggest the Access Cookbook. Details on this and my other recommendations at:
http://www.access-programmers.co.uk/forums/showthread.php?s=&threadid=51192

Mitch_____W · Feb 9, 2004

I have attached your .mdb file with a form and a query added to it. Copy the Form and Query into your DB and then open the form and click the "Delete Duplicates" button. The code should delete the duplicates you are wanting to delete. Most likely you will need to go to "Tools" and then "Options" and then on the "Edit/Find" tab uncheck the "Record Changes" checkbox under "Confirm", or you will have to confirm each deletion...

Hope this works for ya!

paulreed · Feb 9, 2004

Mitch

That works brilliant!

I am not quite sure how you have acheived it, and will spend some time this evening looking through your code. But none the less it works a treat, and I would like to thank you for your patience and help.

Lanason · Feb 9, 2004

I'll try that

Pat - good idea - thats so obvious isnt it -

funny how the easy answer seems to slip by....... I'll have a go at that.

Thanks

Deleting Duplicates

paulreed

Registered User.

WayneRyan

AWF VIP

paulreed

Registered User.

Lanason

Registered User.

paulreed

Registered User.

Pat Hartman

Super Moderator

paulreed

Registered User.

Pat Hartman

Super Moderator

paulreed

Registered User.

Attachments

Pat Hartman

Super Moderator

Mitch_____W

Mitch

Attachments

paulreed

Registered User.

Lanason

Registered User.

Similar threads

Users who are viewing this thread