View Full Version : Removing multiple duplicates from a worksheet.


odin1701
07-18-2008, 07:21 AM
I have a file which has account numbers. There are duplicate account numbers throughout the sheet - 2000-3000 rows of information.

I want to write a program which will find all the duplicates, mark them for deletion, and then remove the rows.

I can do everything except find the duplicates.

I know how to use the Match command, but it only returns the first match in an array. I need to identify all matches. Is there something which can do that?

namliam
07-18-2008, 07:24 AM
How about using the CountIf function of excel to find out how many times the Accountnumber excists? If > 1 delete or something like that?

chergh
07-18-2008, 07:28 AM
Use the advanced filter and choose unique records only, then delete the entire lot and replace it with the filtered list.

odin1701
07-18-2008, 07:31 AM
chergh - I need to do this all in VBA as it will have to be done multiple times.

odin1701
07-18-2008, 07:32 AM
How about using the CountIf function of excel to find out how many times the Accountnumber excists? If > 1 delete or something like that?

How would I get the numbers of the rows to delete though?

I would need to delete all but one instance of the account number.

chergh
07-18-2008, 07:33 AM
You can use the advanced filter in vba. Record a macro to give you the general idea then modify it as necessary.

namliam
07-18-2008, 07:38 AM
Create a new column in front of the Account number.
I will assuming the accountnumber is in Column B now.
Now in A1 put this formula: =Countif($B$1:B1, B1)
Any row that returns 2 in column A will have to be deleted.
To delete:
Set an auto filter
Select anything > 1 in column A
Select all rows
Delete
Remove filter

Done!

odin1701
07-18-2008, 07:55 AM
Okay that would work to identify the rows that need attention, but after that then what?

If I filtered anything > 1 and then deleted it, I would delete the duplicates, but I would also delete the ALL of those account numbers. Both the first instance and the next instance will have numbers > 1 in column A.

The advanced filter seems to work, but I'm not too keen on it at this point.

chergh
07-18-2008, 07:58 AM
The advanced filter is the sensible way to do it, why are you not keen on it?

namliam
07-18-2008, 08:01 AM
If I filtered anything > 1 and then deleted it, I would delete the duplicates, but I would also delete the ALL of those account numbers. Both the first instance and the next instance will have numbers > 1 in column A.
No?? :confused:

With my provided Countif the first record will have 1 not > 1 therefor will not be selected for deletion?? :confused:

I dont understand?

Also why not use the filter? Offcourse you can skip down the rows one by one, but the filter is about 100 times faster and easier.

odin1701
07-18-2008, 08:02 AM
The advanced filter is the sensible way to do it, why are you not keen on it?

Because it's doing the work itself, and I'd rather have a program do only exactly what I instruct it to do.

odin1701
07-18-2008, 08:07 AM
No?? :confused:

With my provided Countif the first record will have 1 not > 1 therefor will not be selected for deletion?? :confused:

I dont understand?

Also why not use the filter? Offcourse you can skip down the rows one by one, but the filter is about 100 times faster and easier.

Well....I did a search for an account number - it only found 2 of them.

This is the code I have in the corresponding cells to the left of these duplicate account numbers:

=COUNTIF($B$8:$B$1898, B46)

and

=COUNTIF($B$8:$B$1898, B47)

Both A46 and A47 have a value of "2" in them.

namliam
07-18-2008, 08:07 AM
A filter only does EXACTLY what you do it... no more no less....

chergh
07-18-2008, 08:09 AM
Do you think the advanced filter is going to delete non unique records for a laugh or something?

odin1701
07-18-2008, 08:09 AM
Oh I see what I did....oops. I was thinking incorrectly.

odin1701
07-18-2008, 08:09 AM
Do you think the advanced filter is going to delete non unique records for a laugh or something?

Never know.

I prefer a manual method.

namliam
07-18-2008, 08:10 AM
=COUNTIF($B$8:$B$1898, B46)


Aha! But that isnt the code I suggested!!!

Now in A1 put this formula: =Countif($B$1:B1, B1)


Try changing your examples to :
=COUNTIF($B$8:B46, B46)
=COUNTIF($B$8:B47, B47)

This should return 1 and 2 respectively !
That is untill offcourse the '2' is removed ;)

odin1701
07-18-2008, 08:12 AM
Yeah I saw that...I was mistakenly thinking I needed to search the whole range from the top, but I see what you did :)

namliam
07-18-2008, 08:13 AM
Never know.

Well you do know... Filter everything > 1 and exactly that is what is filtered.

But I do understand where you are comming from tho... A lot of times if you count on default or "self thinking" software... eventually it will come around and bite you somewhere painfull.

This tho I have been using for atleast 8 years without fail... But it is your project after all.... and (again) I do know where you are comming from :mad: bloody self thinking *freaking me out* darnation software!

chergh
07-18-2008, 08:13 AM
Never know.

I prefer a manual method.

If there was a bug with the advanced filter it would have been discovered by now but if you want to make life hard for yourself then feel free.

odin1701
07-18-2008, 08:34 AM
Well you do know... Filter everything > 1 and exactly that is what is filtered.

But I do understand where you are comming from tho... A lot of times if you count on default or "self thinking" software... eventually it will come around and bite you somewhere painfull.

This tho I have been using for atleast 8 years without fail... But it is your project after all.... and (again) I do know where you are comming from :mad: bloody self thinking *freaking me out* darnation software!

Yeah it's just that this particular sheet is extremely important and if something was removed that shouldn't be it can be a HUGE issue - we're talking compliance with federal laws and stuff so yeah I'm being a bit over paranoid with this project.

Thanks for the help - it's working now. And in a way that I can manually verify the data to insure integrity so that I know the program can't be doing something I don't want it to.

What I had it do was loop through the results and for every value > 1 I had it put "DELETE" in that cell. Then looped through after that and removed the rows that had "DELETE".

namliam
07-18-2008, 08:39 AM
What I had it do was loop through the results and for every value > 1 I had it put "DELETE" in that cell. Then looped through after that and removed the rows that had "DELETE".
As long as you feel your solution is good (enough)/better or compliant in the laws and regulations you have to live with... that is good enough for me !

Glad you got your issue resolved and HTH as allways ! :cool:

Brianwarnock
07-18-2008, 09:22 AM
I prefer a manual method.
Smash the chips, get out the quill pens.

But I do understand where you are comming from tho... A lot of times if you count on default or "self thinking" software... eventually it will come around and bite you somewhere painfull.But you are happy with Countif , Delete, and all the other functions!!

Yeah it's just that this particular sheet is extremely important and if something was removed that shouldn't be it can be a HUGE issue - we're talking compliance with federal laws and stuff so yeah I'm being a bit over paranoid with this project.

I would have thought that the legal eagles trusted propriety code more than home grown

Brian