View Full Version : Removing multiple duplicates from a worksheet.
odin1701 07-18-2008, 07:21 AM I have a file which has account numbers. There are duplicate account numbers throughout the sheet - 2000-3000 rows of information.
I want to write a program which will find all the duplicates, mark them for deletion, and then remove the rows.
I can do everything except find the duplicates.
I know how to use the Match command, but it only returns the first match in an array. I need to identify all matches. Is there something which can do that?
namliam 07-18-2008, 07:24 AM How about using the CountIf function of excel to find out how many times the Accountnumber excists? If > 1 delete or something like that?
chergh 07-18-2008, 07:28 AM Use the advanced filter and choose unique records only, then delete the entire lot and replace it with the filtered list.
odin1701 07-18-2008, 07:31 AM chergh - I need to do this all in VBA as it will have to be done multiple times.
odin1701 07-18-2008, 07:32 AM How about using the CountIf function of excel to find out how many times the Accountnumber excists? If > 1 delete or something like that?
How would I get the numbers of the rows to delete though?
I would need to delete all but one instance of the account number.
chergh 07-18-2008, 07:33 AM You can use the advanced filter in vba. Record a macro to give you the general idea then modify it as necessary.
namliam 07-18-2008, 07:38 AM Create a new column in front of the Account number.
I will assuming the accountnumber is in Column B now.
Now in A1 put this formula: =Countif($B$1:B1, B1)
Any row that returns 2 in column A will have to be deleted.
To delete:
Set an auto filter
Select anything > 1 in column A
Select all rows
Delete
Remove filter
Done!
odin1701 07-18-2008, 07:55 AM Okay that would work to identify the rows that need attention, but after that then what?
If I filtered anything > 1 and then deleted it, I would delete the duplicates, but I would also delete the ALL of those account numbers. Both the first instance and the next instance will have numbers > 1 in column A.
The advanced filter seems to work, but I'm not too keen on it at this point.
chergh 07-18-2008, 07:58 AM The advanced filter is the sensible way to do it, why are you not keen on it?
namliam 07-18-2008, 08:01 AM If I filtered anything > 1 and then deleted it, I would delete the duplicates, but I would also delete the ALL of those account numbers. Both the first instance and the next instance will have numbers > 1 in column A.
No?? :confused:
With my provided Countif the first record will have 1 not > 1 therefor will not be selected for deletion?? :confused:
I dont understand?
Also why not use the filter? Offcourse you can skip down the rows one by one, but the filter is about 100 times faster and easier.
odin1701 07-18-2008, 08:02 AM The advanced filter is the sensible way to do it, why are you not keen on it?
Because it's doing the work itself, and I'd rather have a program do only exactly what I instruct it to do.
odin1701 07-18-2008, 08:07 AM No?? :confused:
With my provided Countif the first record will have 1 not > 1 therefor will not be selected for deletion?? :confused:
I dont understand?
Also why not use the filter? Offcourse you can skip down the rows one by one, but the filter is about 100 times faster and easier.
Well....I did a search for an account number - it only found 2 of them.
This is the code I have in the corresponding cells to the left of these duplicate account numbers:
=COUNTIF($B$8:$B$1898, B46)
and
=COUNTIF($B$8:$B$1898, B47)
Both A46 and A47 have a value of "2" in them.
namliam 07-18-2008, 08:07 AM A filter only does EXACTLY what you do it... no more no less....
chergh 07-18-2008, 08:09 AM Do you think the advanced filter is going to delete non unique records for a laugh or something?
odin1701 07-18-2008, 08:09 AM Oh I see what I did....oops. I was thinking incorrectly.
odin1701 07-18-2008, 08:09 AM Do you think the advanced filter is going to delete non unique records for a laugh or something?
Never know.
I prefer a manual method.
namliam 07-18-2008, 08:10 AM =COUNTIF($B$8:$B$1898, B46)
Aha! But that isnt the code I suggested!!!
Now in A1 put this formula: =Countif($B$1:B1, B1)
Try changing your examples to :
=COUNTIF($B$8:B46, B46)
=COUNTIF($B$8:B47, B47)
This should return 1 and 2 respectively !
That is untill offcourse the '2' is removed ;)
odin1701 07-18-2008, 08:12 AM Yeah I saw that...I was mistakenly thinking I needed to search the whole range from the top, but I see what you did :)
namliam 07-18-2008, 08:13 AM Never know.
Well you do know... Filter everything > 1 and exactly that is what is filtered.
But I do understand where you are comming from tho... A lot of times if you count on default or "self thinking" software... eventually it will come around and bite you somewhere painfull.
This tho I have been using for atleast 8 years without fail... But it is your project after all.... and (again) I do know where you are comming from :mad: bloody self thinking *freaking me out* darnation software!
chergh 07-18-2008, 08:13 AM Never know.
I prefer a manual method.
If there was a bug with the advanced filter it would have been discovered by now but if you want to make life hard for yourself then feel free.
odin1701 07-18-2008, 08:34 AM Well you do know... Filter everything > 1 and exactly that is what is filtered.
But I do understand where you are comming from tho... A lot of times if you count on default or "self thinking" software... eventually it will come around and bite you somewhere painfull.
This tho I have been using for atleast 8 years without fail... But it is your project after all.... and (again) I do know where you are comming from :mad: bloody self thinking *freaking me out* darnation software!
Yeah it's just that this particular sheet is extremely important and if something was removed that shouldn't be it can be a HUGE issue - we're talking compliance with federal laws and stuff so yeah I'm being a bit over paranoid with this project.
Thanks for the help - it's working now. And in a way that I can manually verify the data to insure integrity so that I know the program can't be doing something I don't want it to.
What I had it do was loop through the results and for every value > 1 I had it put "DELETE" in that cell. Then looped through after that and removed the rows that had "DELETE".
namliam 07-18-2008, 08:39 AM What I had it do was loop through the results and for every value > 1 I had it put "DELETE" in that cell. Then looped through after that and removed the rows that had "DELETE".
As long as you feel your solution is good (enough)/better or compliant in the laws and regulations you have to live with... that is good enough for me !
Glad you got your issue resolved and HTH as allways ! :cool:
Brianwarnock 07-18-2008, 09:22 AM I prefer a manual method.
Smash the chips, get out the quill pens.
But I do understand where you are comming from tho... A lot of times if you count on default or "self thinking" software... eventually it will come around and bite you somewhere painfull.But you are happy with Countif , Delete, and all the other functions!!
Yeah it's just that this particular sheet is extremely important and if something was removed that shouldn't be it can be a HUGE issue - we're talking compliance with federal laws and stuff so yeah I'm being a bit over paranoid with this project.
I would have thought that the legal eagles trusted propriety code more than home grown
Brian
|