Remove duplicates - if 2 fields are equal

Libre · Nov 19, 2013

Sorry if this has been asked before, and thank you for any help.
Now that the apologies and gratitude is out of the way, please help me solve this problem if you can.
I have a large table with many fields and many rows. There is no primary key. I'll call one field ParentPN, and another field ChildPN. There are many other fields as well.
I want to identify all rows where BOTH the ParentPN and ChildPN occur more than once.
I know how to create a query to identify duplicates of ONE field in the table, but not two.
I can solve this with VBA: I will read the two fields of interest in the first row, then compare both values with every other row. If it finds another row with BOTH ParentPN and ChildPN identical with the first, that's a "hit". Then, repeat with all the other rows.
I could find ways to make this run faster, but I was wondering if there are any build in functions to accomplish this. I looked at the Find Duplicates query builder, and all I see is I can select ONE field to search for dupes, not two.
So, any ideas?

plog · Nov 19, 2013

I would do this through 2 sub-queries: One would hold all the values of ParentPN that occur multiple times, the other would hold all the values of ChildPN that occur multiple times. Then build a 3rd query based off of your table. Link the ParentPN field to the sub-query which holds the multiple ParentPN values, Link the ChildPN field to the sub-query which holds the multiple ChildPN values and run it. That query will show the records you want.

namliam · Nov 19, 2013

Select ParentPN , ChildPN, count(*) from yourtable group by ParentPN, ChildPN having count(*) > 1

Should retrieve your dups the same way that you probably would do for single columns.
Then simply join your table with this query and you can see which records occure more than one time ...

Libre · Nov 19, 2013

Thanks - I will look at both these solutions and get back to you guys.
Appreciate the help.

Libre · Nov 20, 2013

Well - neither of these solutions (I may not understand them) are working out.
I don't know how to exactly execute the instructions I received. I tried both but am unable to produce meaningful results.
Sorry I'm so dense - I have to go ahead with the VBA proc.

CazB · Nov 20, 2013

are you looking for records where the 'combination' of ParentPN and ChildPN appear more than once.... or is it just any records which contain any repeated parentpn OR repeated childpn?

If it's the 'combination' you're looking for, then can't you just combine the ParentPN and ChildPN as a calculated field, and then use Find Duplicates on that?

jdraw · Nov 20, 2013

It appears that your issue has not been understood/well communicated. No one knows for sure WHAT your issue is.
Could you try to describe the issue again -- perhaps with a sample or two?

Libre · Nov 20, 2013

Here is a small (hypothetical) sample of my data.
There are many other fields besides the two I'm representing but their value is irrelevant so I'm omitting them:

ParentPN ChildPN
123 ABC
123 BCD
123 CDE

so far, everything is fine - no duplications

234 ABC
234 BCD
234 CDE
234 DEF

still okay - no dupes yet
but now a couple of dupes:

123 CDE
234 CDE

In other words, every ParentPN can (and does) appear multiple times. Every ChildPN can (and might) appear multiple times.
When the same combination of BOTH ParentPN AND ChildPN appear more than one single time, that constitues a duplication that I have to address.

This data is actually a huge set of Bills of Material - there are over 10K lines. Each line represents a single component of one Bill. The ParentPN is the Parent (Assembly) Part Number. The ChildPN is one component of that assembly in the BOM (Bill of Material). If a parent has 5 components (children), the table will have the ParentPN 5 times, and then 5 DIFFERENT ChildPn. But if there is a repeat of this same BOM in the table, then there will be repetions of Parent-Child pairs. The only way I can see to identify if there is a BOM more than once, ignoring all the other fields, is to identify where the same parent and same child exist more than once in the table - no parent should have two identical children (sounds like planned parenthood but it's configuration management in Engineering BOMs).

Sorry - it's difficult to describe the situation in words - did my best.

Libre · Nov 20, 2013

CazB said:
are you looking for records where the 'combination' of ParentPN and ChildPN appear more than once.... or is it just any records which contain any repeated parentpn OR repeated childpn?

If it's the 'combination' you're looking for, then can't you just combine the ParentPN and ChildPN as a calculated field, and then use Find Duplicates on that?

This solution looks very promising.
Will try.
In my mind it works already.
You may have saved me a day of programming and other day of running the blasting thing - recursive loops, 10K times - a nightmare.

CazB · Nov 20, 2013

Ok, from that, I think I was right in saying it's only when the combination of ParentPN and ChildPN is repeated that you have a problem.... so, going back to my previous post:

In a query, create a calculated field to combing the ParentPN and ChildPN,
eg CheckPN
and then use Find Duplicates on the CheckPN field

example
Select [ParentPN] & [ChildPN] AS CheckPN, count(*) AS CheckCount
FROM yourtable
GROUP BY [ParentPN] & [ChildPN]
HAVING (((Count(*))>1));

Libre · Nov 20, 2013

Caz-
Yes - you are right. I got excited (and had to get to a meeting (with the President no less)) and forgot to answer that.
A "duplicate" is ONLY when a specific ParentPN and ChildPN combination happens more than once.
So - I already created the calculated field and it works fine. have not created the find duplicates query yet tho.
And I don't really get the select statement you wrote, but I'm gonna paste it in a query builder and see what happens.
The main thing is, I'm sure this solution will work for me - so a BIG THANKS!

namliam · Nov 21, 2013

namliam said:
Select ParentPN , ChildPN, count(*) from yourtable group by ParentPN, ChildPN having count(*) > 1

Should retrieve your dups the same way that you probably would do for single columns.
Then simply join your table with this query and you can see which records occure more than one time ...

Dont want to be a pain, but why combine the two fields in a calculated field when you can leave them seperated like in my earlier example

CazB · Nov 21, 2013

namliam said:
Dont want to be a pain, but why combine the two fields in a calculated field when you can leave them seperated like in my earlier example

... because it's only when both are duplicated on the same record, and the OP said the Find duplicates query would only search on one field?

.... although I haven't looked to see if it makes any difference to the outcome

[Edit] looked at both ways of doing it now: same outcome! I suppose the only difference with 'my' way of doing it is you can use the calculated field in a Find Duplicates Query... guess it depends what you're looking for

namliam · Nov 21, 2013

CazB said:
in a Find Duplicates Query... guess it depends what you're looking for

Yes you can use your "calculated field" in the find duplicates "wizard"
However when you have 2 seperate fields you have these for a reason, in general I would say keeping seperate fields seperated would be the prefered solution so you can use the dataset as is to i.e. join back into your source table or join to other tables....

plog · Nov 21, 2013

Yeah, based on Libre's follow up explanation and data, Namliam got it right in his first post. Of course Libre also said he wasn't able to get it to work for him.

Libre · Nov 21, 2013

Let my clarify. I said that it didn't work for me but I wasn't sure how to exactly implement namliam's solution. That is because I didn't study it sufficiently. Now that I've looked at it more, and tried it again, it does ideed work!
The thing is, there is usually more than one good way of getting something done - and if you find a way that works, there may not be enough time to understand, implement, and test every one of those other methods. But now that I HAVE tried out namliam's way I see that it works fine. There were some other suggestions that probably would also do the job.
So thanks to all - I asked a question, got multiple useful answers, and I accomplished the task. What could be better? Thanks to everyone - even if I didn't oficially "THANK YOU" or click on the scales.

namliam · Nov 21, 2013

It isnt about getting the thanks, nor in word, nor on the skales...
For me its about finding the "best" solution being a perfect perfectionist that always delivers...

anything less than the best and if possible perfect solution is an issue waiting to happen

Libre · Nov 21, 2013

namliam said:
It isnt about getting the thanks, nor in word, nor on the skales...
For me its about finding the "best" solution being a perfect perfectionist that always delivers...

anything less than the best and if possible perfect solution is an issue waiting to happen

namliam - although I agree in some cases, there are often equivalently effective methods - not always a BEST. And what is best for one person may be less than the best for someone else. And even if there is a BEST, it is only the best while respecting one set of priorities, but less good with a different set.
Example:
What is the BEST way to extract oil from the ground or the ocean?
If maximum yield is your priority, there is a certain way that will maximize yield. If lowest cost is the priority, it will be another way. If human safety is the priority, it's a 3rd way. If it's environmental friendliness, it will be yet another way. (BP has its hands full, trying to do all of the above - they can't).

In actual fact, I used CazB's method except I applied his idea in Excel. Reason being, I didn't just have to identify the duplicates, but eliminate them. With the sql query you wrote, it produced a list of the duplicates, and how many times they were duplicated, but that did not alter the data in any way. Of course, identifying the records would be the first step in eliminating them. But there would still be work to do to get them out of the data. In Excel, I was able to identify, flag, resort, and delete the duplicate records. So it was a combination of ideas, my own knowledge, and the resources available to me, that made the ultimate solution the BEST for me, in this situation.

Mihail · Nov 21, 2013

I saw here many very skilled guys.
So, forgive me if I missed something and this approach already appear in the previous posts.

Create a table with exact same structure with the original one.
On this table create, using Index window, an index on the incriminated fields with the property "unique".
Run an Append query in order to append your original table to this new one.
!!! Turn OFF warning messages.

Again, sorry if this is a "duplicate" solution.

Good luck !

Libre · Nov 22, 2013

Thanks, Mihail-
I'm not sure if this works or not - actually I have already solved the problem using the "skilled guys" suggestions above. But I appreciate that you took the time to think about it and post a response.

Remove duplicates - if 2 fields are equal

been around a little

Banishment Pending

The Mailman - AWF VIP

been around a little

been around a little

Registered User.

Super Moderator

been around a little

been around a little

Registered User.

been around a little

The Mailman - AWF VIP

Registered User.

The Mailman - AWF VIP

Banishment Pending

been around a little

The Mailman - AWF VIP

been around a little

Registered User.

been around a little

Similar threads

Users who are viewing this thread