How to find doublets sharing part of name

spikepl

Eledittingent Beliped
Local time
Today, 16:41
Joined
Nov 3, 2010
Messages
6,142
I have to sort out a database of goods where many doublets have been created over time. Unfortunately each doublet/triplet/multiplet can be disguised by not having ANY fields common with the other versions. The only pointer I have is that the SKU of a copy contains the SKU of the original. I have some 6000 SKU's.

I want to identify each SKU that has been copied, and into what SKU or SKUs.


tblSKUs
----------
Seq : sequential numerator for shiwing it on AWF
ID: Id of the item (Autonumber)
SKU: SKU (Text)

Code:
Seq   ID        sku       
[FONT=Courier New]1   2205    MQAC2212       
2   17416   MQAC2212-black-s-x       
3   2258    MQAC2232       
4   12815   MQAC2232-red-X       
5   2283    MQAC2254       
6   13028   MQAC2254-00       
7   16278   MQAC2254-red-s-x      
8   16298   MQAC2254-red-s-x1       
9   16304   MQAC2254-red-s-x2      
10  16313   MQAC2254-red-s-x3       
11  2302    MQACP       
12  12751   MQACP100-X       
13  17468   MQBA201-1-s-x       
14  17476   MQBA403-1-s-x       
15  17216   MQBA503-1-s-x   [/FONT]
In the above table, item with Seq 2 is a copy of 1, 4 is a copy of 3, 6-10 are copies of 5, and 12 is copy of 11. Each copied SKU is thus the original SKU + some more characters. Minimum length for an SKU is 4. No maximum length.

I can do anything using VBA, but need to get inspiration to do minimum possible using queries.

I would be happy to get a table like this ( for each copy in RH column, the Orginal is listed in the LH)

Code:
        [FONT=Courier New]
Original [/FONT][FONT=Courier New]Copy[/FONT]   
MQAC2212   MQAC2212-black-s-x     
MQAC2232   MQAC2232-red-X     
MQAC2254   MQAC2254-00     
MQAC2254   MQAC2254-red-s-x     
MQAC2254   MQAC2254-red-s-x1     
MQAC2254   MQAC2254-red-s-x2     
MQAC2254   MQAC2254-red-s-x3     
MQACP      MQACP100-X
My mind is blank at this stage :-( .. can I do this with plain queries?
 
The table you want can be achieved in a query. Create a query based on your table, bring in sku and then create a calculated field (Original) to extract out the portion of the sku you are using to identify duplicates (looks like a simple Instr() function nested in a Mid () will do). Then group that query up on both fields. That creates your de-duping datasource.

Code:
SELECT Mid([sku], 1, Instr([sku], "-")-1) AS Original, sku FROM tblSKUs GROUP BY Mid([sku], 1, Instr([sku], "-")-1) , sku)

You'll probably have to make the Mid a conditional since not all sku values contain a hyphen. To exclude skus without modifiers, just check for the presence of the hyphen.
 
Eh I have not been clear obviously. Sorry. There is no easy way to ID what part of SKU is common. Look at item 11 and 12. What follows the root-part of the SKU can be any character, not just -. I do not see any obvious way to extract the root part. The root part itself can contain a hyphen. The root part can thus be any assembly of characters.

I am simply too burned out to see anything clever. I always have a brute-force approach in VBA as backup - check if any SKU fits inside any other :D
 
This query will do it straight off

Code:
SELECT Table4.ID, Table4.SKU AS Original, Table4_1.ID, Table4_1.SKU AS Copy
FROM Table4, Table4 AS Table4_1
WHERE (((InStr([table4_1].[sku],[table4].[sku]))=1) AND (([table4_1].[sku]<>[table4].[sku])=True) AND ((InStr([table4].[SKU],"-")=0)=True))
Substitute Table 4 for the name of your table and add other varables as required

Note: It assumes that an original does not have any '-' in it
 
Just noticed in your last post that root can have an italic - if that is the case can you confirm that root is always the beginning of the code?
 
The root can be anything, but is at the left. The tail (i.e. the RH-part) can be anything. There is no specific separator between root and tail. The length of the root is variable. So is the length of the tail.

Thanks for your efforts so far, guys. I 'll look at this again in some 12 hours or so.
 
try this one instead

Code:
SELECT Table4.SKU AS Copy, (SELECT MIN(SKU) FROM Table4 as tmp where instr(table4.sku,sku)=1) AS Original
FROM Table4

It assumes the original is actually in the table
 

Users who are viewing this thread

Back
Top Bottom