Multi-table wildcard query

dearsina

I have two tables in Access with company data, like company name, address, contact, etc.

One of the tables has complete company names, like "Random Business Limited" or "Huge Corporation Plc", the second has only shorter names like "Random Business" or "Huge Corporation".

I wish to set up a query which will recognise these fields as matches, something like if table1.a = table2.a* then show table1.a, with the wildcard being the key to the whole thing, of course.

Am I making sense? Can it be done?

Thanks in advance,

sina
london
 
maybe not

Maybe we can't get there from here ;)

Assuming tables with sample data:

Table1
CoName
Huge Company LLC
Big Mucks Incorporated
Even Bigger Mucks, Inc.

Table2
CoName
Huge Company
Big Mucks
Even Bigger Mucks

Create a Union Query to get all the data into one table:
SELECT *
FROM Table1
UNION
SELECT *
FROM Table2;
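
One caveat with a plain UNION, though: it silently throws away exact duplicates, so if the very same CoName sits in both tables, the find-duplicates step below would never see two copies of it. UNION ALL keeps every row - something like this, with Src being a made-up column just to show which table each row came from:

SELECT CoName, "Table1" AS Src FROM Table1
UNION ALL
SELECT CoName, "Table2" AS Src FROM Table2;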

Then, my first thought was to run a standard find duplicates query, i.e.:
SELECT PrimaryUnionQuery.CoName, PrimaryUnionQuery.CoAddress
FROM PrimaryUnionQuery
WHERE (((PrimaryUnionQuery.CoName) In (SELECT [CoName] FROM [PrimaryUnionQuery] As Tmp GROUP BY [CoName] HAVING Count(*)>1 )))
ORDER BY PrimaryUnionQuery.CoName;

BUT, there is no way to incorporate the wildcards (at least not as far as I can see).

So, to use wildcards, you would have to know the similarities for each of the matching records, which would probably create a query as long as your entire table, i.e.

SELECT PrimaryUnionQuery.CoName
FROM PrimaryUnionQuery
WHERE (((PrimaryUnionQuery.CoName) Like "Big*")) OR (((PrimaryUnionQuery.CoName) Like "Huge*")) 'etc. etc. etc. etc. :mad:
ORDER BY PrimaryUnionQuery.CoName;

But then, it is still not providing you with much usable info.

Lastly, and this is the best I can come up with: to show all matching records, if you know of a common criterion, such as the first 5 letters being the same in each matching pair of records (as in the sample data above), then:

SELECT Table1.CoName, Table2.CoName
FROM Table1 INNER JOIN Table2 ON Table1.CoName Like Left(Table2.CoName,5) & "*";

'note: the number 5 is the number of leftmost characters that will be used as the criterion - adjust as necessary

Just a suggestion, but when you finally get a clear match on all of the records, use a foreign key field so that you don't have two sets of names. For example, have the second table store the CompanyID of the matching record in the first table, instead of a second copy of the name.
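
A minimal sketch of that idea, assuming Table1 has an AutoNumber primary key called CompanyID (both names are just placeholders):

ALTER TABLE Table2 ADD COLUMN CompanyID LONG;
ALTER TABLE Table2 ADD CONSTRAINT fkCompany FOREIGN KEY (CompanyID) REFERENCES Table1 (CompanyID);

After that, the short-name text column in Table2 can be dropped entirely.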

Good luck.. :)
 
Hi, and thanks for your suggestions. You are absolutely right that the first two suggestions have problems.

SELECT Table1.CoName, Table2.CoName
FROM Table1 INNER JOIN Table2 ON Table1.CoName Like Left(Table2.CoName,5) & "*";
The last one though, was interesting (see above), and I gave it a go, only to discover that there are company names as short as "Oxfam" and as long as "Marketlink Marketing Communications Ltd" (actual data). So the number 5 (or whatever it's set to) becomes either too big or too small.

Then a thing occurred to me: what about exchanging the static value of 5 for a dynamic value, the length of the table2.CoName it's trying to match? I tried the following query:

SELECT [Viatel data].COMPANY, [Our data].company
FROM [Our data] INNER JOIN [Viatel data]
ON [Viatel data].COMPANY LIKE Left([Our data].company,Len([Our data].Company)) & "*";

([Viatel data] being the table with the longer names, and [our data] the shorter)

But that just killed the query. I don't know if Len() is an ideal function for this, or if there is a way of rewriting the above line to make it run reasonably ok?
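
One possible rewrite: since Left(X, Len(Y)) = Y is just another way of saying "X starts with Y", the whole test can be done without Like. A sketch, assuming the short names really are strict prefixes of the long ones (Trim() guards against stray trailing spaces):

SELECT [Viatel data].COMPANY, [Our data].company
FROM [Viatel data], [Our data]
WHERE Left([Viatel data].COMPANY, Len(Trim([Our data].company))) = Trim([Our data].company);

This also sidesteps Like's special characters (*, ?, #, [), which can misfire when they turn up inside real company names. The comma join still compares every row against every row, though, so it will crawl on big tables - just hopefully not die.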

I have a feeling there is a way to solve this. Call it a hunch, call it geek intuition, call it Susan if that makes you happy, but I think it can be done..!


sina
london
 
My suggestion is that you spend your energy normalizing the data in the tables so that you do not have to resort to complicated queries.
 
Dwight, I'm not keeping the records, I'm just comparing them; and there are a lot of them, so normalizing isn't really an option. It would definitely have been my suggestion too, if I were keeping the data.

sina
london
 
If you're reading this in the future (or rather in your today, thinking back to when I wrote this post) and think you have the solution, please, please write a couple of lines. Although this particular project is over (I wasn't able to produce this crucial report), I would still like to know how to do this, because I'm sure there is a way - I doubt I'm the only one who has ever had this problem! :-)

sina
london
 
I had exactly this problem for a conversion that I did for a large reinsurance company. They were consolidating the company "master files" from 12 systems (some of which were from foreign subsidiaries which complicated the problem even more because of language differences). I used Access to bring everything together since the various source files ranged from DB2, to Sybase, to VSAM, to IMS/DB, to Lotus Notes. I linked to the source tables where ODBC drivers were available and did periodic imports from the other sources.

First the duplicates within each file needed to be identified, and then the duplicates between files. I started by standardizing name parts. Since these companies were mostly insurance, reinsurance, or banks, there were a large number of "words" that turned up constantly, and each of these words had various abbreviations. So I made lists -
Company - co - comp
insurance - ins
corporation - corp - inc
Bank - banque - banco
etc.

Then I added two new columns to my consolidated company data table: one to hold the parsed name and a second to hold the pk of a "match". I wrote a parsing routine that looked for all of the abbreviations in my list and expanded them to the full words (you could go the other way if you prefer). The parsing routine also removed case differences, all punctuation, and any double spaces. That found about 10% of the duplicates; sorting and manual grouping found some more. Each time I identified a match, I put the pk of the "match" into the second new column.
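
The original routine isn't posted here, but a minimal VBA sketch of the same idea might look like this. AbbrevList (with text fields Abbrev and FullWord) is a hypothetical lookup table holding pairs like the ones in the list above:

' Standardise a company name for matching. Needs a DAO reference.
' AbbrevList, Abbrev and FullWord are placeholder names.
Public Function ParseName(ByVal CoName As Variant) As String
    Dim s As String, clean As String, ch As String
    Dim i As Long
    Dim rs As DAO.Recordset

    If IsNull(CoName) Then Exit Function    ' returns ""
    s = UCase$(CoName)

    ' Keep letters, digits and spaces; drop punctuation
    ' (accented characters would need extra handling)
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If (ch >= "A" And ch <= "Z") Or (ch >= "0" And ch <= "9") Or ch = " " Then
            clean = clean & ch
        End If
    Next i

    ' Collapse any double spaces left behind
    Do While InStr(clean, "  ") > 0
        clean = Replace(clean, "  ", " ")
    Loop
    clean = " " & Trim$(clean) & " "    ' pad so only whole words match below

    ' Expand every known abbreviation to its full word
    Set rs = CurrentDb.OpenRecordset("SELECT Abbrev, FullWord FROM AbbrevList")
    Do While Not rs.EOF
        clean = Replace(clean, " " & UCase$(rs!Abbrev & "") & " ", _
                        " " & UCase$(rs!FullWord & "") & " ")
        rs.MoveNext
    Loop
    rs.Close

    ParseName = Trim$(clean)
End Function

With that in place, an update query can fill the parsed-name column, e.g. UPDATE Companies SET ParsedName = ParseName([CoName]); (again, Companies and the column names are placeholders).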

I had no way of changing the real company names in the various source systems so I used to send spreadsheets with suggested corrections to the system owners. Ultimately, the data got cleaner as the users responded to the cleanup process.

The conversion/cleaning process was going on at the same time as the development of the new customer system. The intention was that the data would be fully cleaned by the time the final conversion took place.

We also used a purchased product. I forget the company's name, but their initials were SSA and they had an office in Greenwich, CT. The cost was several thousand dollars, but the product was excellent and was a great help in cleaning up the rest. It used sophisticated matching technology and assigned a % match to each pair of potentially matching names. It would identify "City of San Diego" and "San Diego, City of" as a 100% match.
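
The post doesn't say how SSA computed those scores, but a common homegrown way to get a % match is edit distance. A rough VBA sketch (plain Levenshtein - note it would not score the "City of San Diego" word-order example at 100%; that needs token-level matching on top):

' % similarity between two names: 100 = identical.
' One common way to get this kind of score - not SSA's actual method.
Public Function MatchPercent(ByVal a As String, ByVal b As String) As Double
    Dim i As Long, j As Long, cost As Long, best As Long
    Dim d() As Long

    If Len(a) = 0 And Len(b) = 0 Then MatchPercent = 100: Exit Function
    ReDim d(Len(a), Len(b))

    ' Distance from the empty string is just the length
    For i = 0 To Len(a): d(i, 0) = i: Next i
    For j = 0 To Len(b): d(0, j) = j: Next j

    ' Classic dynamic-programming fill
    For i = 1 To Len(a)
        For j = 1 To Len(b)
            cost = IIf(Mid$(a, i, 1) = Mid$(b, j, 1), 0, 1)
            best = d(i - 1, j) + 1                                       ' deletion
            If d(i, j - 1) + 1 < best Then best = d(i, j - 1) + 1        ' insertion
            If d(i - 1, j - 1) + cost < best Then best = d(i - 1, j - 1) + cost  ' substitution
            d(i, j) = best
        Next j
    Next i

    ' Scale the distance against the longer name
    If Len(a) > Len(b) Then
        MatchPercent = 100 * (1 - d(Len(a), Len(b)) / Len(a))
    Else
        MatchPercent = 100 * (1 - d(Len(a), Len(b)) / Len(b))
    End If
End Function

Feeding the parsed names from the routine above through it pairwise and flagging anything over, say, 85% gives a decent manual-review list - slow, since it compares every pair, but it runs unattended.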
 
