Intelligent String Compare?

Runey

New member
Local time
Tomorrow, 05:27
Joined
Jan 11, 2005
Messages
5
A weird question, maybe an impossible one.

Is there a way do something like an intelligent string compare.

What I have is two tables.
I need to find all matching record in both tables based solely on the name.
However it is very common for minor spelling errors and punctuation differences to occur.
e.g. 20,000 instead of 20000 or Name: Whatever instead of Name - Whatever
Is there any way to do a string comparision on this sort of thing and return a liklihood of the strings being the same? Maybe a third party program that can be accessed via VBA? I must be able to control it by VBA as the program needs to be as automated as possible.

I know this may be asking too much but I would appreciate any input or ideas

Thanks
 
What exactly are you trying to compare? Details please.
 
I have two tables. They are a list of games for the Amiga. Around 5000 entries in each table.

I need to look for each game from table one and see if it exists in table two. e.g. 20000 Leagues Under The Sea

This game exists in table 2 but is spelled differently. e.g. 20,000 Legaues Under The Sea

Another example would be 1943 - Battle for Midway in table 1 but it appears as 1943: Battle for Midway in table 2.

They are both 256 bit strings

Basically table 2 contains different information on each game than table 1. Table 1 contains information like game name, image name, number of disks, CRC, Size etc. Table 2 contains information like Publisher, Developer, Programmer etc. I need to merge the two tables together. But there are no UniqueIDs in each table that I can use to match each entry with the other table. The closest thing to unique IDs that I can use is the name of the game. However the spelling differences are preventing me from matching them together.

I hope this is enough information.
Thanks for your time
 
There is no fuzzy matching logic built in. I had to accomplish a similar task with company names. I added a new field to each table and ran some code that reformatted the names. Things to do:

1. remove all punctuation
2. remove all extra spaces
3. identify typical abbreviations and either expand abbreviations or abbreviate full spellings. In my case these were things like Ins-Insurance, Co-Company, Comp-Company, etc.

Once you get the file to this state, you should find a lot more matches. The final thing you can do is to search the web for "SOUNDEX". You should be able to find source code that you can copy. This is an old algorithm that used to be used by the phone company to help operators look up names without having to have the exact spelling. So this last step should get past spelling errors. In a nutshell, SOUNDEX removes all vowels and words like "the". SOUNDEX may work on a file that you havent done 1, 2, and 3 on but I don't know.
 
Thanks again PAT. Once again you come to my rescue (well sort of :) ) You helped me out once before in case you don't remember.

I was afraid the answer would be something like that :( Well it doesn't hurt to ask.

In anticipation I have already started coding something along the lines of what you suggested.

I will have a look on SOUNDEX and any other fuzzy logic stuff I can find. If I come up with something I'll post it here.

Thanks again
 
Puntuation Problems

Hello all,

New to Access & New to this forum!

I need to compare two columns from two tables and link fields using a similar logic as discussed.

I.e. Part Number - 15-3101 automatically linked to 153101 for all all punctuation. I'd like to do it without having to use the standard 'find and replace' as I need to do this frequently and there are too many forms of punctuation!!

Any advice or code would help!!
 
Access isn't a mind-reader. As Pat suggested, your best bet is to have another field that eliminates punctuation or in some other way unifies formats. Free-form matching is, to be blunt about it, a pain in the toches. There is no subsitute for proper prior planning to prevent poor patterns. I.e. filter the input before you allow a user to store it. And if this is not possible, all I can add is Good Luck!
 
:p
Runey said:
I have two tables. They are a list of games for the Amiga. Around 5000 entries in each table.

I need to look for each game from table one and see if it exists in table two. e.g. 20000 Leagues Under The Sea

This game exists in table 2 but is spelled differently. e.g. 20,000 Legaues Under The Sea

I'm not sure if this will be any help at all, but if this is a one off operation with 2 sets of data that are not going to be updated, I would do this in Excel.

after making a copy of the tables, I would:
  • Add a sortable ident to each set of data (A1, A2, .... for the first table, B1, B2, .... for the second)
  • As suggested in other replies, weed out punctuations
  • Use the various find, left, right, mid functions etc to split the titles into single words over a number of columns
  • Paste both sets of data to a single spread sheet
  • Insert a column and use a logic test to identify duplicates in the first feild of the title
  • Strip out matches and recheck the residue on the second column of the title etc
  • etc etc etc

With 2 lists of 5000 items, it might be that after the breaking up of the data, you could pull it in to a database to do the queries?

I have messed about with similar scenarios of, for instance, equipment tags for control systems. I don't think I have ever had to go much over a couple of hundred items. Tedious, but it can be done.

The time spent in Excel for a one-off operation might be less than the time required to build the database and verify it works correctly.

Of course, if these tables to be compared are parts of a relational database with other information, this wouldn't be much of an option.

So, in closing, rather than HTH, i will sign of with NSTH

Regards,

Keith.
 

Users who are viewing this thread

Back
Top Bottom