Match letters in one field to letters in another and display a % match

My guess is that Google uses dictionaries to match all word forms, and Bayesian webs of relationships to cross-connect, say, misspellings with the desired results.

I wouldn't know how to even begin implementing either of these in Access. Nobody but Google has that much data on how individuals behave, so nobody could replicate their results on a small scale.
 
DFenton,

I am implementing the Levenshtein function in my database to good effect. What I would like to know is possibly so simple that I am blind to it.

The function returns LD, the number of single-character edits (insertions, deletions, substitutions) needed to turn string1 (source) into string2 (target). Would you or anyone else know how I could turn this number into a percentage of the length of string2 (target)?
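For reference, here is a minimal sketch of what a Levenshtein function computes. The VBA version used in the database isn't shown in the thread, so this is a stand-in written in Python, not that code. It uses the usual dynamic-programming approach, keeping only two rows of the table:

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn source into target."""
    # prev[j] = distance between the first i-1 chars of source
    # and the first j chars of target (row i-1 of the DP table).
    prev = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        curr = [i] + [0] * len(target)
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            curr[j] = min(prev[j] + 1,         # delete s_ch
                          curr[j - 1] + 1,     # insert t_ch
                          prev[j - 1] + cost)  # substitute (or keep a match)
        prev = curr
    return prev[len(target)]


print(levenshtein("kitten", "sitting"))  # 3
```

The classic example "kitten" to "sitting" needs three edits: two substitutions (k to s, e to i) and one insertion (g).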

The reason I ask is this:

Users enter large amounts of text into a form in my DB. When the form closes, the text is checked with the above function, and if a close match is found, a MsgBox tells the user that it could match data already entered.

I would like this message box to show the percentage difference between the two records.

(Sorry to hijack, but this is the most interesting function I have come across, and I almost (almost) understand it.) :)
 
It seems to me that if the LD is, say, 6, and there are 12 characters, that would be 50%. But you can't just divide one by the other. You have to subtract the LD from the length, then divide the remainder by the length.

Something like:

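Dim lngLD As Long, lngLength As Long, dblPercent As Double
lngLD = 6                                     ' Levenshtein distance
lngLength = 12                                ' length of the target string
dblPercent = (lngLength - lngLD) / lngLength  ' 0.5, i.e. 50%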

With an LD of 2, that gives 10/12, or 83.33%, which seems intuitively right.

But I don't think that's precisely correct.
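One concrete reason the target-length formula isn't quite right: when the target is shorter than the source, the LD can exceed the target's length. For "abcdef" compared with "ab" the LD is 4 (four deletions), so (2 - 4) / 2 gives -100%. A common alternative (swapped in here, not something proposed earlier in the thread) is to divide by the longer of the two lengths; the LD can never exceed that, so the score stays between 0% and 100%. A minimal Python sketch, where both function names are hypothetical:

```python
def naive_percent(ld: int, target_len: int) -> float:
    # The formula from the post: (length - LD) / length, target length only.
    return (target_len - ld) * 100 / target_len


def similarity_percent(ld: int, source_len: int, target_len: int) -> float:
    """Turn an LD into a 0-100 similarity score by normalising on the
    longer string; LD <= max(len) always, so the result is never negative."""
    longest = max(source_len, target_len)
    if longest == 0:  # two empty strings are identical
        return 100.0
    return (longest - ld) * 100 / longest


print(f"{naive_percent(2, 12):.2f}%")          # 83.33%
print(f"{naive_percent(4, 2):.2f}%")           # -100.00%  ("abcdef" vs "ab")
print(f"{similarity_percent(4, 6, 2):.2f}%")   # 33.33%
```

When both strings have the same length the two formulas agree, which is why the 12-character examples above look right.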

It seems to me that a simple ranking by LD value is sufficient: the smaller the LD, the closer the two strings are, and you don't really need a percentage.

That is, if you are comparing a string to 12 comparison values, and the LDs for the 12 values are 2, 8, 6, 6, 4, 12, 5, 9, 1, 15, 12, 10, you just sort them ascending by LD, and the first ones are the closest matches.
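The ranking idea amounts to a single sort. A small Python sketch using the twelve LD values listed above; the record numbers 1 to 12 are a hypothetical stand-in for whatever keys the existing records have:

```python
def closest_matches(lds, top_n=3):
    """Return the 1-based record numbers with the top_n smallest LDs.

    lds[i] is the LD between the new entry and existing record i + 1
    (a hypothetical numbering, for illustration only)."""
    # sorted() is stable, so records with equal LDs keep their original order.
    ranked = sorted(enumerate(lds, start=1), key=lambda pair: pair[1])
    return [record for record, _ in ranked[:top_n]]


lds = [2, 8, 6, 6, 4, 12, 5, 9, 1, 15, 12, 10]
print(closest_matches(lds))  # [9, 1, 5]
```

Record 9 (LD 1) comes first, then record 1 (LD 2) and record 5 (LD 4). No percentage is needed to pick the best candidates.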

No?

Somewhere I do have a percentage overlap function that has nothing to do with LD. It checks InStr() first, then puts the two strings in alphabetical order and determines the amount of overlap between them. I'm not sure whether I ever used it in production code, or whether I used it for a big de-duping project where I had to merge two data files, one with 100K records and the other with 250K and likely substantial overlap. If you're interested, I'll dig it out and post it (if it's not too embarrassing).
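That function was never posted, so the following is only a guess at the idea, written in Python: an InStr()-style containment check first, then an order-independent comparison of the letters (equivalent to sorting each string's letters alphabetically and lining them up). The function name and the choice to divide by the longer length are assumptions, not the original code:

```python
from collections import Counter


def letter_overlap_percent(a: str, b: str) -> float:
    """Hypothetical letter-overlap score (NOT the function described above).

    Counts letters shared by both strings, duplicates included and order
    ignored, as a percentage of the longer string's length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0          # two empty strings
    if not a or not b:
        return 0.0            # nothing to share with an empty string
    # InStr()-style shortcut: if one string contains the other, every letter
    # of the shorter one is shared, so skip the counting.
    if a in b or b in a:
        return min(len(a), len(b)) * 100 / longest
    # Multiset intersection: min count of each letter across both strings.
    shared = sum((Counter(a) & Counter(b)).values())
    return shared * 100 / longest


print(letter_overlap_percent("Smith", "Smyth"))    # 80.0
print(letter_overlap_percent("listen", "silent"))  # 100.0
```

Because letter order is ignored, anagrams such as "listen" and "silent" score 100%, which is one way this kind of overlap differs from an LD-based score.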
 
I'm really not very good at the math, but your initial percentage calculation works well enough for my needs. Thanks for your reply.

TBH I don't think I'll need the other function, but thanks for your generosity.

:)
 