Using the Simil function for best match

here4real · Mar 11, 2015

The Simil function is downloadable from the reference library and is a GREAT resource for doing a fuzzy match.

My question is: how can I use it to take a field on a record and have it return ONLY 1 record from another table that is the best match? Simil compares 2 strings. I want to compare one string against a field in a table.

Thanks.

jdraw · Mar 11, 2015

Did you try it?

here4real · Mar 11, 2015

Yes.

Let me give a little background.

I have a table of good addresses. We can call that table Good.

I have another table of addresses. For each one, I want to do a fuzzy search against Good and return the best match, as long as that match scores at least 95%. Simil in its simplest form doesn't scan a second table to find the best match. It just scores string1 against string2.
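To make the gap concrete, here is a minimal sketch of wrapping a pairwise scorer in a loop over the Good table and keeping only the top hit. It is not the Simil code from the reference library: it uses Python's `difflib.SequenceMatcher` as a stand-in because it also returns a 0 to 1 similarity ratio. The names `similarity` and `best_match` are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for Simil: a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(candidate, good_addresses, threshold=0.95):
    """Score candidate against every good address and keep the top one.

    Returns (address, score), or (None, best score seen) when the best
    score falls below the threshold.
    """
    best, best_score = None, 0.0
    for good in good_addresses:
        score = similarity(candidate, good)
        if score > best_score:
            best, best_score = good, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


good = ["15 Main Street", "22 Oak Avenue"]
print(best_match("15 Main Street", good))
```

In Access the same idea would be a query or VBA loop that calls the scoring function once per row of Good; the Python version only shows the shape of the logic.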

jdraw · Mar 11, 2015

And that is how most such functions (Levenshtein distance, etc.) work.

From experience I can tell you that address matching/validation can be a real PITA.
I honestly don't think you would be happy with Soundex or Levenshtein.
In the end you will likely need someone's eyeballs to do a final check.

I recall someone on another forum looking at fuzzy matching. You can see the dialog for reference.

Good luck with the project. I'd like to hear/see what you do in the end.

here4real · Mar 11, 2015

Okay. I broke it down into several steps.

1) I do a join for each record against all addresses. In my case, there is a level above address (there is a limited set of addresses based on a different field) so the number of generated records isn't so onerous. I did this as a MakeTable query. Besides the Simil score, I also pull off the number of the address (whatever exists before the first " " or "-") and save that as well.

2) A query off of that table that groups by person and returns the highest Simil score for each.

3) A query that joins that table against the query in step 2 matching person, address and score. This is so I can bring in additional data located in the Addresses table. I also check that the number of the address is the same and that the Simil score is over .95. The reason for the number address is because 15 Main Street and 18 Main Street have very high Simil scores but obviously are different. By pulling off the number portion of the addresses I ensure that the high Simil score is in fact the same address.
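The three steps above can be sketched outside Access roughly as follows. This is an illustration under assumptions, not the actual queries: `difflib.SequenceMatcher` stands in for Simil, the function names are hypothetical, and the "level above address" grouping is left out to keep it short.

```python
import re
from difflib import SequenceMatcher


def simil_like(a, b):
    """Stand-in for Simil: a 0..1 similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def address_number(address):
    """Whatever comes before the first space or hyphen."""
    return re.split(r"[ -]", address.strip(), maxsplit=1)[0]


def match_addresses(people, good_addresses, min_score=0.95):
    # Step 1: score every (person, address) against every good address.
    scored = [
        (person, addr, good, simil_like(addr, good))
        for person, addr in people
        for good in good_addresses
    ]
    # Step 2: highest score per person.
    best = {}
    for person, _addr, _good, score in scored:
        if person not in best or score > best[person]:
            best[person] = score
    # Step 3: keep rows at the person's best score that also clear the
    # threshold and share the same leading address number.
    return [
        (person, addr, good, score)
        for person, addr, good, score in scored
        if score == best[person]
        and score > min_score
        and address_number(addr) == address_number(good)
    ]


people = [("ann", "15 Main Street"), ("bob", "18 Main Stret")]
good = ["15 Main Street", "18 Main Street"]
for row in match_addresses(people, good):
    print(row)
```

Because the number check runs after the best score is picked, a person whose top-scoring candidate has a different street number ends up with no match at all, which mirrors the order of the steps as described.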


here4real · Mar 11, 2015

Correction. I did .9, not .95.

here4real · Mar 11, 2015

Still tweaking it. Since I am matching on the address number portion, I decreased the Simil score threshold to .65, which caught a LOT more.

here4real · Mar 12, 2015

jdraw - did that make sense?

jdraw · Mar 12, 2015

Does it do what you need/want?
Do you still have to do a lot of manual intervention?
If you can get something to isolate groups or patterns, you might be able to focus some code/logic to handle each or several patterns.

Did you look at the other post I mentioned earlier? There are links within it and there are people with much more math and linguistic talent than mine who do a lot of processing of genome data. It may be too complex for your needs, but some parts of it might be useful.

good luck

Galaxiom · Mar 12, 2015

The Damerau-Levenshtein function in post 10 of this thread works well. It can be weight adjusted for different types of errors.

http://www.access-programmers.co.uk/forums/showthread.php?t=246737

It is easy enough to get the best match by ordering by the result and selecting TOP 1, or by using Max().

What if you get more than one best result?
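One way to handle ties is to return every candidate that shares the best score and let a person (or a second rule, such as the address number check) decide. The sketch below is an assumption-laden illustration: it uses a plain, unweighted optimal string alignment distance (a restricted form of Damerau-Levenshtein), not the weighted function from the linked thread, and the name `best_matches` is hypothetical. With a distance, lower is better, so the "best" result is the minimum.

```python
def osa_distance(a, b):
    """Unweighted optimal string alignment distance.

    Counts insertions, deletions, substitutions and swaps of two
    adjacent characters, each costing 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def best_matches(target, candidates):
    """Return all candidates tied for the lowest distance to target."""
    if not candidates:
        return []
    scored = [(osa_distance(target, c), c) for c in candidates]
    lowest = min(score for score, _ in scored)
    return [c for score, c in scored if score == lowest]


print(best_matches("15 Main St", ["16 Main St", "17 Main St", "99 Oak Ave"]))
```

Here both "16 Main St" and "17 Main St" are one edit away, so both come back and the caller has to break the tie.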
