match letters in one field to letters in another and display % match

Did you see my last post? That works for counting occurences.

I'm not clear on exactly what you want to check. Give some examples and explain using those.

i did, but as we were writing i got it to work myself, so your function does what i did but in a much cleaner and easier way

the problem is that my idea didn't work, the idea to count occurences and match them up, so i'm looking for another idea


Carondelet Holy Cross Hospital Holy Cross Hospital 77.27%
Carondelet Holy Cross Hospital Carondelet Holy Cross Swing Beds 76.36%


this is the best example i have, on the left is the facility i'm looking for a match for. on the right are the 2 available matches. the one that shouldve been picked is Holy Cross Hospital.

it did get picked but the difference is way too small, i can't be sure that this method will pick the right facilities in the future automatically because in this case it picked it out almost by luck

so i'm asking if looking at this example and knowing which one should get picked you can think of a method to match them that would pick holy cross hospital


if i'm not being clear - sorry, can't do any better, i'm not good at that)))))))))
 
your function gives better numbers!!!!!!!! somehow. let me keep checking it out
 
It would fail when the order of Holy and Cross is mixed up. Does the order of words matter?

What would the percentage be for:
Code:
Carondelet Holy Cross Hospital               Cross Holy Hospital ??%
 
in my method it'd stay the same. yours - i'm not sure, i'm still trying to figure out what your function is doing

Code:
If InStr(i, strLong, Mid(strShort, i, 1)) Then
i don't understand, what's the condition for the IF here?
 
What would stay the same?

The IF checks each character of the shortest string against the longest string. However, after each character it starts searching from i and not the first character.

I think what you should be doing is searching word by word. Calculate the number of words present.
 
What would stay the same?

The IF checks each character of the shortest string against the longest string. However, after each character it starts searching from i and not the first character.

I think what you should be doing is searching word by word. Calculate the number of words present.

you asked what happened if the order of words would change? and i said using my method the % would stay the same

also, here's what i got when running the query using your method, all these 100% are not really 100%
do you know why?

Code:
Expr1   FacilityName	             FacName
100.00%	Carondelet Holy Cross Hospital	             Holy Cross Hospital
100.00%	Columbia St. Mary’s Hospital Columbia	             Columbia Hospital
100.00%	Columbia St. Mary’s Hospital Ozaukee 	             St Marys Hospital-Ozaukee
100.00%	Lourdes Hospital	               Our Lady Of Lourdes Memorial Hospital
100.00%	Mount St. Mary’s Hospital and Health Center 	 Mt St Marys Hospital
100.00%	Providence Health Center	                           Providence Healthcare Network
100.00%	St. John Hospital and Medical Center	              St John Medical Center
100.00%	St. John Macomb-Oakland Hospital	              St John Macomb-Oakland Hospital
100.00%	St. Mary’s Hospital at Amsterdam	              St Marys Hospital
100.00%	St. Mary’s of Michigan Medical Center	              St Marys of Michigan
100.00%	St. Mary’s Warrick Hospital	                           St Marys Warrick
100.00%	St.Vincent Seton Specialty Hospital	              St Vincent Seton Specialy Hospital
100.00%	University Medical Center Brackenridge	  University Medical Center at Brackenridge


as far as matching by words. in the carondelet case it wouldn't make a difference. 3 words match in both cases
 
Is the real problem that the data should be different to what is is. Should the values in the two fields be the same?
 
Just saw your last post.

Would it be better to compare words in the two strings and not letters.

The Split function could be used here with a space as the delimiter.
 
Just saw your last post.

Would it be better to compare words in the two strings and not letters.

The Split function could be used here with a space as the delimiter.
That was what was suggested in my last post and one/two other posts prior to that.

Here's the word search I was referring to:
Code:
Public Function GetOccPercentage(strMatch As String, strSearch As String) As String
    Dim lenMatch As Integer, lenSearch As String, intMatch As Integer
    Dim splitMatch() As String, splitSearch() As String, i As Integer, y As Integer
    
    If Len(strSearch) = 0 And Len(strMatch) = 0 Then
        GetOccPercentage = "100 %"
        Exit Function
    ElseIf Len(strSearch) = 0 Or Len(strMatch) = 0 Then
        GetOccPercentage = "0 %"
        Exit Function
    End If
    
    splitSearch = Split(strSearch, " ")
    lenSearch = UBound(splitSearch)
    splitMatch = Split(strMatch)
    lenMatch = UBound(splitMatch)
    
    For i = 0 To lenSearch
        For y = 0 To ubound(splitMatch)
            If splitSearch(i) = splitMatch(y) Then
                splitMatch(y) = ""
                intMatch = intMatch + 1
                Exit For
            End If
        Next
    Next

    GetOccPercentage = Round((intMatch / (lenMatch + 1)) * 100, 2) & " %"
End Function
 
Last edited:
Just saw your last post.

Would it be better to compare words in the two strings and not letters.

The Split function could be used here with a space as the delimiter.

in the carondelet case it wouldn't work. 3 words match in both cases so which one would get picked?

Code:
Carondelet Holy Cross Hospital                       Holy Cross Hospital 77.27%
Carondelet Holy Cross Hospital Carondelet                  Holy Cross Swing Beds 76.36%
 
Have you tested what I wrote last? The second check gives 40%.

You would also need to manage if a word appears twice. It's somehow managed in the code above (because I editted it) but you would need to do more.
 
Have you tested what I wrote last? The second check gives 40%.

You would also need to manage if a word appears twice. It's somehow managed in the code above (because I editted it) but you would need to do more.

i'm trying to test it and i'm getting 0% for everything, i'm still messing with it
and thank you so much for all your help and effore
 
I had made some changes so maybe you don't have that version. Copy and paste it again.
 
Code:
Expr1	FacilityName	FacName
100.00%	Carondelet Holy Cross Hospital	Carondelet Holy Cross  Swing Beds
100.00%	Lourdes Hospital	Our Lady Of Lourdes Memorial Hospital
100.00%	University Medical Center Brackenridge	University Medical Center at Brackenridge
12.50%	Mount St. Mary’s Hospital and Health Center 	Mt St Marys Hospital
16.67%	St. John Hospital and Medical Center	J J Humes Md And Assoc
20.00%	St. Mary’s Hospital at Amsterdam	St Marys Hospital
25.00%	Mount St. Mary’s Hospital and Health Center 	Mount Saint Marys Hospital
25.00%	St. Mary’s Warrick Hospital	St Marys Warrick
33.33%	Providence Health Center	Providence Healthcare Network
33.33%	Saint Agnes Hospital	St Agnes Healthcare


something is wrong

it's ok, doesn't make sense to mess with it, as i said before, i got it to work with a different function and the method itself doesn't work for me

so thank you for trying!! maybe i can use this method ini the future
 
Such a searching algorithm is what people have spent alot of time researching. This is what google and other search engines have almost perfected and I'm sure if you were to look through the code you will find that it can be rather complicated. A major topic being researched into when talking about meta-data and the relevance of searches.

There are scenarios you would take into consideration, St is not the same as Saint, hence it will not result in a word match. Health is not the same as Healthcare, hence, again it wouldn't result in a match. If you wanted it to match those words then you would need a dictionary of words setup as a multi-dimensional array or a table of possible words so if it sees "St" or "St." it knows it's "Saint".

There would be other scenarios to consider as well - for "Carondelet Holy Cross Swing Beds" and "Our Lady of Lourdes Memorial Hospital" you would need to know the number of words on the left and the number on the right. If the right is bigger AND there are matches, you take out a percentage from the final result.

With your method using a character by character match, one can easily find ways to make the system flawed. All I do is pad-up the search string with one of every possible character that is present in the match string.

Not as easy as you were expecting. ;)
 
Last edited:
Not as easy as you were expecting. ;)

i didn't expect it to be easy at all. i usually post here after i spend a few hours stuck.
as far as St and Saint - i thought about that, also Mount and Mt, and i have some code that replaces them all to St and Mt instead of Saint, St., Mount and Mt..
but can you please explain
dictionary of words setup as a multi-dimensional array or a table of possible words so if it sees "St" or "St." it knows it's "Saint".

i also match by address so i have a table of 300 abbreviations that i'm also replacing, like Street to St, Drive to Dr, Apartment to Apt and so on.
but all i was able to think of is to have this table and a loop that replaces all occurences of a word to what that table says.

is that what you meant in your suggestion or? what's a multi-dimensional array?
and if i have a
table of possible words
how do i make it so that
if it sees "St" or "St." it knows it's "Saint"
?

i'm sorry for taking so much time and effort out of you. i love to learn and you already taught me so much in this topic
 
also, please explain this
With your method using a character by character match, one can easily find ways to make the system flawed. All I do is pad-up the search string with one of every possible character that is present in the match string.

you mean someone can break it on purpose? by padding up the search with one of every possible character? how? and for what?

and like i said before, unfortunately i realized that even though it worked in most of the cases, it's not as reliable as i hoped it'd be. but i can't think of another way to match these hospitals.

in this step the list of hospitals i'm matching is where the address matched to more than one hospital and i need to find a method to automatically find the best match.
any ideas at all?
 
also, i got this match to work, in case i wasn't clear about this. it worked just the way i wanted it to and i was able to get the correct matching %.
But turns out that the idea itself was a bad one, at least for my purposes.
 
To get a good algorithm to get this working would require days of thinking of different scenarios, refining the idea and lots of testing. It's quite a big thing.

as far as St and Saint - i thought about that, also Mount and Mt, and i have some code that replaces them all to St and Mt instead of Saint, St., Mount and Mt..
but can you please explain

i also match by address so i have a table of 300 abbreviations that i'm also replacing, like Street to St, Drive to Dr, Apartment to Apt and so on.
but all i was able to think of is to have this table and a loop that replaces all occurences of a word to what that table says.
It's good to know you've got a table with the abbreviations and its matching word. Is it just a table with two fields, ABBREV | FULL_WORD? For me to be able to explain what I meant it would depend on whether you've done all the replacing? Are there abbreviations present in strMatch and also in strSearch?

is that what you meant in your suggestion or? what's a multi-dimensional array?and if i have a...
I think your table would be sufficient. A multi-dimensional array is an array but in matrix form. There are other objects that would work better.

how do i make it so that ?
Would depend on the response you give from the other questions asked.

i'm sorry for taking so much time and effort out of you. i love to learn and you already taught me so much in this topic
Glad to know you've learnt something.
 
Is it just a table with two fields, ABBREV | FULL_WORD? For me to be able to explain what I meant it would depend on whether you've done all the replacing?

it is and i didn't run the code on the data you're looking at yet, does it matter? i do have something setup though that takes the word from fullword and replaces it with abbrev word.
 

Users who are viewing this thread

Back
Top Bottom