searching with a partial string (2 Viewers)

John Sh

Active member
Local time
Tomorrow, 06:19
Joined
Feb 8, 2021
Messages
603
I have a situation where a number of entries have been mis-spelled and I need to find the correct spelling in a reference list.
As an example, the mis-spelled word is "Dryopteriaceae" and the correct spelling is "Dryopteridaceae".
My idea is to start with the full word and work backwards until a match is found using the code below.
This code fails with "nLen" not being recognised in the "(left(family, nLen))" statement, with or without the outer ().
Is there a way to progressively limit the recordset field such that the search criteria would eventually find a match.
In this case, when both arguments are "Dryopteri" one could reasonably assume the correct match has been found.
I have seen a reference to using "Soundex" for this purpose but even extending it out to 8 characters, it fails to find a matching pair.
At the requisite four characters, it will find numerous faulty matches as it is only sampling the beginning of each word.
Is this possible, or is there a better way to achieve my goal?

Code:
Private Sub Form_Load()
    Dim nLen As integer
    Dim str As String
    Dim rs1 As Recordset
    Set rs1 = oDB.OpenRecordset("taxon", dbOpenSnapshot)
    str = "Dryopteriaceae"
    nLen = Len(str)
    Do
        rs1.FindFirst "(left(family, nLen)) ='" & Left(str, nLen) & "'"
        If Not rs1.NoMatch Then
            Exit Sub
        Else
            nLen = nLen - 1
        End If
    Loop While nLen > 1
End Sub
 
Perhaps
rs1.FindFirst "family like '" & Left(str, nLen) & "*'"
 
I have a couple demos using different techniques.
You can use soundex that allows you to search things that sound alike
But in the demo I found that a Find as you type is really the fastest way.

If I type
Dr it finds anything with DR
if I add DRY then it finds anything with DRY
See these examples
 
In that example there are 10k names in the list. You can play with the Find as You type and that would instantly find your example above.
But the more challenging ones are when you know it something like "Cathy " or "Kathy" or "Cathy, Kathy, Cathie, Kathie, Catherine, Katherine ...." That is where you might want to have additional features like Soundex, Levenshtein, or Simil because with FAYT you will quickly find Ca.... but will not return Kathy and you may miss catherine.

I type in Cathy and it picks up other things that are similar but spelled differently, to include Cathy, Catherine, Kathryn, Kattie


Capture.PNG

Now I do not know if I coded the Soundex/2 wrong but none of the K sounding names appear. I found from my testing the most useful is the Simil function. If you had both a find as you type or the simil to do both types of searches you should be able to find any 'like' strings quickly.
 

Attachments

  • Capture.PNG
    Capture.PNG
    60.4 KB · Views: 14
Last edited:
I have a situation where a number of entries have been mis-spelled and I need to find the correct spelling in a reference list.
As an example, the mis-spelled word is "Dryopteriaceae" and the correct spelling is "Dryopteridaceae".
My idea is to start with the full word and work backwards until a match is found using the code below.
This code fails with "nLen" not being recognised in the "(left(family, nLen))" statement, with or without the outer ().
Is there a way to progressively limit the recordset field such that the search criteria would eventually find a match.
In this case, when both arguments are "Dryopteri" one could reasonably assume the correct match has been found.
I have seen a reference to using "Soundex" for this purpose but even extending it out to 8 characters, it fails to find a matching pair.
At the requisite four characters, it will find numerous faulty matches as it is only sampling the beginning of each word.
Is this possible, or is there a better way to achieve my goal?

Code:
Private Sub Form_Load()
    Dim nLen As integer
    Dim str As String
    Dim rs1 As Recordset
    Set rs1 = oDB.OpenRecordset("taxon", dbOpenSnapshot)
    str = "Dryopteriaceae"
    nLen = Len(str)
    Do
        rs1.FindFirst "(left(family, nLen)) ='" & Left(str, nLen) & "'"
        If Not rs1.NoMatch Then
            Exit Sub
        Else
            nLen = nLen - 1
        End If
    Loop While nLen > 1
End Sub
In this sort of technical case,can you not control the permitted values by using a table of values, including the id of the user who created the value? Then the "genus name" or whatever in the "subject" entity becomes a numeric id, (referencing the permitted values table) and your problem becomes far more tractable.

You could even add a field to confirm the genus name, so the errors should only exist in the unconfirmed genus names. It them takes multiple people to be incompetent to get bad data proliferation.
 
For data entry convenience on an open text field like taxonomy, you can also use a combo that selects distinct values from the taxonomy field itself. That way any taxonomies previously entered are available for quick selection without linking to a sub- or lookup-table.
 
Now I do not know if I coded the Soundex/2 wrong but none of the K sounding names appear.
I don't think Soundex does any phonetic matching, and it is dependent on the first character being correct. At least that is how I remember it.
 

Users who are viewing this thread

Back
Top Bottom