Solved: Searching with a partial string

John Sh
I have a situation where a number of entries have been misspelled and I need to find the correct spelling in a reference list.
As an example, the misspelled word is "Dryopteriaceae" and the correct spelling is "Dryopteridaceae".
My idea is to start with the full word and work backwards until a match is found, using the code below.
This code fails because "nLen" is not recognised in the "(left(family, nLen))" statement, with or without the outer parentheses.
Is there a way to progressively limit the recordset field so that the search criteria would eventually find a match?
In this case, when both arguments are "Dryopteri", one could reasonably assume the correct match has been found.
I have seen a reference to using "Soundex" for this purpose, but even extending it out to 8 characters, it fails to find a matching pair.
At the requisite four characters, it finds numerous false matches, as it only samples the beginning of each word.
Is this possible, or is there a better way to achieve my goal?

Code:
Private Sub Form_Load()
    Dim nLen As Integer
    Dim str As String
    Dim rs1 As Recordset
    Set rs1 = oDB.OpenRecordset("taxon", dbOpenSnapshot)
    str = "Dryopteriaceae"
    nLen = Len(str)
    Do
        ' Fails here: nLen sits inside the quoted criteria string, so the
        ' database engine sees the literal text "nLen", not its value.
        rs1.FindFirst "(left(family, nLen)) ='" & Left(str, nLen) & "'"
        If Not rs1.NoMatch Then
            Exit Sub
        Else
            nLen = nLen - 1
        End If
    Loop While nLen > 1
End Sub
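For illustration only, the shrink-the-prefix idea can be sketched in Python (a plain list stands in for the recordset, and the reference names here are hypothetical):

```python
# A plain Python sketch of the shrink-the-prefix idea (a list stands in
# for the recordset; the reference names are hypothetical examples).
reference = ["Dryopteridaceae", "Aspleniaceae", "Polypodiaceae"]

def shrinking_prefix_match(word, names):
    """Shorten the prefix of `word` one character at a time until at
    least one name starts with it; return (prefix, matching names)."""
    for n in range(len(word), 0, -1):
        prefix = word[:n]
        matches = [name for name in names if name.startswith(prefix)]
        if matches:
            return prefix, matches
    return "", []

shrinking_prefix_match("Dryopteriaceae", reference)
# → ('Dryopteri', ['Dryopteridaceae'])
```

The loop stops at "Dryopteri", the shared nine-character prefix, which is exactly the point at which the question suggests a match can reasonably be assumed.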
 
I have a couple of demos using different techniques.
You can use Soundex, which allows you to search for things that sound alike.
But in the demos I found that a find-as-you-type is really the fastest way.

If I type "Dr" it finds anything with "Dr"; if I extend it to "Dry", it finds anything with "Dry".
See these examples.
 
In that example there are 10k names in the list. You can play with the find-as-you-type, and it would instantly find your example above.
But the more challenging cases are when you know it is something like "Cathy" or "Kathy" or "Cathy, Kathy, Cathie, Kathie, Catherine, Katherine...". That is where you might want additional features like Soundex, Levenshtein, or Simil, because with FAYT you will quickly find "Ca..." but it will not return "Kathy", and you may miss "Catherine".
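For comparison, an edit-distance measure such as Levenshtein scores whole words rather than prefixes, so "Cathy" and "Kathy" come out one edit apart even though a prefix search separates them. A minimal sketch in Python (illustration only; the thread's code is VBA):

```python
# Minimal Levenshtein edit distance (dynamic programming), in Python
# purely for illustration; the thread's database code is VBA.
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

levenshtein("Cathy", "Kathy")                      # 1
levenshtein("Dryopteriaceae", "Dryopteridaceae")   # 1
```

A distance of 1 in both cases: one substitution for Cathy/Kathy, one inserted "d" for the fern family, which is why edit distance catches misspellings a prefix search cannot.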

I type in Cathy and it picks up other things that are similar but spelled differently, including Cathy, Catherine, Kathryn, Kattie.


[Attachment: Capture.PNG, screenshot of the similarity search results]

Now, I do not know if I coded the Soundex/2 wrong, but none of the K-sounding names appear. I found from my testing that the most useful is the Simil function. If you had both a find-as-you-type and Simil, to cover both types of searches, you should be able to find any 'like' strings quickly.
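As an illustration of what a Simil-style score looks like, Python's difflib uses Ratcliff/Obershelp matching, the same family of algorithm Simil is generally based on (an assumption; the VBA Simil implementation may differ in detail):

```python
from difflib import SequenceMatcher

# difflib's ratio() uses Ratcliff/Obershelp matching, the same family of
# algorithm as the Simil function discussed above (assumption: the VBA
# implementation may differ in detail). 1.0 means identical strings.
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

similarity("Dryopteriaceae", "Dryopteridaceae")  # ~0.97
similarity("Cathy", "Kathy")                     # 0.8, despite C vs K
```

Unlike Soundex, the score does not care which letter differs, so Cathy/Kathy still rate highly.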
 

In this sort of technical case, can you not control the permitted values by using a table of values, including the id of the user who created the value? Then the "genus name" or whatever in the "subject" entity becomes a numeric id (referencing the permitted-values table), and your problem becomes far more tractable.

You could even add a field to confirm the genus name, so errors should only exist in the unconfirmed genus names. It then takes multiple people being incompetent for bad data to proliferate.
 
For data-entry convenience on an open text field like taxonomy, you can also use a combo box that selects distinct values from the taxonomy field itself. That way any taxonomies previously entered are available for quick selection without linking to a sub- or lookup-table.
 
Now I do not know if I coded the Soundex/2 wrong but none of the K sounding names appear.
I don't think Soundex does any phonetic matching, and it is dependent on the first character being correct. At least that is how I remember it.
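That first-letter dependence is easy to see in a sketch of standard Soundex (Python, for illustration only; this is not the thread's Soundex/2 code):

```python
# Standard 4-character Soundex sketch. The first letter is kept verbatim;
# only the remaining letters are encoded as digits, so "Cathy" and
# "Kathy" can never share a code.
def soundex(name: str) -> str:
    groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
              "l": "4", "mn": "5", "r": "6"}
    def code(ch):
        for letters, digit in groups.items():
            if ch in letters:
                return digit
        return ""  # vowels, y, h, w carry no digit
    name = name.lower()
    result = name[0].upper()
    prev = code(name[0])
    for ch in name[1:]:
        c = code(ch)
        if c and c != prev:
            result += c
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = c
    return (result + "000")[:4]

soundex("Cathy")  # 'C300'
soundex("Kathy")  # 'K300' -- same digits, different first letter
```

The digits agree; only the preserved initial differs, which is why no K-spelled name ever lands in a C bucket.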
 
I expect Soundex would allow duplicates, because they sound correct. You want a way to prevent typos.

When you let users type text you are going to get loads of errors.

Something like 7mm will be entered as 7mm, 7 mm, 7m.m. and loads of other things. It's very hard to make anything foolproof.
 
Thanks to all.
I will try "like" and see how it goes.
I haven't heard of "Simil" but will have a look at it and MajP's demo.
The database I am working on was created in the 1980s and had nothing like a combo box so it was a case of whatever was entered was accepted.
I am now cleaning up that mess.
All critical entries are now from a combobox and additions can only be approved by a supervisor.

My original idea of reducing the size of the target and search strings was doomed to failure: if the two strings were of different lengths, there could never be a match.

John
 
I am now cleaning up that mess
I think, if I was doing a huge clean up I would pick one of those algorithms like simil.

Then loop through all your records and save every similar value that meets a certain score: maybe anything above 0.65 if using Simil, or 70 if using Levenshtein. These are then "pretty similar" records. Or maybe set it to return the top 10 most similar records.

So you would have a large table with maybe 5-10 child records either way.

Then build a user interface that shows a value and all similar records found. Have a way to quickly select a "master" if one exists, and to select any records whose value should be swapped for the master's. This will still be time consuming, but at least it is a doable strategy. Each record you review then needs a status: "Master", "Swapped", or "Reviewed", with a default of "Not Reviewed". This way you know which records have been reviewed and whether anything was done.

I agree with everyone that there are ways to ensure entry of unique values, but I do not think that is the problem. The problem is fixing what already exists.

Also, sort your results from the highest score down: a record with a Simil score of 0.99 is very likely a misspelling.
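A hypothetical sketch of that scoring pass in Python (difflib's ratio() stands in for the Simil function, and the 0.65 threshold is the one suggested above):

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sketch of the clean-up scoring pass: compare every pair of
# distinct names, keep pairs at or above the threshold, highest score
# first. difflib's ratio() stands in for Simil; 0.65 per the post above.
def similar_pairs(names, threshold=0.65):
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((round(score, 2), a, b))
    return sorted(pairs, reverse=True)

families = ["Dryopteridaceae", "Dryopteriaceae", "Aspleniaceae"]
similar_pairs(families)
# → [(0.97, 'Dryopteriaceae', 'Dryopteridaceae')]
```

The output is exactly the review queue described above: near-1.0 pairs at the top are almost certainly misspellings, while unrelated names fall below the threshold and drop out.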
 
The "FindFirst" statement that works:
Perhaps:
Code:
rs1.FindFirst "family like '" & Left(str, nLen) & "*'"
The code that finally worked is a slight variation on your suggestion,
Code:
rs1.FindFirst "[family] like '" & "*" & sStr & "*'"

where sStr is the result of Left(str, nLen).
Many thanks.
John
 
I have created a form that collects all the inherent errors in the main table, with links to spelling corrections, duplicate entries, family/genus mismatches and a host of other problems. The idea is to bring all these errors into a single hub where the supervisors can work their way through them.
Many of the errors require people with botanical training to correct, so a blanket fix is not really possible.
Simple spelling errors, specimen location errors, etc., are corrected in bulk from within the form.
Given the age of the database, it was never normalised.
Data entry was on the raw table, scanning across some 90 fields, all of which were text fields, including dates.
I have had Access analyse the data, but it virtually gave up on creating a normalised result.
Maybe I can get AI to have a look at it!
 
In general, cleaning large amounts of dubious data is not going to be easy.

If it's an ongoing issue, then I would still explore a table-based control system, as you could freeze the current problems and work on them without continuing data errors complicating things.
 
