Solved Regular Expression to detect end of word

strive4peace

AWF VIP
Local time
Yesterday, 19:45
Joined
Apr 3, 2020
Messages
1,004
I'm splitting lines at spaces to find words to test. However, other characters delimit words too, such as parentheses ( ) and commas ,

Can a regular expression be used to parse out the words and tell me the terminator character(s) of each?

examples:
word1(word2,
word1(word2)
word1)
word1),

thank you!
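A minimal sketch of what the question asks for, assuming late-bound VBScript.RegExp (a Windows COM component). The pattern `(\w+)(\W*)` puts each run of word characters (letters, digits, underscore) in group 1 and whatever non-word characters follow it (spaces, parentheses, commas) in group 2. `WordsAndTerminators` is a hypothetical name, and the `word|terminator;` output format is only for illustration.

```vba
' Hypothetical helper: lists each word and the character(s) that end it.
' Assumes VBScript.RegExp is available (late bound, Windows only).
Public Function WordsAndTerminators(ByVal strLine As String) As String
    Dim oRegEx As Object, oMatch As Object
    Dim strResult As String

    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True                   'return every match, not just the first
    oRegEx.Pattern = "(\w+)(\W*)"          'group 1 = word, group 2 = terminator(s)

    For Each oMatch In oRegEx.Execute(strLine)
        If Len(strResult) > 0 Then strResult = strResult & ";"
        strResult = strResult & oMatch.SubMatches(0) & "|" & oMatch.SubMatches(1)
    Next oMatch

    WordsAndTerminators = strResult
End Function
```

For example, `WordsAndTerminators("word1(word2,")` returns `word1|(;word2|,`. Non-word characters that come before the first word (such as a leading "(") are not part of any match, so they would need separate handling.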
 

Uncle Gizmo

Nifty Access Guy
Staff member
Local time
Today, 01:45
Joined
Jul 9, 2003
Messages
16,280

strive4peace

I think so...

I have managed to avoid the need for regular expressions so I know very little about them. However a Google turned up this nugget of information which I believe might be relevant "word boundaries" See:- https://www.regular-expressions.info/wordboundaries.html

thanks, Uncle Gizmo. Now I know how newbies to Access feel when they want you to do it for them. I've tried to learn about regular expressions before, and badly want to! But my head gets dizzy looking at all the possibilities.

If anyone can make an example, I would very much appreciate it.

My alternative, and the way I did it before, was to put spaces around these characters (and then take them out again later). That's an easy way to do it, but it probably doesn't perform as well.
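The replace-with-spaces approach might look like this minimal sketch. `PadDelimiters` is a hypothetical name, and the delimiter list `(),` is taken from the examples in the first post. After padding, `Split` on a space separates words from delimiters.

```vba
' Hypothetical helper: surrounds each delimiter with spaces so Split can find it
Public Function PadDelimiters(ByVal strLine As String) As String
    Const DELIMS As String = "(),"       'assumed delimiter list, from the examples
    Dim i As Long
    Dim strChr As String

    For i = 1 To Len(DELIMS)
        strChr = Mid$(DELIMS, i, 1)
        strLine = Replace(strLine, strChr, " " & strChr & " ")
    Next i

    PadDelimiters = strLine
End Function
```

`Split(PadDelimiters("word1(word2,"), " ")` gives `word1`, `(`, `word2`, `,` and a trailing empty element. Consecutive delimiters such as `),` also leave empty elements that have to be skipped, and the added spaces must be removed again when the line is rebuilt, which is the extra work this approach costs.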
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 20:45
Joined
May 21, 2018
Messages
8,527
I am no expert, but I would say definitely yes. I assume you need a pattern of only letters followed by a character that is neither a letter nor a digit. I would PM @arnelgp; he seems to be the guru on RegExp. It would take me too long to play with it. There are some great online resources where you can test your pattern.
 

theDBguy

I’m here to help
Staff member
Local time
Yesterday, 17:45
Joined
Oct 29, 2018
Messages
21,467
Hi Crystal. Just to clarify, you want to know the word terminator, not the word? I think I could get the words, but that's not what you want?
 

MajP

I would build a function that wraps the match collection and returns the word. Given this input:
word1(word2,
word3(word4)
word5)
word6)
the match collection should return:
word1(
word2,
word3(
word4)
word5)
word6)

Then you could wrap it in a function that returns the clean word and the delimiter. For example, I use this to find all acronyms in a Word document, and then wrap it in a function that returns the occurrences in alphabetical order with the count of each occurrence.
 

strive4peace

thanks Maj, theDBguy, and Gasman
I'm modifying my Color Comments Green program to color keywords blue, so I need to know the word to test as well as how it ends (or maybe I don't need to know the end). There are 2 ways to do it: (1) build a string as I go through one word at a time, which I prefer because 2 keywords in a row then get one tag around both instead of a tag around each, or (2) replace each word with tag+word+tag, but then I need to eliminate the extra tags.

I do have a program I use for myself, for my website, but haven't shared it because it's full of duct tape!
 

theDBguy

strive4peace said: "I'm modifying my Color Comments Green program to do keywords in blue, so need to know the word to test as well as how it ends..."
Hi Crystal. Not sure I fully understand the intent, but I used this function with the pattern "\b[a-z0-9]+\b" and got all the words out of your sample data: "word1;word2;word1;word2;word1;word1"
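The function used here isn't included in the thread, so this is a hypothetical stand-in (`ListWords`) that returns every match joined with semicolons, assuming late-bound VBScript.RegExp. `IgnoreCase` is set so that a pattern such as `\b[a-z0-9]+\b` also matches capitalized words; the default pattern `\b\w+\b` uses word boundaries, which sit between a word character and a non-word character.

```vba
' Hypothetical stand-in: returns all matches of strPattern, separated by ";"
' Assumes VBScript.RegExp is available (late bound, Windows only).
Public Function ListWords(ByVal strText As String, _
        Optional ByVal strPattern As String = "\b\w+\b") As String
    Dim oRegEx As Object, oMatch As Object
    Dim strResult As String

    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True                 'every match, not just the first
    oRegEx.IgnoreCase = True             'so [a-z] also matches capital letters
    oRegEx.Pattern = strPattern

    For Each oMatch In oRegEx.Execute(strText)
        If Len(strResult) > 0 Then strResult = strResult & ";"
        strResult = strResult & oMatch.Value
    Next oMatch

    ListWords = strResult
End Function
```

With the four sample lines from the first post joined by `vbCrLf`, this returns `word1;word2;word1;word2;word1;word1` with either pattern. As noted, it returns only the words, not the terminators.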
 

jamesave

New member
Local time
Yesterday, 17:45
Joined
Apr 26, 2020
Messages
16
Hi strive4peace! (just a note: I truly enjoy your youtube videos)

Now, going to your question... I am pretty sure you can. I haven't played with regex for quite a while now, but the pattern you are looking for is \w+, which should work in your case.

Gasman linked to the website, I pasted your sample and it matches the six words, including the numbers.
 

Isaac

Lifelong Learner
Local time
Yesterday, 17:45
Joined
Mar 14, 2017
Messages
8,777
Do you really need a regular expression? If you turn the words into an array by Split()-ing a string using a space as a delimiter, just test for the last character in the string.
 

theDBguy

jamesave said: "...what you are looking for is \w+ should work in your case. ... I pasted your sample and it matches the six words, including the numbers."
I agree. I changed my pattern to "\b\w+\b" and still got the same result as before. Cheers!
 

theDBguy

Isaac said: "Do you really need a regular expression? If you turn the words into an array by Split()-ing a string using a space as a delimiter, just test for the last character in the string."
In the above given sample data, there are no space characters to use for the Split() function. I think the issue was not knowing ahead of time what terminates a word (i.e. not always a space).
 

strive4peace

Isaac said: "...just test for the last character in the string."
Thanks for the thought, but it isn't always the last character. Take this, for example:
MyFunctionName(True)
 

strive4peace

thanks, theDBguy. Ideally, I would get the words and their terminator character(s) so the result can be constructed sequentially without using search/replace. For instance, here I wouldn't want the blue tag to end after As and then a new blue tag to start before String:
Dim myVariable As String
If I did get separate tags like that, I would then need to eliminate the extra ones. With my new way, I track whether I'm inside a color tag or not: I close it if the next word isn't colored, or just write the next word if it is still colored. However, ( ) and , always end a color.
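A minimal sketch of this tag-tracking idea, assuming late-bound VBScript.RegExp and the `(\w+)(\W*)` pattern. `ColorKeywords`, the `<span style="color:blue">` tag, and the small stand-in keyword test `IsSampleKeyword` are all hypothetical; a real version would check the full keyword list. Spaces after a keyword are held back until the next word shows whether the color continues, so "As String" ends up inside one tag, while any terminator other than spaces, such as a parenthesis or comma, closes the tag.

```vba
' Hypothetical sketch: wraps each run of keywords in a single color tag.
' Assumes VBScript.RegExp is available (late bound, Windows only).
Public Function ColorKeywords(ByVal strLine As String) As String
    Const TAG_OPEN As String = "<span style=""color:blue"">"   'assumed tag
    Const TAG_CLOSE As String = "</span>"
    Dim oRegEx As Object, oMatch As Object
    Dim strWord As String, strTerm As String
    Dim strOut As String, strPending As String
    Dim bInColor As Boolean
    Dim lngNext As Long              '0-based position just after the previous match

    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True
    oRegEx.Pattern = "(\w+)(\W*)"    'group 1 = word, group 2 = terminator(s)

    For Each oMatch In oRegEx.Execute(strLine)
        'copy any text before the first word, such as a leading "("
        If oMatch.FirstIndex > lngNext Then
            strOut = strOut & Mid$(strLine, lngNext + 1, oMatch.FirstIndex - lngNext)
        End If
        lngNext = oMatch.FirstIndex + oMatch.Length
        strWord = oMatch.SubMatches(0)
        strTerm = oMatch.SubMatches(1)

        If IsSampleKeyword(strWord) Then
            If Not bInColor Then
                strOut = strOut & TAG_OPEN           'start a new colored run
                bInColor = True
            End If
            strOut = strOut & strPending & strWord   'held-back spaces stay inside the tag
        Else
            If bInColor Then
                strOut = strOut & TAG_CLOSE          'a plain word ends the colored run
                bInColor = False
            End If
            strOut = strOut & strPending & strWord
        End If
        strPending = vbNullString

        If bInColor And Len(Trim$(strTerm)) = 0 Then
            strPending = strTerm                     'only spaces: let the next word decide
        Else
            If bInColor Then
                strOut = strOut & TAG_CLOSE          '( ) , and other punctuation end a color
                bInColor = False
            End If
            strOut = strOut & strTerm
        End If
    Next oMatch

    If bInColor Then strOut = strOut & TAG_CLOSE
    'strPending holds trailing spaces; Mid$ adds the whole line if it had no words
    ColorKeywords = strOut & strPending & Mid$(strLine, lngNext + 1)
End Function

' Stand-in keyword test for this sketch only; replace with the real keyword list
Private Function IsSampleKeyword(ByVal strWord As String) As Boolean
    Select Case LCase$(strWord)
        Case "dim", "as", "string", "true", "false"
            IsSampleKeyword = True
    End Select
End Function
```

With these sample keywords, `ColorKeywords("Dim myVariable As String")` produces one tag around `Dim` and one tag around `As String`, and `ColorKeywords("MyFunctionName(True)")` closes the tag before the `)`.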
 

HalloweenWeed

Member
Local time
Yesterday, 20:45
Joined
Apr 8, 2020
Messages
213
Hello s4p, here is my solution. It looks for other letters (alpha) before and after the match:

Code:
Public Function FindCompleteWordInString(SearchStr As Variant, MatchStr As Variant, Optional Start As Long) As Long
'....................................................................
' Author: HalloweenWeed
' Date:     2/4/2020
' Looks for MatchStr inside SearchStr and only returns a positive
' Long integer if the match is a whole word inside the string.
' Optional Start sets the character position to start the search at.
' If it is omitted, less than 1, or more than the length of SearchStr,
' the search begins at the first character in SearchStr.
' The positive Long integer returned is the position of
' the first whole-word match found within SearchStr; 0 means none.
' If a letter (A-Z or a-z) is immediately on either side of
' a match, that match is skipped (thus only whole-word matches).
' The match must equal a portion of the string exactly, except that case is ignored.
' Leading and trailing punctuation in the string to be searched is ignored.
' Also works if the match is at the very beginning or end of the string.
' If either argument is Null, not a string, or zero-length, -1 is returned.
' Limitation: SearchStr must be < 2,147,483,648 characters in length.
'....................................................................

'On Error GoTo Err_Handler

Dim Index As Long, PrevIndex As Long
Dim SearchLen As Long, MatchLen As Long
Dim ChrCode As Long
Dim LeadIsLetter As Boolean, TrailIsLetter As Boolean

    FindCompleteWordInString = -1                       'default for invalid input
    If VarType(SearchStr) <> vbString Then GoTo ExitFunction
    If VarType(MatchStr) <> vbString Then GoTo ExitFunction
    SearchLen = Len(SearchStr)
    MatchLen = Len(MatchStr)
    If SearchLen < 1 Or MatchLen < 1 Then GoTo ExitFunction

    'IsMissing only works for Variant arguments; an omitted Long is 0,
    'so a range check covers both the omitted and out-of-range cases
    If Start < 1 Or Start > SearchLen Then
        PrevIndex = 1
    Else
        PrevIndex = Start
    End If

    FindCompleteWordInString = 0                        'valid input, no match yet
    Do While PrevIndex <= SearchLen
        Index = InStr(PrevIndex, SearchStr, MatchStr, vbTextCompare)
        If Index = 0 Then Exit Do                       'no more candidates

        'a match at the very left side of the string has no lead character
        LeadIsLetter = False
        If Index > 1 Then
            ChrCode = Asc(Mid$(SearchStr, Index - 1, 1))
            LeadIsLetter = (ChrCode >= 65 And ChrCode <= 90) Or (ChrCode >= 97 And ChrCode <= 122)
        End If

        'a match at the very right side of the string has no trailing character
        TrailIsLetter = False
        If Index + MatchLen <= SearchLen Then
            ChrCode = Asc(Mid$(SearchStr, Index + MatchLen, 1))
            TrailIsLetter = (ChrCode >= 65 And ChrCode <= 90) Or (ChrCode >= 97 And ChrCode <= 122)
        End If

        'no letter on either side, so this is a complete, independent word
        If Not LeadIsLetter And Not TrailIsLetter Then
            FindCompleteWordInString = Index
            Exit Do
        End If

        'the match is embedded in another word; search again from the next character
        PrevIndex = Index + 1
    Loop

ExitFunction:
    Exit Function

Err_Handler:
    MsgBox "Error #" & Err.Number & ": " & Err.Description
    Resume ExitFunction

End Function
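A few sample calls showing how the function is meant to be used; the expected values follow from the rules in its header comment (position of the first whole-word match, 0 when the text only appears inside another word, -1 for invalid input). `TestFindCompleteWord` is a hypothetical test routine.

```vba
' Hypothetical usage sketch for FindCompleteWordInString
Public Sub TestFindCompleteWord()
    'As is a whole word starting at character 16
    Debug.Print FindCompleteWordInString("Dim myVariable As String", "As")     ' 16
    'Var only occurs inside myVariable, so there is no whole-word match
    Debug.Print FindCompleteWordInString("Dim myVariable As String", "Var")    ' 0
    'a Null search string is rejected
    Debug.Print FindCompleteWordInString(Null, "As")                           ' -1
End Sub
```

Note that this searches for one known word at a time, so checking a line against many keywords means one call per keyword.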
 

strive4peace

HalloweenWeed said: "Hello s4p, here is my solution, it looks for other letters (alpha) before and after the string..."

thanks, HalloweenWeed, but there are about 175 keywords, which is why I'm pulling out each word to check whether it is a keyword, rather than searching the line for each keyword.
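Since each extracted word only needs a yes/no answer, one option is to load the keywords once into a Scripting.Dictionary with text comparison and call `.Exists` on each word, instead of searching the line once per keyword. This is a minimal sketch; `IsKeyword`, the module-level `mdicKeywords`, and the short sample list are hypothetical stand-ins for the real list of about 175 keywords.

```vba
' Hypothetical keyword lookup; put this in a standard module.
Private mdicKeywords As Object          'Scripting.Dictionary, built on first use

Public Function IsKeyword(ByVal strWord As String) As Boolean
    Dim vWord As Variant

    If mdicKeywords Is Nothing Then
        Set mdicKeywords = CreateObject("Scripting.Dictionary")
        mdicKeywords.CompareMode = vbTextCompare    'case-insensitive: "dim" = "Dim"
        'sample list only; the real list would hold all of the keywords
        For Each vWord In Split("Dim As String Public Private Function Sub End If Then Else True False", " ")
            mdicKeywords(vWord) = True              'adds the key; duplicates don't raise an error
        Next vWord
    End If

    IsKeyword = mdicKeywords.Exists(strWord)
End Function
```

`CompareMode` has to be set before any keys are added. Each lookup is a hash check, so the cost per word stays roughly the same whether the list has 10 keywords or 175.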
 

strive4peace

Isaac said: "Do you really need a regular expression? If you turn the words into an array by Split()-ing a string using a space as a delimiter, just test for the last character in the string."

thanks, Isaac. I don't know! I do split at spaces first, but the parts sometimes contain multiple words too, as with the examples in my first post.

I've already written this in a kludgey way, I was hoping to make it a bit more elegant!
 

theDBguy

strive4peace said: "thanks, HalloweenWeed, but there are about 175 keywords, which is why I'm pulling words to see if they are one rather than looking for each keyword in the line"
Hi Crystal. Did any of the regex samples given above help, or did you decide to go another way? For example, here's the result when I tried it; but as I said earlier, this just pulls out all the words without the terminators.

[Screenshot: regex.png]


The above image shows I copied and pasted some words from your original post, clicked the button, and then the MsgBox showed the words from it.
 
