Regular expressions: how to find several matches?

Leen

Hi,

Suppose I have a record value, for example: "LU_ATTRIB:LUTYPE:A1_VALUES"

My goal is to use regular expressions to extract three values from that record, namely LU_ATTRIB, LUTYPE and A1_VALUES, into the variables field1, field2 and field3.

I tried the following code:
Code:
Function regexpparts(inputfield As String) As String
    Dim matches, field, reffield, number
    Dim regEx As Object
    Set regEx = CreateObject("vbscript.regexp")

    With regEx
        .Pattern = "([A-Za-z0-9_]*)"
        .Global = False
    End With
    
    Set matches = regEx.Execute(inputfield)
    number = matches.Count
    field = matches(0)
    field2 = matches(1)
    field3 = matches(2)
    
    Set matches = Nothing
    Set regEx = Nothing
End Function

However, when I run this code it fails on the line "field2 = matches(1)".

When I check "number" (the match count) with Add Watch, it returns the value 1. But why? My regular expression pattern
"([A-Za-z0-9_]*)" looks for any run of word characters (including underscore).

So for "field", the code finds the match "LU_ATTRIB". However, for field2 and field3 this doesn't work.

If I understand the principle of regular expression matches correctly, I don't think I have to work with submatches?

Can somebody help?
Thanks a lot in advance!
 
Leen,

Do a Search here for the "Split" function. You can use the ":" as a delimiter.

Wayne
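
For reference, a minimal sketch of the Split approach, assuming the record always uses ":" as the delimiter. SplitRecord is a hypothetical name, and the sample value is the one from the question.

```vba
' Minimal sketch: split the record on ":" instead of using a regular expression.
' SplitRecord is a hypothetical name; the sample value comes from the question.
Sub SplitRecord()
    Dim parts() As String
    Dim field1 As String, field2 As String, field3 As String

    ' Split returns a zero-based array of the pieces between the delimiters.
    parts = Split("LU_ATTRIB:LUTYPE:A1_VALUES", ":")

    ' Guard against records that contain fewer than three parts.
    If UBound(parts) >= 2 Then
        field1 = parts(0)
        field2 = parts(1)
        field3 = parts(2)
        Debug.Print field1, field2, field3
    End If
End Sub
```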
 
Global = False returns only the first match. Use Global = True for more matches, but then you'll find that the pattern looks for zero or more occurrences of the characters, and that it's greedy and you'd need lazy...

Mayhaps

.Pattern = "(\b\w+?\b)"

(btw \w equals [A-Za-z0-9_])
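
Put together, the suggestion might look like the sketch below. It assumes late binding to VBScript.RegExp as in the original code; CountParts is a hypothetical name, and printing to the Immediate window is only for illustration. For the sample value it should report three matches.

```vba
' Sketch: the original function with Global = True and the suggested pattern.
' CountParts is a hypothetical name; it returns the number of matches found.
Function CountParts(inputfield As String) As Long
    Dim regEx As Object
    Dim matches As Object
    Dim i As Long

    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Pattern = "(\b\w+?\b)"   ' one or more word characters between word boundaries
        .Global = True            ' return every match, not only the first
    End With

    Set matches = regEx.Execute(inputfield)

    ' Loop over the collection instead of hard-coding matches(1), matches(2),
    ' so a record with fewer parts does not raise an error.
    For i = 0 To matches.Count - 1
        Debug.Print i, matches(i).Value
    Next i

    CountParts = matches.Count

    Set matches = Nothing
    Set regEx = Nothing
End Function
```

Usage from the Immediate window: ? CountParts("LU_ATTRIB:LUTYPE:A1_VALUES")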

If what you've posted as sample is accurate, a regular split, as suggested by WayneRyan, will probably prove to be faster, unless the field values are a bit longer.

Would it be possible for you to normalize this db?
 
Hi!

Thanks for all the help. I used your pattern and it works!! Just one question: in the regular expressions help, I found the following about your pattern:

.Pattern = "(\b\w+?\b)"

\b = Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb"
\w = Matches any word character including underscore. Equivalent to '[A-Za-z0-9_]'.
+ = Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? = When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all 'o's.

However, I don't understand how this pattern arrives at that result.
Why not, for example:

.Pattern = "(\b\w*\b)"

Thanks!

What exactly do you mean by "normalizing the db"?

ps: I did not use the "split" function as I'd like to know how to work with the regular expressions, but thanks for the help!
 
Sorry, greedy/lazy probably doesn't come into play at all due to using word boundaries. I was playing with different suggestions, and forgot.

"\b\w+\b" would probably do.

Using * instead of +: with * (zero or more occurrences) it can also match at word boundaries with zero \w characters between them (try it out: on my setup, it gives three extra matches with your sample, though I'm not entirely sure why).
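
A likely explanation for the extra matches: right after each word there is a word boundary, and at that position \w* can match zero characters, so both \b assertions are satisfied by the same boundary and an empty match is reported. The sketch below prints the position and length of every match so the empty ones become visible. ShowStarMatches is a hypothetical name, and the exact behaviour is assumed to follow the VBScript.RegExp engine used above.

```vba
' Sketch: list every match of \b\w*\b together with its position and length.
' Matches with Length = 0 are the "extra" empty matches.
Sub ShowStarMatches()
    Dim regEx As Object
    Dim matches As Object
    Dim m As Object

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = "\b\w*\b"
    regEx.Global = True

    Set matches = regEx.Execute("LU_ATTRIB:LUTYPE:A1_VALUES")

    For Each m In matches
        ' FirstIndex is the zero-based start position of the match.
        Debug.Print m.FirstIndex, m.Length, "[" & m.Value & "]"
    Next m
End Sub
```

With + instead of *, at least one word character is required, so the empty matches cannot occur.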

On greedy/lazy, here's another sample: say you're working with HTML and wish to return the characters between <b> and </b> tags.

You have the following text

This is just <b>testing</b> some <b>tagging</b>

The following pattern

<[bB]>.*</[bB]>

would return

"<b>testing</b> some <b>tagging</b>"

as it matches as much of the string as possible, i.e. from the first opening "<b>" to the last closing "</b>", returning all characters between them, and the pattern

<[bB]>.*?</[bB]>

would return two distinct matches

"<b>testing</b>"
"<b>tagging</b>"
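
The comparison above can be tried with a short sketch like the following. GreedyVersusLazy is a hypothetical name; it assumes a VBScript.RegExp version that supports the lazy quantifier (*?).

```vba
' Sketch: run the greedy and the lazy pattern against the same sample text.
Sub GreedyVersusLazy()
    Const SAMPLE As String = "This is just <b>testing</b> some <b>tagging</b>"
    Dim regEx As Object
    Dim matches As Object
    Dim m As Object

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = True

    ' Greedy: .* takes as many characters as possible.
    regEx.Pattern = "<[bB]>.*</[bB]>"
    Set matches = regEx.Execute(SAMPLE)
    For Each m In matches
        Debug.Print "greedy: " & m.Value
    Next m

    ' Lazy: .*? stops at the first closing tag it can reach.
    regEx.Pattern = "<[bB]>.*?</[bB]>"
    Set matches = regEx.Execute(SAMPLE)
    For Each m In matches
        Debug.Print "lazy:   " & m.Value
    Next m
End Sub
```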

Normalizing: you say you have a record with this information. If this is a table field, it should probably be divided into separate fields from the start, to avoid having to do stuff like this.
 
Hi,

Actually, I tried many patterns and got many different results, sometimes way too many. But your initial pattern seems to do it!

For the application I'm creating, I really need to put everything into one record, into one field, so I cannot normalize in this case.

Thanks for your explanation!
 
