How to get rid of non - alpha-numeric characters? (1 Viewer)

phinix

Registered User.
Local time
Today, 20:21
Joined
Jun 17, 2010
Messages
130
Guys, I was wondering if you could help me.
I need to get rid of all non alpha numeric characters in fields.
Do you know any quick way to do it?
Please have in mind I'm not an expert in vb, so was looking for simpler way to do it, unless you could tell me what exactly write and where to put it in:)

I came up with this, but getting syntax errors:(

UPDATE table1 SET string_name=REPLACE(string_name,SUBSTRING(string_name,(PATINDEX('%[^a-zA-Z0-9]%',string_name)),1),' ') WHERE PATINDEX('%[^a-zA-Z0-9]%', string_name)<>0;

Thanks!
 

PaulO

Registered User.
Local time
Today, 20:21
Joined
Oct 9, 2008
Messages
421
Do the non-alphanumeric characters appear in fixed positions within the text?

e.g. always the 5th and 7th characters
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
PATINDEX and SUBSTRING don't exists in Access SQL afaik.

So I think you might need a function. Here's a function that will take a string input and output a Long (number):

Code:
Public Function numOnly(inputString As String) As Long
Dim i As Integer
Dim numString As String

For i = 1 To Len(inputString)
    If Mid(inputString, i, 1) Like "[0-9]" Then
        numString = numString & Mid(inputString, i, 1)
    End If
Next i

numOnly = Val(numString)

End Function


But your use of PATINDEX gave me an idea. You could use the Vbscript regexp object to do some of the work.So here's a different function that will do the same job:

Code:
Public Function numOnly2(inputString As String) As Long
Dim RE As Object

Set RE = CreateObject("vbscript.regexp")

RE.Pattern = "[^0-9]"
RE.Global = True
numOnly2 = Val(RE.Replace(inputString, ""))

Set RE = Nothing

End Function
 

phinix

Registered User.
Local time
Today, 20:21
Joined
Jun 17, 2010
Messages
130
PATINDEX and SUBSTRING don't exists in Access SQL afaik.

So I think you might need a function. Here's a function that will take a string input and output a Long (number):

Code:
Public Function numOnly(inputString As String) As Long
Dim i As Integer
Dim numString As String

For i = 1 To Len(inputString)
    If Mid(inputString, i, 1) Like "[0-9]" Then
        numString = numString & Mid(inputString, i, 1)
    End If
Next i

numOnly = Val(numString)

End Function
But your use of PATINDEX gave me an idea. You could use the Vbscript regexp object to do some of the work.So here's a different function that will do the same job:

Code:
Public Function numOnly2(inputString As String) As Long
Dim RE As Object

Set RE = CreateObject("vbscript.regexp")

RE.Pattern = "[^0-9]"
RE.Global = True
numOnly2 = Val(RE.Replace(inputString, ""))

Set RE = Nothing

End Function

Thanks, let me try it out.

Now, can you please tell me where should I copy this VB function and how to run it? I know nothing about it:(
 

phinix

Registered User.
Local time
Today, 20:21
Joined
Jun 17, 2010
Messages
130
Take a look at this tutorial.

Chris

Thanks.
OK, so I tried your function, but it return digits from the string only.
What need is to find any non alphanumeric character in the field and delete it, returning rest of the characters, so for example in filed Name I need to get rid of all spaces, brackets, etc: for "BCD's Exchange 44 (UK)" function will return "BCDsExchangeUK".

How can I do this?
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
Sorry, I thought that was the objective :eek:

Which function are you using out of interest? In any case, here's two functions to return only characters (either should work):

Code:
Public Function charOnly(inputString As String) As String
Dim i As Integer
Dim numString As String

For i = 1 To Len(inputString)
    If Mid(inputString, i, 1) Like "[A-Z,a-z]" Then
        numString = numString & Mid(inputString, i, 1)
    End If
Next i

charOnly = numString

End Function

Code:
Public Function charOnly2(inputString As String) As String
Dim RE As Object

Set RE = CreateObject("vbscript.regexp")

RE.Pattern = "[^A-Z,^a-z]"
RE.Global = True
charOnly2 = RE.Replace(inputString, "")

Set RE = Nothing

End Function

But you said non-alphanumeric. Your example seems to have got rid of the numeric which is what my code does. But you can easily change the pattern to include 0-9

Chris
 

phinix

Registered User.
Local time
Today, 20:21
Joined
Jun 17, 2010
Messages
130
Sorry, I thought that was the objective :eek:

...
But you said non-alphanumeric. Your example seems to have got rid of the numeric which is what my code does. But you can easily change the pattern to include 0-9

Chris

Sorry for confusion, I was after getting rid of all non alpha numeric characters.
First one works perfectly! Thank you so much for help!

If I can only ask you for one more thing: I wanted to have two diffrent options, on that leaves all alphanumeric characters only and second one that also leaves spaces between strings. So I addedd space there, but now, where there was a single non-alphanumeric string I get double spaces. How can I fix it?

For example for field "21 Hair & Beauty (Uk)" I get "21 Hair Beauty Uk"
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
The following seems to work:

Code:
Public Function charOnly(inputString As String) As String
Dim i As Integer
Dim numString As String

For i = 1 To Len(inputString)
    If Mid(inputString, i, 1) Like "[A-Z,0-9,' ']" Then
        numString = numString & Mid(inputString, i, 1)
    End If
Next i

charOnly = Replace(numString, "  ", " ")

End Function

In other words, the last task is to replace any double spaces with single spaces.

I suppose another way would be to create a more complex pattern that also picked up space/special char/space. But it seems tedious that way.

By the way, I just read the help on Like and it seems you don't need A-Z and a-z because Access treats them as the same. Hence the above should do.

Chris
 

phinix

Registered User.
Local time
Today, 20:21
Joined
Jun 17, 2010
Messages
130
The following seems to work:

Code:
Public Function charOnly(inputString As String) As String
Dim i As Integer
Dim numString As String

For i = 1 To Len(inputString)
    If Mid(inputString, i, 1) Like "[A-Z,0-9,' ']" Then
        numString = numString & Mid(inputString, i, 1)
    End If
Next i

charOnly = Replace(numString, "  ", " ")

End Function
In other words, the last task is to replace any double spaces with single spaces.

I suppose another way would be to create a more complex pattern that also picked up space/special char/space. But it seems tedious that way.

By the way, I just read the help on Like and it seems you don't need A-Z and a-z because Access treats them as the same. Hence the above should do.

Chris

Yep, you're right, those are treated the same.
This works great!

Now, I have a last question, I promise!
How can I get rid of all strings in brackets before I run my function above?
I noticed that sometimes I have some not important data in brackets, like that example above has UK. How can I delete those whole strings from the field?

For that "21 Hair & Beauty (Uk)" function firstly would cut out (UK) then clean up the field and return "21 Hair Beauty" .

Can I just add this line on top of the function?

charOnly = Replace(numString, "(*) ", "")
 

vbaInet

AWF VIP
Local time
Today, 20:21
Joined
Jan 22, 2010
Messages
26,374
If you go back to Chris' regex, the following should do:

1. Get rid of non-alphanumeric characters:
Code:
    .... code here
    RE.Pattern = "\W"
    ... code here
    FunctionName =  RE.Replace(inputString, "")
    ... code here
2. Get rid of brackets and its content plus leading or trailing spaces:
Code:
    ... code here
    RE.Pattern = "(\(.*?\))|(\s\(.*?\))|(\(.*?\)\s)|(\s\(.*?\)\s)"
    ... code here
    FunctionName =  RE.Replace(inputString, "")
    ... code here
Where FunctionName is the name of your function. Where it says "... code here", insert the lines Chris gave.
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
If you go back to Chris' regex, the following should do:

1. Get rid of non-alphanumeric characters:
Code:
    .... code here
    RE.Pattern = "\W"
    ... code here
    FunctionName =  RE.Replace(inputString, "")
    ... code here
2. Get rid of brackets and its content plus leading or trailing spaces:
Code:
    ... code here
    RE.Pattern = "(\(.*?\))|(\s\(.*?\))|(\(.*?\)\s)|(\s\(.*?\)\s)"
    ... code here
    FunctionName =  RE.Replace(inputString, "")
    ... code here
Where FunctionName is the name of your function. Where it says "... code here", insert the lines Chris gave.
Nice stuff.

A couple of comments...

\W also excludes underscore which might be undersirable

Code:
RE.Pattern = "(\(.*?\))|(\s\(.*?\))|(\(.*?\)\s)|(\s\(.*?\)\s)"

Not sure if ? in *? has any effect. Anyway, what about:

Code:
RE.Pattern = "\s?\(.*\)\s?"

Chris
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
We have a slight problem if there are two pairs of brackets. Our previous expression will only find the first outer pair of brackets. For example:

Some text (kljlkj) and some more text (xxx) the end.

Becomes:

Some textthe end.

However, the following deals with the bracket pairs independently (I think):

Code:
RE.Pattern = "\s?\([^(]*\)\s?"

Chris
 

vbaInet

AWF VIP
Local time
Today, 20:21
Joined
Jan 22, 2010
Messages
26,374
Ah, I didn't know that about \W Chris. That must be a bug because other scripting languages match the underscore. In that case the OP can use \W|_

The ? after .* is called stingy matching. I don't know if the term exists in VB Script but the principle still applies.

Test this string without the question mark:

"The big bro(wn) fox (jumped) o(344)ver the lazy dog"
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
The ? after .* is called stingy matching. I don't know if the term exists in VB Script but the principle still applies.
Ah got it.

Try this:

"The (b)ig bro(wn) fox (jumped) o(344)ver the lazy dog"

Gives:

Theig bro fox over the lazy dog

The problem is the patterns are tested in order so when we change the order we face the problem with spaces. Any ideas?

Chris
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
"The (b)ig bro(wn) fox (jumped) o(344)ver the lazy dog"

Gives:

Theig bro fox over the lazy dog

This seems to work:

Code:
RE.Pattern = "\b\([^(]*?\)|\([^(]*?\)\b|\([^(]*?\)\s"

In case anyone is interested, the RegExp syntax for VBscript is here.
 

phinix

Registered User.
Local time
Today, 20:21
Joined
Jun 17, 2010
Messages
130
This seems to work:

Code:
RE.Pattern = "\b\([^(]*?\)|\([^(]*?\)\b|\([^(]*?\)\s"
In case anyone is interested, the RegExp syntax for VBscript is here.

Ha! You see, before you out this link to syntax list, I was like :eek:
Didn't know what those characters mean:)
Now I totally understand this RegExp.

Thank you guys for this!
 

stopher

AWF VIP
Local time
Today, 20:21
Joined
Feb 1, 2006
Messages
2,396
Sorry for lack of explanation. Regular expressions are used in many coding environments see here. My limited experience comes from XML Schemas that are used to validate XML documents.

VBA does not do regular expressions so we had to borrow the RegExp class from the VBscript library (part of Windows installation).

You can do all your validation in one expression. That’s the idea really.

The classic validation that always sticks in my mind is the UK post code validation.

Chris
 

vbaInet

AWF VIP
Local time
Today, 20:21
Joined
Jan 22, 2010
Messages
26,374
Regex is fun isn't it ;)

Chris, great work there. I was testing your piece of regex and it worked until I used,
" (123) The (b)big brown(wn) fox jumped (jumped) o(344)ver the lazy dog (123)"

I've not had much time to give this much thought but here's what Iquickly mocked up:
Code:
    ' match start, end and brackets within boundaries
    RE.Pattern = "(^\s?\([^(]*?\)\s?)|(\s?\([^)]*?\)\s?$)|(\([^)]*?\)\b)|(\b\([^)]*?\))"
    inputString = RE.Replace(inputString, "")
        
    ' match others
    RE.Pattern = "\s\([^(]*?\)"
    FUNCTION_NAME = RE.Replace(inputString, "")
I'm sure you can refine the regex to make it shorter.
 

LPurvis

AWF VIP
Local time
Today, 20:21
Joined
Jun 16, 2008
Messages
1,269
Hi.

Late to the party here, so not sure where we're at. Are we removing any text that is non alphanumeric and also anything delimited by brackets?

RE.Pattern = "\([^)]+\)|[^\w\s]"

I might have missed something - or it's already been sorted...

Cheers.
 

Users who are viewing this thread

Top Bottom