Find and Replace Accented Characters

LanaR

If you have imported data that originated in a non-English-speaking country, you may find that words containing accented characters are not picked up in searches, or that filters fail. This function searches a string and replaces such characters with standard Latin characters.

Code:
Function FindRem(FldName As String) As Variant

'Declare the variables required for this process

    Dim k As Long                                   'Loop counter (Long, so strings longer than 32,767 characters do not overflow)
    Dim OrgStr As String, LetterCheck As String, NewStr As String   'Each variable needs its own "As String"; otherwise only the last one is a String
    
                
            OrgStr = FldName                        'Set string to search
            
            'Asc returns the code in the system ANSI code page; the Case values below assume Windows-1252
            For k = 1 To Len(OrgStr)                'Cycle through characters in string and check for accented characters
                LetterCheck = Mid(OrgStr, k, 1)
            
                    Select Case Asc(LetterCheck)
                    
                            Case 131        '"ƒ"
                                LetterCheck = "f"
                            Case 138        '"Š"
                                LetterCheck = "S"
                            Case 154        '"š"
                                LetterCheck = "s"
                            Case 140        '"Œ"
                                LetterCheck = "OE"
                            Case 156        '"œ"
                                LetterCheck = "oe"
                            Case 142        '"Ž"
                                LetterCheck = "Z"
                            Case 158        '"ž"
                                LetterCheck = "z"
                            Case 159        '"Ÿ"
                                LetterCheck = "Y"
                            Case 192 To 197 '"À, Á, Â, Ã, Ä, Å"
                                LetterCheck = "A"
                            Case 224 To 229 '"lowercase version of the above"
                                LetterCheck = "a"
                            Case 198        '"Æ"
                                LetterCheck = "AE"
                            Case 230        '"æ"
                                LetterCheck = "ae"
                            Case 199        '"Ç"
                                LetterCheck = "C"
                            Case 231        '"ç"
                                LetterCheck = "c"
                            Case 200 To 203 '"È, É, Ê, Ë"
                                LetterCheck = "E"
                            Case 232 To 235 '"lowercase version of the above"
                                LetterCheck = "e"
                            Case 204 To 207 '"Ì, Í, Î, Ï"
                                LetterCheck = "I"
                            Case 236 To 239 '"lowercase version of the above"
                                LetterCheck = "i"
                            Case 208        '"Ð"
                                LetterCheck = "ETH"
                            Case 240        '"ð"
                                LetterCheck = "eth"
                            Case 209        '"Ñ"
                                LetterCheck = "N"
                            Case 241        '"ñ"
                                LetterCheck = "n"
                            Case 210 To 214, 216 '"Ò, Ó, Ô, Õ, Ö, Ø"
                                LetterCheck = "O"
                            Case 242 To 246, 248 '"lowercase version of the above"
                                LetterCheck = "o"
                            Case 217 To 220 '"Ù, Ú, Û, Ü"
                                LetterCheck = "U"
                            Case 249 To 252 '"lowercase version of the above"
                                LetterCheck = "u"
                            Case 221        '"Ý"
                                LetterCheck = "Y"
                            Case 253        '"ý"
                                LetterCheck = "y"
                            Case 222        '"Þ"
                                LetterCheck = "TH"
                            Case 254        '"þ"
                                LetterCheck = "th"
                            Case 223        '"ß"
                                LetterCheck = "ss"
                            Case 255        '"ÿ"
                                LetterCheck = "y"
                            Case Else
                                'No replacement needed; keep the character as it is
                        End Select
                        
                        'Build the new string from the (possibly replaced) characters
                        NewStr = NewStr & LetterCheck
                        
                    
                    Next
                    
                    'set return value
                    FindRem = NewStr
                    


End Function
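
The Case values in the function are the codes that Asc returns, and Asc works in the system ANSI code page. Values such as 138 for Š therefore hold under Windows-1252 but are not guaranteed on other code pages. As a rough illustration (the procedure name below is made up for this example and is not part of the original post), this compares Asc with AscW, which returns the Unicode value instead:

```vba
Option Explicit

'Illustrative only: ShowAscVersusAscW is a hypothetical name, not code from the post.
Public Sub ShowAscVersusAscW()
    Dim ch As String
    ch = ChrW(352)              'Š, built from its Unicode value so the source stays plain ASCII

    'Asc first maps the character into the system ANSI code page.
    'Under Windows-1252 this prints 138; other code pages may give a different value.
    Debug.Print "Asc:  "; Asc(ch)

    'AscW returns the UTF-16 code unit, independent of the ANSI code page.
    Debug.Print "AscW: "; AscW(ch)   '352
End Sub
```

Run it from the Immediate window with `ShowAscVersusAscW`.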
 

MajP

There are many more international characters with codes above 255, so I changed my function and put all the possibilities into a table. Ideally I would identify which language set the user is concerned with, which could speed up the check. In this case I believe the user was working with Polish, and many of those characters are above 255. Examples:
tblCharacters

Character   ASCI_W   Replace
Ķ           310      K
ķ           311      k
ĸ           312      k
Ĺ           313      L
ĺ           314      I
Ļ           315      L
ļ           316      I
Ľ           317      L
ľ           318      I
Ŀ           319      L
ŀ           320      I
Ł           321      L
ł           322      I
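
A table-driven lookup along these lines could be sketched as follows. This is only an approximation under stated assumptions: the function name is invented, and a late-bound Scripting.Dictionary stands in for tblCharacters so the example runs without the database. Only a few of the rows above are loaded.

```vba
Option Explicit

'Sketch only: ReplaceFromMap and the in-memory dictionary are assumptions.
'The post keeps the mapping in tblCharacters; a handful of its rows are copied here.
Public Function ReplaceFromMap(ByVal Source As String) As String
    Dim map As Object
    Dim i As Long
    Dim code As Long
    Dim ch As String
    Dim result As String

    'Key = AscW code of the character, item = replacement text
    Set map = CreateObject("Scripting.Dictionary")
    map.Add 310&, "K"     'Ķ
    map.Add 311&, "k"     'ķ
    map.Add 312&, "k"     'ĸ
    map.Add 321&, "L"     'Ł

    For i = 1 To Len(Source)
        ch = Mid$(Source, i, 1)
        code = AscW(ch)                     'Unicode value, so codes above 255 can be matched
        If map.Exists(code) Then
            result = result & map(code)     'Mapped character: use its replacement
        Else
            result = result & ch            'Unmapped character: keep it unchanged
        End If
    Next i

    ReplaceFromMap = result
End Function
```

For example, `ReplaceFromMap(ChrW(321) & "odz")` turns the leading Ł into L. In the real database the dictionary would presumably be filled from the table rather than hard-coded.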


What I do not understand is that some of the characters look to be duplicated but are not really.
Character 208 also seems to appear as 272 and 393, but it seems to depend on how they are generated, because they are not identical. Looking at your list, I can see that my guesses on some of the replacements were wrong (if there is even a viable replacement). I just picked the character each one most closely resembled. Some likely have multi-letter replacements. For example, 208 should be ETH, not D. Now I can update my table.
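
For what it is worth, those three values are separate Unicode characters that happen to look alike: 208 is Ð (capital eth), 272 is Đ (capital D with stroke) and 393 is Ɖ (capital African D). A small sketch, with a made-up procedure name, that prints their code points and confirms a binary comparison treats them as different:

```vba
Option Explicit

'Illustrative only: ShowLookAlikeDs is a hypothetical name.
Public Sub ShowLookAlikeDs()
    Dim codes As Variant
    Dim i As Long

    'Ð (eth), Đ (D with stroke), Ɖ (African D): similar glyphs, different code points
    codes = Array(208, 272, 393)

    For i = LBound(codes) To UBound(codes)
        'Print the decimal code and the four-digit hexadecimal code point
        Debug.Print codes(i), "U+" & Right$("0000" & Hex$(codes(i)), 4)
    Next i

    'StrComp returns a non-zero value because the characters differ
    Debug.Print StrComp(ChrW(208), ChrW(272), vbBinaryCompare) <> 0   'True
End Sub
```

Each of the three would need its own row in the table if all of them should be replaced.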

Also, I do not know how the Replace function actually works internally, but I am assuming that doing a Replace on the whole string would be far more efficient than looping through each character. In that thread, my original version (hard-coded choices without a table) used Replace with cases similar to yours.
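
A whole-string Replace approach might look roughly like this. It is a sketch, not the code from the other thread: the function name and the short mapping list are assumptions chosen for illustration.

```vba
Option Explicit

'Sketch only: ReplaceWholeString and its mapping list are assumptions.
Public Function ReplaceWholeString(ByVal Source As String) As String
    Dim pairs As Variant
    Dim i As Long
    Dim result As String

    'Each entry holds the AscW code of the character to find and its replacement
    pairs = Array( _
        Array(198, "AE"), _
        Array(230, "ae"), _
        Array(223, "ss"), _
        Array(321, "L"))

    result = Source
    For i = LBound(pairs) To UBound(pairs)
        'vbBinaryCompare keeps the match case-sensitive regardless of Option Compare
        result = Replace(result, ChrW(pairs(i)(0)), pairs(i)(1), 1, -1, vbBinaryCompare)
    Next i

    ReplaceWholeString = result
End Function
```

Note that this makes one pass over the string per mapped character, so with a large table it may not be faster than a single character loop with a lookup. Which is quicker would need to be timed on real data.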

To demonstrate that there may be a lot more potential cases (I am sure some of these are unlikely ever to be used), I ran a test string against your code and mine. Yours is output 1; mine is output 2.
Cases.jpg


To reiterate, I am sure some of my replacements are not the correct choices, if they can even be logically replaced, but the table could be updated to suit the user's needs.
Also, my table is nowhere near complete: the wide character set (AscW and ChrW) ranges from -32768 to 65535.
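
The negative end of that range comes from AscW returning a signed Integer, so codes above 32767 come back as negative numbers. If the table stores positive codes, a small helper like the following (the name is hypothetical) could normalise the value before the lookup:

```vba
Option Explicit

'Illustrative helper: UnsignedAscW is a hypothetical name.
Public Function UnsignedAscW(ByVal ch As String) As Long
    'AscW returns an Integer, so code units above 32767 are reported as negative.
    'Masking with &HFFFF& yields the equivalent value in the range 0 to 65535.
    UnsignedAscW = AscW(ch) And &HFFFF&
End Function
```

For instance, `UnsignedAscW(ChrW(-255))` gives 65281, while characters below 32768, such as `ChrW(322)`, are returned unchanged.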
 

Attachments

  • CleanCharacters.accdb
    752 KB · Views: 411