Special Characters, specifically Question Marks, in data, string, etc. (1 Viewer)

mjdemaris

Working on it...
Local time
Today, 15:04
Joined
Jul 9, 2015
Messages
426
I am attempting to clean up the data, prepping for an import and removing special characters and accidental user keystrokes...
I found this one record, where the user pasted text from a website three times. I got an error about the field being too small, as I was attempting to convert all double quotes to 'in.' Meaning inches.

I discovered how to remove the carriage returns and line feeds, but that left me with something interesting. In the Access text box, the characters look like double question marks each in a little box. When I iterate through the string and print out the ASCII number, I get the number for a question mark. However, using Instr function, it does not recognize them as question marks. Here is what it looks like copied and pasted from the text box: 

Like wing dings or something. below is a snippet.

So, I have a two part dilemma: How do I find and remove strange looking characters?

And: What is a good way to determine and remove a repeated pasting of text into a field/textbox?
I'm experimenting with taking the first 10 or 15 characters and looking for another instance, then removing everything from that point on. Haven't finalized the workflow for that yet.

Thanks for the help!

1636761914695.png
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 18:04
Joined
May 21, 2018
Messages
8,527
Try this one. This keeps only things in the printable range. If anything it may be too restrictive, so see if some things are lost.
Code:
Public Function ReplaceUnicodeCharacters(txt As String) As String
  Dim i As Integer
  Dim out As String
  For i = 1 To Len(txt)
    If AscW(Mid(txt, i, 1)) < 256 And AscW(Mid(txt, i, 1)) > 31 Then
      out = out & Mid(txt, i, 1)
     End If
  Next i
  ReplaceUnicodeCharacters = out
End Function
 

mjdemaris

Working on it...
Local time
Today, 15:04
Joined
Jul 9, 2015
Messages
426
Thanks Maj. Took out those funky ones.

Edit: How about those pesky copy right symbols?
 
Last edited:

mjdemaris

Working on it...
Local time
Today, 15:04
Joined
Jul 9, 2015
Messages
426
So the ASCII number for the copyright returns 174. A Google search says it's 169. According to this site: ASCII it is neither.
And...going through some Unicode charts is VERY hard on the eyes! Over 150,000 characters!

Edit: Perhaps there may be some legality involved with removing the copyright symbol (after doing some reading). So, I may just leave it in the tables... for my use I don't see any legal wrong with it. It's all internal day-to-day stuff, nothing from the DB goes into the outside world.
 
Last edited:

Steve R.

Retired
Local time
Today, 18:04
Joined
Jul 5, 2006
Messages
4,684
Try the ASCII table below. In the example below, the copyright symbol: 184. Seems that there are a lot of different values for the copyright symbol. Depends on where you look.
 
Last edited:

MajP

You've got your good things, and you've got mine.
Local time
Today, 18:04
Joined
May 21, 2018
Messages
8,527
To figure out what the value is just run the string through a function and see the number
Code:
Public Function ShowCharacters(txt As String) As String
  On Error Resume Next
  Dim i As Integer
  Dim out As String
  Dim ltr As String
  For i = 1 To Len(txt)
    ltr = Mid(txt, i, 1)
    out = out & vbCrLf & ltr & ": " & Asc(ltr)
  Next i
  ShowCharacters = out
End Function

Output for a random string
Code:
1: 49
2: 50
3: 51
4: 52
5: 53
6: 54
7: 55
8: 56
9: 57
0: 48
-: 45
=: 61
a: 97
s: 115
d: 100
f: 102
g: 103
j: 106
k: 107
l: 108
;: 59
 : 32
z: 122
x: 120
c: 99
v: 118
b: 98
n: 110
m: 109
,: 44
.: 46
/: 47
/: 47
 : 32

so in the original function if there is a character to add to the exclusion then just modify the if then. Example
Code:
Public Function ReplaceCharacters(txt As String) As String
  Dim i As Integer
  Dim out As String
  dim ltr as string
  For i = 1 To Len(txt)
    ltr = Mid(txt, i, 1)
    If AscW(ltr) > 31 And AscW(ltr) < 256 and ltr <> 251 Then
      out = out & ltr
     End If
  Next i
  ReplaceCharacters = out
End Function
 

mjdemaris

Working on it...
Local time
Today, 15:04
Joined
Jul 9, 2015
Messages
426
Thanks guys. Now to implement it for chosen table and field. One more question: What is the functional difference between Asc() and AscW()?
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 18:04
Joined
May 21, 2018
Messages
8,527
AscW handles a much greater range of characters. It handles unicode i think to like 65k.
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 18:04
Joined
May 21, 2018
Messages
8,527
Here is a good discussion. 256 vs 65k.

Unicode is a standard that is designed to replace the ANSI standard for encoding
characters in a numeric form. Because the ANSI standard only uses a single byte
to represent each character, it is limited to a maximum of 256 different
characters. While this is sufficient for the needs of an English speaking
audience, it falls short when the worldwide software market is considered. With
the Unicode standard, each character is represented by two bytes, so that the
entire Unicode character set includes 65,536 possible locations
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 17:04
Joined
Feb 28, 2001
Messages
27,175
Since you say you are preparing for an import, I have to ask if you know the file format of that imported data. Is there a chance it is something like UTF-8? If it is not a UTF-x variant, then I will step away as my potential comments would ONLY apply to UTF-x formats.
 

mjdemaris

Working on it...
Local time
Today, 15:04
Joined
Jul 9, 2015
Messages
426
To this point, I was only interested in removing special characters from a copy and paste from the Web situation, when users enter data. (Problems arise when users search for some Item and cannot find it because of some special character). We are not concerned, much, I think, with foreign characters...unless the user happens to copy from a foreign website.

At some point in the future, I may be importing/exporting to Excel, but I have yet to determine if this will solve any problems or add any worthwhile benefits. The main point of this DB sys is to manage parts inventory, and allow users to submit a request for purchasing tools, supplies, etc. (termed Items for the purpose of the DB.)
 

strive4peace

AWF VIP
Local time
Today, 17:04
Joined
Apr 3, 2020
Messages
1,004
off the topic of the question ... but related to the discussion ...

I used ChrW to make a list for myself. The weird thing though, is that it starts with Decimal code -32768 and, for me, decimal code of last character is 65535 (the last actual that's not a box, which means it can't show, is 65533) . Not all of the codes between show something -- could be they're in blocks that aren't assigned yet, or I just can't see them.

Here's a fun little Access application I wrote to lookup and show Unicode characters.

free Unicode CharMap tool
https://msaccessgurus.com/tool/UnicodeCharMap.htm

UTF-8 uses however many 8-bit code units it needs ... Access uses UTF-16, which is one or two 16-bit code units. That means common characters take two 8-bit bytes (for total of 16) to store with more complex characters taking four 8-bit bytes -- at least that's my understanding. I could be wrong though!
 

Users who are viewing this thread

Top Bottom