Unicode string utility functions

xavier.batlle

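VBA string functions such as Len(), Right(), and InStr() don't work as expected with Unicode strings that contain code points above 65535. VBA stores strings as UTF-16, where each of those code points takes two 16-bit units (a surrogate pair), so I developed some string functions to handle this kind of string.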

I know that Unicode characters above 65535 are rarely or never used, but I think it’s important to know how to deal with them, just in case you need it.
Note:
Some operating systems and some fonts don't display some characters as expected!
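
To make the problem concrete, here is a minimal sketch of why a count of 16-bit units disagrees with a count of characters. It is written in Python purely for illustration and is not part of the attached VBA functions; the helper names `utf16_units` and `surrogate_pair` are hypothetical. VBA's Len() corresponds to the UTF-16 unit count shown here.

```python
def utf16_units(s: str) -> int:
    """Count UTF-16 code units (2 bytes each), which is what VBA's Len() reports."""
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above 0xFFFF into its high and low UTF-16 surrogates."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


sample = "a\U0001F600b"  # 'a', U+1F600 (a code point above 65535), 'b'

print(len(sample))           # code points: 3
print(utf16_units(sample))   # UTF-16 units: 4
print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

A function that cuts a string at an arbitrary 16-bit position can therefore split a surrogate pair in half, which is the kind of failure a surrogate-aware replacement for Right() or Mid() has to avoid.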

[Image: 1724597811416.png]
I've never used Unicode. What are composite characters?
Perhaps I should have used the term "decomposed form of a character" instead of "composite character".
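
As a rough illustration of what "decomposed form" means, the sketch below uses Python's standard `unicodedata` module (not VBA, and not the functions from this thread) to show the same visible character stored either as one code point or as a base letter followed by a combining mark.

```python
import unicodedata

composed = "\u00e9"  # 'é' as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining acute accent

print([hex(ord(c)) for c in composed])     # ['0xe9']
print([hex(ord(c)) for c in decomposed])   # ['0x65', '0x301']

# The two strings look identical on screen but do not compare equal
# until both are brought to the same normalization form.
print(composed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
```

This is a separate issue from code points above 65535: both forms here fit in 16-bit units, but the length and comparison results still differ from what the user sees.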

Some references (I can't post links):
[Images: 1724614134788.png, 1724614362075.png]
I will add that if you run into files that are UTF-n encoded (n=2, 8, 16, and maybe others I haven't run across), you will sometimes run into the type of characters being described. You CAN ask Notepad to save UTF-n files as ANSI text but it doesn't always do very well at it.

I first ran into UTF-8 with my genealogy database when Ancestry.COM changed their genealogy downloads (GEDCOM format) from ANSI to UTF-8 encoding. Didn't have Xavier's routines at the time - a few years ago - so I had to roll my own way of handling it. Solved it by turning ALL extended characters into something I would treat as a stand-alone non-printing control character and then my semantics parser could handle it.
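
The workaround described above could look roughly like the following. This is only a guess at the idea, written in Python rather than VBA: the post does not say which control character was used or exactly which characters counted as "extended", so the SUB character (0x1A), the "anything outside ASCII" rule, and the name `mask_extended` are all assumptions.

```python
import re

PLACEHOLDER = "\x1a"  # assumed stand-in; the original post does not name the character


def mask_extended(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace every non-ASCII character with a single placeholder character."""
    return re.sub(r"[^\x00-\x7f]", placeholder, text)


name = "M\u00fcller"  # 'Müller' with a precomposed u-umlaut
print(repr(mask_extended(name)))  # 'M\x1aller'
```

The trade-off is that the original characters are lost, which is acceptable for a parser that only needs the structure of the file but not for anything that has to display the names again.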
 
You CAN ask Notepad to save UTF-n files as ANSI text but it doesn't always do very well at it.
For context: A Unicode-encoded file may contain non-ASCII characters from multiple different ANSI codepages. It is then simply impossible to save this file using just one ANSI codepage without losing some of those characters.
Even if all non-ASCII characters of a file can be correctly represented in a single ANSI codepage, it is beyond the capabilities of simple algorithms to determine the correct one with absolute certainty.
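
Both points can be checked with a small sketch. It uses Python's built-in codecs for Windows-1252 and Windows-1251 as example ANSI codepages; the helper name `fits_codepage` is hypothetical.

```python
def fits_codepage(text: str, codepage: str) -> bool:
    """Return True if every character of text can be encoded in the given codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# 'é' exists in Windows-1252 (Western), 'Ж' exists in Windows-1251 (Cyrillic).
mixed = "caf\u00e9 \u0416"
print(fits_codepage(mixed, "cp1252"))  # False: no Cyrillic in cp1252
print(fits_codepage(mixed, "cp1251"))  # False: no 'é' in cp1251
print(fits_codepage(mixed, "utf-8"))   # True

# The reverse problem: the same byte is valid text in both codepages,
# so the bytes alone do not reveal which codepage was intended.
raw = b"\xe9"
print(raw.decode("cp1252"))  # é
print(raw.decode("cp1251"))  # й
```

The second half is why detecting the right codepage is guesswork: a program can only rank candidates by how plausible the decoded text looks, not prove which one the author used.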
 
