Incorrect Binary-based ASCII sorting in tables (1 Viewer)

the_net_2.0

Banned
Local time
Today, 12:47
Joined
Sep 6, 2010
Messages
812
All,

I just noticed that Access tables do not sort special (characters) in the correct order, in binary terms. For instance, I have a table with one field as such:

Table1
AC
-C

sorting in Ascending order, 'AC' comes first then '-C'. However, 'AC' is '4143' in hex, whereas '-C' is '2D43' in hex. Obviously, when you look at that in hex, it looks backwards. Hence, the table is not sorting the values properly.

Is it sorting the values properly, but special characters are treated differently? Is it really sorting in binary? Or is the sorting just ASCII-based?

Not too keen on this subject, can someone give me a boost here for understanding this issue?

Reason - sorted data of mine goes into a system that reads binary files (made by data I produce in Access). However, if someone requests data from a specific location with a special character like "-" in it, the location is incorrect because of the sort order of the records that the binary-based mechanism is expecting. However, from an Access-based perspective, the sort order IS correct.

However, the customer uses the binary-based machine.

thanks guys!

<EDIT>
apparently, 1-byte characters (single characters) do not cause an issue. It happens with character strings that are more than 1 char long. for instance, this is the ASC sort order for one-char strings:

Table1
-
A

and that is correctly sorted, in hex. ??
 

vbaInet

AWF VIP
Local time
Today, 17:47
Joined
Jan 22, 2010
Messages
26,374
So give us a good example of how it's currenly being sorted and how you would like it to be.

By good example I mean a representative data set.
 

the_net_2.0

Banned
Local time
Today, 12:47
Joined
Sep 6, 2010
Messages
812
So give us a good example of how it's currenly being sorted and how you would like it to be.

By good example I mean a representative data set.

OK then. Take a look at the attached. 3 queries:

1) One_Character_Hex_Sort: The dash is first, when dataset is sorted in ascending order.

2) Two_Character_Hex_Sort_ex1: The string with the dash in char placement #1 is still first, when dataset is sorted in ascending order.

3) Two_Character_Hex_Sort_ex2: Oops! The dash takes a back seat and the sort becomes incorrect when the dash is paired with any character other than "A".

Is this right? What gives? Per my previous post, we had someone the other day at the company try to enter data into a binary-based machine and their data had a dash in it. Thus, the top-level coding looked at the wrong location to get the database data that they needed. I'm assuming it's because of this crap? Or am I mistaken here? This doesn't look consistent, nor does it look correct. Do I simply not know technology as well I think I know it?

thanks. :)
 

Attachments

  • Hex Sorting.zip
    38.1 KB · Views: 192

vbaInet

AWF VIP
Local time
Today, 17:47
Joined
Jan 22, 2010
Messages
26,374
As far as I know Access performs an ASCII sort, not a binary sort hence the result you get. If you get the ASCII equivalents of those strings and sort on the ASCII fields you will notice that it is the same as sorting on the text field.

So what would you like it to be?
 

the_net_2.0

Banned
Local time
Today, 12:47
Joined
Sep 6, 2010
Messages
812
If you get the ASCII equivalents of those strings and sort on the ASCII fields you will notice that it is the same as sorting on the text field.

what do you mean by this? explain it to me, if you would please.

if we take this ASCII chart: http://www.ddms.com/resources/help/reportsmenu/ascii_sort_order_chart.htm

and apply it to my 3 queries in that file, only ONE out of the 3 follow the damn rules! I don't necessarily want ANY result. More or less, I want an explanation as to why this crap is going on. We can always tell customers to forgo data entry that contains dashes, but the business value here is in the understanding of why MS applications do this crap. Everyone knows that MS is the worst technology company out there, but other than that obvious reason, what other explanation can I put behind this finding so my boss can justify his words to customers?
 
Last edited:

vbaInet

AWF VIP
Local time
Today, 17:47
Joined
Jan 22, 2010
Messages
26,374
Sorry Adam! I meant it sorts it in Ascending order as default, not ASCII. I was about to sleep when I wrote that message ;) :p

So, if we take these values for example (and enter them in this order):
A
-
a
-a

we get:
-
A
a
-a

However, if we enter it in this order:
a
-
A
-a

we get:
-
a
A
-a

Notice, what happens with a and A? It's just doing an alphabetic (non-case sensitive) sort.

Does that help?
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 17:47
Joined
Sep 12, 2006
Messages
15,613
CURIOUS - Yes - this is strange, isn't it

I just tried that, and got the same result - so there is something strange then isn't there.

a text (lexical) sort, should sort ascending as

-
-a
a
A

(assuming a dash has a lower ascii value than a letter - and that the sort order of a and A is immaterial)

The fact that a and A are not sorted in ascii order clearly indicates that access IS actually doing something unusual with ascii values.


----
out of interest, another thing I have noticed in the past is (I think) that objects with underscores in them (maybe dashes as well) in them, sort differently in the database window, compared with the order for the same objects in the wizard pick lists. (eg, when selecting a query on which you base a form - IIRC)
 

vbaInet

AWF VIP
Local time
Today, 17:47
Joined
Jan 22, 2010
Messages
26,374
Access isn't sorting using Ascii, it sorts in alphabetical order sometimes called a Natural sort. That is, Special Characters followed by Numbers followed by Alphanumeric chars - something like that. Obviously numbers aren't seen as numbers in this context but I just mentioned that for explanation purposes.

Perhaps a clearer example would be:
-
_
a
-a
a-

So natural order, not Ascii folks ;)
 

the_net_2.0

Banned
Local time
Today, 12:47
Joined
Sep 6, 2010
Messages
812
Notice, what happens with a and A? It's just doing an alphabetic (non-case sensitive) sort.

Does that help?

iNet,

I realize that, but it is NOT that simple. At least I don't think it is. First of all, yes you're right, the sort function in Access completely disregards case-sensitivity. That is not what I'm concerned about. I'm concerned about the fact that the sort ordering that results when you mix special characters (-, !, #) with text characters, as part of multi-byte strings (more than 1 char in length) has absolutely no consistency to it.

Does that make sense? That's a serious problem for me, as my Access data output is fed into a system that has a different interpretation of this stuff. And the interpretation is most likely verbatim to the ASCII value sort that my chart link shows. So obviously that's a problem!

Do you see what I'm getting at here? Let's talk in HEX...why in the hell would a string like "-C" be interpreted by Access as a higher value than "aa"!?!? (see my Two_Character_Hex_Sort_ex2 query in the db).

Now if I take my own research, what it looks like is that there really is NO consistency as to what effect the dash (-) has on sort order. Consider these values and their hex equivalents, each sorted in ASC order by Access:

Field1 (this is INCORRECT. The "-ad" is the smallest value, hence it should be first!)
aaa / 414141
AAA / 414141
-ad / 2D4144



Field1 (CORRECT order when we take out "d" from the previous set's )
-a / 2D41
AA / 4141
aa / 4141



Field1 (now add "4d" to the end of the suspicious value and the order is correct again!
-a4d / 2D413444
aaaa / 414141
AAAA / 414141



So the moral of the story here is that "d" in the second position only causes an error in the sort order. Notice that if I substitute the "4" in the second position and follow it with the "d", Access sorts correctly.

So you see what I need? I need an explanation of this crap to give to my boss (other than the fact that Access is very poor software). They are migrating to C# very soon, but until that time I need a very low-level explanation to as to why he should expedite the platform migration. Obviously we cannnot have this s$it going on. It's completely inexcusable for a software manufacturer with the longevity of Microsoft's to make mistakes like this. Unreal...

thanks! :)
 

the_net_2.0

Banned
Local time
Today, 12:47
Joined
Sep 6, 2010
Messages
812
Yes, you are right. If we look at things in HEX or ASCII terms -ad should be first but Access uses neither strategies.

I think the following link should give you a better understanding:

http://support.microsoft.com/kb/100366

thanks! :)

That article says a lot, but I still think Access has a flaw here. That article does not explain the "second position" bug I pointed out and tested. I can see how character weighting or weight distribution would play a part in that, but it is still incorrect.

But do you think this problem is related to the sorting routine that WINDOWS calls, or a different sorting routine that office applications use on their own? That I cannot answer because I wouldn't know. What's your take?
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 17:47
Joined
Sep 12, 2006
Messages
15,613
would part of this "discrepancy" involve the fact that a "field" is not just a string, since it can also be null, and therefore is a more complex object than a simple string.
 

the_net_2.0

Banned
Local time
Today, 12:47
Joined
Sep 6, 2010
Messages
812
would part of this "discrepancy" involve the fact that a "field" is not just a string, since it can also be null, and therefore is a more complex object than a simple string.
I have no idea, Gemma.

But I do know that a sort-based functionality should work in an ANY application, and it should follow the standard set forth in the country the manufacturer is in. Is that too much to ask? :rolleyes:

Like any software out there, finding the actual root cause of such a low-level issue like this one is virtually impossible. But finding the answer would be nice for one reason -- It would REALLY help forcast what BS Microsoft is going to release next. It would give some good insight into how mis-informed their developers are when it comes to detail. That way, you don't count on their software to do certain things that it should be expected to do anyway, without even having to think about it.

Agree? :)
 

vbaInet

AWF VIP
Local time
Today, 17:47
Joined
Jan 22, 2010
Messages
26,374
thanks! :)

That article says a lot, but I still think Access has a flaw here. That article does not explain the "second position" bug I pointed out and tested. I can see how character weighting or weight distribution would play a part in that, but it is still incorrect.

But do you think this problem is related to the sorting routine that WINDOWS calls, or a different sorting routine that office applications use on their own? That I cannot answer because I wouldn't know. What's your take?
But how can we condemn (or disprove) a strategy when we don't understand the internal workings of what takes precedence? An excerpt from one the Access help files reads, "One thing to remember when applying sort orders is that numbers, text, and special characters are sorted according to the selected language and regional settings of your computer."

Another one reads, "If a value in a field starts with a special character — such as a hyphen, parenthesis, or other symbol — you will observe the following behavior when sorting in ascending order:

  • Values that start with a space character will appear before alphanumeric values.
  • Values enclosed in quotation marks follow values that start with a space, but precede alphanumeric values.
  • Values that start with the minus sign (-) precede values that start with the plus (+) sign.
  • For all other symbols, the order is determined by looking at the ASCII character codes of the characters. For example, the code for the dollar symbol ($) is 36, and the code for the equal sign (=) is 61, so values starting with $ will appear before values that start with =."
I guess one would have delve deep into the sorting algorithm to understand what exactly is going on. You could ask Microsoft nicely for the algorithm :)

In Normal sort, aaa or AAA or aaA or any variation of those three a's is less than -ad. The reason is because of the d. When you put a 4 it changes the order because 4 comes before a. Remember this isn't an ASCII sort, it's a General sort.

If you want an ASCII sort create one.
 

the_net_2.0

Banned
Local time
Today, 12:47
Joined
Sep 6, 2010
Messages
812
Another one reads, "If a value in a field starts with a special character — such as a hyphen, parenthesis, or other symbol — you will observe the following behavior when sorting in ascending order:

  • Values that start with a space character will appear before alphanumeric values.
  • Values enclosed in quotation marks follow values that start with a space, but precede alphanumeric values.
  • Values that start with the minus sign (-) precede values that start with the plus (+) sign.
  • For all other symbols, the order is determined by looking at the ASCII character codes of the characters. For example, the code for the dollar symbol ($) is 36, and the code for the equal sign (=) is 61, so values starting with $ will appear before values that start with =."

Interesting that none of those bulletpoints address strings of more than one byte length, eh? :rolleyes: Furthermore, they're all BS if you look at the scenario we're talking about in this thread. ;)

If you want an ASCII sort create one.

I told the guy that was researching this stuff, that was the only option for him. Either that, or tell the customer to stay away from that kind of data.
 

Users who are viewing this thread

Top Bottom