Doubts re 'Data Type' for certain fields

Pinesh · Jan 3, 2018

Hi

I am from India and engaged in teaching ICT to students who appear for IGCSE exam - from India.

MS Access (Database Management) is a subject in ICT.

I have a very simple - base level question / doubt. Refer to the attachment -
IGCSE question paper - Question 9B. The question is what is appropriate data type for field 'Examination_ID' and field 'Examination_Level'.

Some data of the fields is given in the paper.
The obvious answer (I assume) would be that the data type would be
'Examination_ID' :- 'Number'
'Examination_Level' :- 'Text'.

But surprisingly the marking scheme / answers as published by IGCSE board is:-
'Examination_ID' :- 'Text'
'Examination_Level' :- 'Boolean'.

The reason to post such a basic / unintelligent question here on this forums is.... IT professors of 20 Schools in our area are unable to explain to students as to why should the data type of
(1) 'Examination_ID' where data is in form of 6 digits should be 'Text' and not 'Number'
(2) 'Examination_Level' where data is in form of alphabets be 'Boolean' and not 'Text'.

Can some members throw some light on this matter please?

Thanking you in advance,
Regards
Pinesh

Minty · Jan 3, 2018

The sample data provided is "interesting" in that it is ambiguous and open to interpretation.
The examination ID appears to be a number but with a leading zero could be interpreted as text, but could also simply be a number formatted to 6 digits.

The Examination level is definitely text as presented in the sample data, but as there are only two values you could if you were presented with the certain knowledge that there are only those two types of exam level use a Boolean data type, but would be very unwise to do so given the possibility of other exam levels being created in the future. So I would say that is incorrect given the sample data and lack of other supporting data.

isladogs · Jan 3, 2018

Both fields must be Text

As Minty says, the Examination ID needs to be Text due to the leading zero.
If you used Number, 041731 would be stored as 41731

Unless it says on the previous page that there are only 2 possible outcomes for Examination Level, the boolean datatype is incorrect.
However, even if there were only 2 outcomes, if it were boolean datatype, the value would be shown as True or False.
So the accepted answer is still incorrect
Its definitely Text

The examiners clearly don't understand databases very well!

Galaxiom · Jan 3, 2018

It is actually an advanced question because the answer is not simple. And like everything there are two sides to the story.

Many years ago a Microsoft blogger said that any value consisting of numeric characters but not operated on arithmetically should be treated as text. It quickly became the standard doctrine but I have never seen the fundamental arguments behind it. I tried discussing it at another site and got banned for suggesting it might be wrong thinking because "that is the way it is".

It made sense with some values. For example phone numbers were formatted differently in different regions, with various digit groupings and parentheses. They often needed a leading zero or a + sign or a dash and using text would permit this.

For some that led to the concept that a Customer Number (for example) was also text since it was never involved in arithmetic. Sometimes users require leading zeros on such designators so don't perfectly suit storage as a number anyway. We do have formatting for numbers but it could be argued that nnnn and 0nnnn are two different customer keys.

There is the grey area of the customer number potentially being incremented from the previous maximum during creation but let's not go there.

Truth is that the choice of type isn't black and white and there are pros and con in every choice. There are many grey areas in all kinds of number-like data and the wise developer considers the particular advantages every time they are confronted by a choice.

Storing numerals as text allows for versatility at the storage cost of one byte per character (two bytes if using unicode) allowing for up to 255 characters. Meanwhile a Long Integer can store up to nine digits in four bytes so can be more efficient to store and process at the expense of a limited range.

Text also allows many more value variations but this is a double edged sword. In the case of telephone "numbers", allowing an operator to enter formatting means you will have to process the string to get the digits if you want to do something automated with it.

The inability to input formatting can also be a strength of storing numeric values as numbers by actually preventing any attempt to enter formatting right at the table level. Because a string allows so much variation, developers often impose an Input Format to keep it consistent so there are development costs of using text to consider too.

In the case of a machine generated numeric ID these considerations are not important. The more compact storage would be the sensible decision wouldn't it? Well not quite, especially when gigabytes and gigahertz now come so cheap.

IDs are often keys and as such are always indexed. The indexing is a considerable overhead and performance is more important than the storage cost. So which data type creates and uses its indexes most efficiently?

Years ago I assumed the numeric would perform better but the truth is we don't really know. When I posted my assumption I was called on it by our own late great ChrisO who suggested that indexes were binary so it might make little difference what they were built from. When I went looking I found some test results that suggested that text keys were outperforming numeric on joins. The bad news was it wasn't an Access system they tested so we still don't really know.

Personally I think the doctrine of using text is wrong. Like every decision a developer makes, the pros and cons of the particular circumstance should be accounted every time.

There is one last consideration from your point of view. When a specification is sent to you, that is the type you choose.

However there are good reasons for a data agglomerator to specify text IDs even though they are expected to be numeric. As long as the value is a key within each contributors data it doesn't really matter what it is, because they will add a contributor key to their records.

Accepting text means any value will be imported to the master database. If a data contributor had text values and the central system expected a number, they would have to deal with it. Simpler just to accept text keys even if they are all numerals.

The boolean choice is the other end of the spectrum. Boolean only has two values. True or False. Either the student passed of failed. It they allowed text they would inevitably be confronted with unexpected letters that were allowed in through mishap. Nothing worse than encountering an unexpected character instead of "P" or "F" for Pass and Fail.

Certainly feel free to add your perspectives too. I am sure we will hear more from others on this thread.

sonic8 · Jan 3, 2018

Pinesh said:
Can some members throw some light on this matter please?

There is only one possible explanation. The person(s) who created the test has/have not much of a clue about databases in general and Access in particular.

9(b): Examination_Level cannot be Boolean unless you use custom formatting.
9(c): Both of the supposedly correct answers will result in a syntax error. - You need delimiters to query date values.
9(d): Date_of_Exam, second supposedly correct answer is wrong. Checking the format of a date for validation against region depending conventions is a bad practice. Furthermore, this check will not have any benefit for Date date type.

Pat Hartman · Jan 3, 2018

Examination_ID - Given the values show a leading zero, that tells us that the value should be defined as text. Many people make the mistake of defining fields such as Zip code which in the US is 5 numeric digits as numbers, However, the New England states have zip codes that start with a leading zero. Numeric fields do not retain leading zeros because that is how numbers work. "numeric" strings that start with zeros are codes and codes are text. Therefore, this field must be defined as text to retain the leading zeros. The name is intended to mislead us. If you use good naming standards, you would reserve the "_ID" suffix for autonumbers or identity columns or at a minimum something to be defined as numeric. This field would more properly be named Examination_CD.

Examination_Level - the test writer's definition of this as Boolean is just plain stupid. Just because there are only two visible values doesn't mean there would always be only two values and that would be a false assumption on the part of the test taker. Plus, defining the field as Boolean would require coding to translate it which is also poor practice. All in all, I would ask that this question be replaced or the answer modified. Boolean's are used for yes/no or true/false situations such as an employee is active or inactive. Whenever there is a possibility of a third choice or an unknown value, you have to go with an integer or a text string, both of which would be defined in a table for efficiency and ease of maintenance. To do anything else would be irresponsible and short-sighted.

Pinesh · Jan 6, 2018

Agree to your answer and views.
Thank you so very much

Pinesh · Jan 6, 2018

Dear Minty
Dear Ridders
Dear Galaxiom
Dear Sonic 8
Dear Pat Hartman

Thank you very much for your time and efforts to throw some light on the doubts. Have been able to - thus with your help - resolve the doubts of the students.

Regards / Pinesh

gemma-the-husky · Jan 8, 2018

@galaxiom

I think the truth is probably that there is just no point in using a numeric field to store a value that you do not need to undertake computations on, but you don't have to be axiomatic about it.

It may be OK to use a number format, but using a number format may introduce some issues, such as managing leading zeroes.

(as an aside, it's still apparent that there is an issue with leading zeroes, as we often see queries dealing with field values where the length - ie leading zeroes - is significant)

(as a second aside we may also need to manage significant digits after a decimal point - where they are clearly not significant in numeric terms, but might be in terms of the client system)

eg. take an ISBN (where N is actually an aabbreviation for "number"). It definitely isn't a number. It's a string, with multiple separators. Maybe you could store it in multiple numeric fields, but then you have the issue of managing both leading zeroes, and ensuring a combined total of 13 characters.

And a PIN (often recursively termed a PIN Number) where in some cases the characters in the NUMBER may actually not be numeric, and where leading zeroes occur regularly.

isladogs · Jan 8, 2018

take an ISBN (where N is "number"). It definitely isn't a number. It's a string, with multiple separators. Maybe you could store it in multiple numeric fields, but then you have the issue of managing both leading zeroes, and ensuring a combined total of 13 characters.

All true but it makes life extremely difficult when trying to help Access newbies who name fields like e.g. ID Number (usually with a space) and set datatype to text without thinking to mention it in their posts

I've dealt three such cases today alone ....

gemma-the-husky · Jan 8, 2018

@ridders

ah yes [ID Number]

isladogs · Jan 8, 2018

The best one today was actually called Text_No
It was of course a memo field .....

Galaxiom · Jan 9, 2018

gemma-the-husky said:
I think the truth is probably that there is just no point in using a numeric field to store a value that you do not need to undertake computations on, but you don't have to be axiomatic about it.

It may be OK to use a number format, but using a number format may introduce some issues, such as managing leading zeroes.

I mentioned two points above. Firstly, efficiency of storage. Secondly, it absolutely prevents the entry of unwanted formatting characters.

Phone numbers represent the extreme case of the borderline between using numbers or text. They are a classic case of the formatting issue. Without some management, operators are prone to entering a vast range of different formats and even include things like "(mother's phone)" or "nights only" on the end.

Another issue is the Area Code and National Code sections. The business I work is close to the state boundary so many numbers come from one of two different area codes. Some entries omit the Area Code, others format it with brackets, others add dashes. Others add extension numbers including different formats. International customers may have the international access code included (0011 in Australia) either with or without the + that proceeds it when written (but not when dialled). It is a mess.

Complex input templates and validations are potential solutions. However allowing for optional international numbers and extension numbers makes it very complex.

An alternative is break down the number into separate fields. NationalCode, AreaCode, PhoneNumber and ExtensionNumber. A PhoneComment field for things like "nights only" can also be added.

Notionally most people refer to the AreaCode with a leading zero. However the zero is actually the trunk access code and does not technically form part of the area code, which becomes obvious when dialling internationally. Similarly the 0011 is an access code, not part of the number.

Considered this way. there are no leading zeros on any part of the phone number, at least in Australia. I have not searched extensively but I have not been able to find any phone number anywhere in the world that does not fit this pattern.

I haven't seen a phone number section exceeding eight digits either. Long integer would do the job.

Any desired format for display of a whole number can be easily applied. The strict format for Windows to dial a modem can be easily built without even thinking about entry formats needing to be dealt with.

Of course troubles would begin if this situation ever changes and nine digit numbers were implemented. That is where text is most versatile.

However I am not advocating either text or numeric here. The point I am really making is that all choices have pros and cons and the developer should consider every case. The "no arithmetic" rule is too simplistic.

Lightwave · Jan 9, 2018

There is one further practical consideration for making ID field integer

Autonumber primary keys have to be integers. Plus if you want to use referential integrity I think access requires integers.

Unless forced if you want a long life just use integers for keys

The students and the professors are right - one yes why? and two there is no good explanation.

I rarely rarely replace keys in systems but this would be a system a I would consider regenerating the keys

Its actually an excellent topic which could be discussed around Extraction Transformation and Load between systems - the importance of keys when transferring information between systems. There is a lot of work that you can do to improve systems with historical records by normalizing data post existence often requires a bit of thought but it can significantly increase the value and flexibility of data. I have for instance experimented in generating relationships between spatial data types based on centroids of areas between data formely unrelated.

Here's a discussion using and open source desktop application in theory this could be done using SQL

Pat Hartman · Jan 9, 2018

Plus if you want to use referential integrity I think access requires integers.

As with joins, relationships can be made using any data type. The only requirement in both cases is that the data type in each table must be the same.

Doubts re 'Data Type' for certain fields

Pinesh

New member

Attachments

Minty

AWF VIP

isladogs

MVP / VIP

Galaxiom

Super Moderator

sonic8

AWF VIP

Pat Hartman

Super Moderator

Pinesh

New member

Pinesh

New member

gemma-the-husky

Super Moderator

isladogs

MVP / VIP

gemma-the-husky

Super Moderator

isladogs

MVP / VIP

Galaxiom

Super Moderator

Lightwave

Ad astra

Pat Hartman

Super Moderator

Similar threads

Users who are viewing this thread