Better to store text or numeric

Sess · Feb 10, 2010

I am building a database with some parameters based on observed performance and I am wondering which is more efficient way to store the parameters.
Allowing a user to select
Poor - Fair - Good - VeryGood - Excellent
would it be good practice to store the information numerically as
1 2 3 4 5 and convert it back on the fly to the text values
or just store as text
Poor Fair Good VeryGood Excellent

Thank you.

rainman89 · Feb 10, 2010

For it to be normalized, you would want to have a seperate table with the

poor
good
fair
etc..

in it and then just have the ID field in the other table

Access_guy49 · Feb 10, 2010

Is there a reason why you would have to use the id? would a pre set list of values not accomlpish the same thing? I guess my question is, why does it have to use the id? from a user aspect, a coded drop down would still ensure that no other values are put in the table, would it not?
(Just wondering)

rainman89 · Feb 10, 2010

Access_guy49 said:
Is there a reason why you would have to use the id? would a pre set list of values not accomlpish the same thing? I guess my question is, why does it have to use the id? from a user aspect, a coded drop down would still ensure that no other values are put in the table, would it not?
(Just wondering)

You COULD do this, sure. IF I were trying to design this, and wanted it to be properly normalized, this is how I would do it though.

Nothing like looking at a table and seeing "poor" written 50000 times

Brianwarnock · Feb 10, 2010

rainman89 said:
You COULD do this, sure. IF I were trying to design this, and wanted it to be properly normalized, this is how I would do it though.

Nothing like looking at a table and seeing "poor" written 50000 times

Yes far better to look at the ID 1 written 50000 times.
But why are you looking at the Table surely you should be viewing a Form or Report.

Brian

rainman89 · Feb 10, 2010

Brianwarnock said:
Yes far better to look at the ID 1 written 50000 times.
But why are you looking at the Table surely you should be viewing a Form or Report.

Brian

Correct, I was just stating that I think that for it to be properly normalized, you would want a seperate table for those choices...

Am I wrong?

KenHigg · Feb 10, 2010

I would just store it as text. Seems the suggestions so far are over-complicating the issue.

Brianwarnock · Feb 10, 2010

I was being sarcastic as I can see no reason not to just store the Text, which rule of normalisation are we abusing.

Brian

rainman89 · Feb 10, 2010

Brianwarnock said:
I was being sarcastic as I can see no reason not to just store the Text, which rule of normalisation are we abusing.

Brian

I guess I don't know. That is how I would do it is all.

I stand corrected!

Banana · Feb 10, 2010

For me the most compelling argument is this:

A single character, depending on which character set we may use could require between one to three byte to store this character. But for simplicity, let's assume every character used only one byte per.

Therefore, a text string of "Poor" would require 4 bytes to store. But if we were to use a ID in its place, we could choose to use a byte and thus store a single byte for every time we need to mark a record as "Poor" and define "Poor" only once in the database.

So instead of having 4 bytes * N number of records = M total bytes consumed, we could do this, 1 byte * N number of records + 4 bytes for "Poor" = M total bytes consumed.

But it's not all about storage space. Which will be faster to compare, a string that could be longer than the processor's total number of registers or an integer that will fit the number of registers or be less? Therefore, by using integers, the processor can grind through much more records than it had to compare the string. Now, even if we had 4 byte string which would be equivalent to Long which is also 4 bytes, Long is still faster than string. Why? Because integers doesn't have a collation and the processor must take an extra step to check the collation... Should "Smith" = "smith"? "Mueller" = "Müller"? To avoid the extra step, you would have to explicitly compare them as binary. While you could specify so via StrComp function, I'm not aware of a character set in Access that enables you to compare by binary by default. Those questions are totally irrelevant for integers.

So, in terms of storage space and processing expenses, surrogate key composing of integers will almost always win out compared to text.

rainman89 · Feb 10, 2010

Banana said:
For me the most compelling argument is this:

A single character, depending on which character set we may use could require between one to three byte to store this character. But for simplicity, let's assume every character used only one byte per.

Therefore, a text string of "Poor" would require 4 bytes to store. But if we were to use a ID in its place, we could choose to use a byte and thus store a single byte for every time we need to mark a record as "Poor" and define "Poor" only once in the database.

So instead of having 4 bytes * N number of records = M total bytes consumed, we could do this, 1 byte * N number of records + 4 bytes for "Poor" = M total bytes consumed.

But it's not all about storage space. Which will be faster to compare, a string that could be longer than the processor's total number of registers or an integer that will fit the number of registers or be less? Therefore, by using integers, the processor can grind through much more records than it had to compare the string. Now, even if we had 4 byte string which would be equivalent to Long which is also 4 bytes, Long is still faster than string. Why? Because integers doesn't have a collation and the processor must take an extra step to check the collation... Should "Smith" = "smith"? "Mueller" = "Müller"? To avoid the extra step, you would have to explicitly compare them as binary. While you could specify so via StrComp function, I'm not aware of a character set in Access that enables you to compare by binary by default. Those questions are totally irrelevant for integers.

So, in terms of storage space and processing expenses, surrogate key composing of integers will almost always win out compared to text.

Well put, didn't think of it that way.

Brianwarnock · Feb 10, 2010

That is all very fine but doesn't it lead to a number of questions to be resolved first, after all a conversion has to take place every time we wish to display the value.
So

How big is the database,
how many times stored the values
How often comparisons (and does this include such things as Counts?)
How often just displaying.

That will do for now

Brian

Pauldohert · Feb 10, 2010

You could store the text in a separate table with an Id, then store the ID each time the drop down is selected also you could hold a rank numeric against each text - ie

Rank 1 = Poor = ID 1
Rank 1.5 = Fair = ID 2

etc etc

That way you can easier query those Fair or better for example - and can easier change your list sometime in the future - and still allow ranking.

Access_guy49 · Feb 10, 2010

Awesome, so, i guess my conclusion is that, by no means do you have to store the id number and draw from a look-up table. And when dealing with a table of a few hundered records, there probably isn't much point in adding the extra complexity. But on larger datasets, this method does provide some performance advantages.
Excellent. (I know this isn't my thread but i did learn something today! so thanks to everyone for that!)

KenHigg · Feb 10, 2010

This is a pretty cerebral dialog but it seems a bit overkill for this situation. Just store it (in the same table) as text...

Banana · Feb 10, 2010

Brianwarnock said:
How big is the database,
how many times stored the values

I've already given the equations. If we used text "Poor" instead of surrogate ID, it will be less only if the "Poor" is stored exactly once. Store it twice, and it's already in deficit of two bytes compared to surrogate ID (2 bytes + 4 bytes). The deficit will only widen as the number raise.

How often comparisons (and does this include such things as Counts?)

Yes. I'm using the word, "comparisons" in its broadest sense but yes, the computer has to compare the value to increment the count, find a record matching that value, ordering the row, and then some more.

How often just displaying.

Now the question about doing conversions is certainly a legitimate question. Here's my belief (e.g. I don't have supporting evidence for this): When we open a lookup table, Jet/ACE will cache the table in memory so we only have the "hit" of conversion at first time but thereafter, it'll be quite fast to replace the ID with the actual text for display. Furthermore, it will involve less I/O because it only has to read "1" off the disk rather than "Poor" off the disk for every records, and for most part, I/O is the traditional bottleneck, so by enabling the engine to read the "1" and replace it with "Poor" from memory, it can process faster than if it had to read off "Poor".

To be fair, though, we could very well use a text as key and not even notice any slowdown even with say, 10000 records due to the computers being very fast. It's not going to be human-discernible until we get into very high number of records. But I still would rather that my application was optimized if only to reduce the contention over the records as the concurrency increases and especially when the database migrates to big iron but still using Access as front-end.

As someone said, "If you don't have time to do it right first time, when will you have time to do it all over?"

Brianwarnock · Feb 10, 2010

I appreciate where you are going, our mainframe Dbs spread over many discs, although small compared to our Teradata system, however I have written a number of access DBs with this sort of requirement and the biggy may have reached 500 records, but one thing all of theses databases had in common, we knew at the outset the probable scale of things and thus the important criteria. To not do so is likely to waste resources, human and time, doing things the wrong way.

Brian

gemma-the-husky · Feb 10, 2010

I would not store a lookup as text, generally - I would use a lookup table with numeric keys

Reasons

1. Efficiency - numbers are easier to store and manipulate
2. Maintenance - if you want to change the legend for a description, if you are storing a numeric FK - its trivial - BUT if you are storing text, then you have to use cascading updates - which personally I woudl try to avoid.

------------

I might make an exception for something like a "state" table - and use the two-letter state code - but I would still use this code, and not the State Name itself.

As another example, if you had an scientific element table, would you select to store as the PK
a) the element number or
b) the element symbol (or even the name)

------------

and I can see why Ken is saying this is pretty cerebral - except that it isnt, I dont think

Loads of threads here discuss the use and benefit of numeric (autonumber) indices - and its a real good habit to get ingrained -use numeric PK/FK values - and dont expect an autonumber to be meaningful - the fact is that it doesnt matter if we dont see "sensible/meaningful" data in a table - because we shouldnt need to be looking at the table

for the same reason we suggest not using lookups at the table level

SOS · Feb 10, 2010

And an FYI - Cascading updates only work for PK's, not for normal fields. So, if you don't have a table with those listed as PK's then a cascading update would not work anyway and you'd need to use update queries to update wherever the values were that needed updating.

I use the ID's myself and it works fine in all of my databases, even on those with large sets of data.

Brianwarnock · Feb 10, 2010

Without knowing anything about the environment of the poster it is difficult to be absolute I'm sure we all agree about what to do for big DBs but the clincher for small ones is this from Dave

2. Maintenance - if you want to change the legend for a description, if you are storing a numeric FK - its trivial - BUT if you are storing text, then you have to use cascading updates - which personally I woudl try to avoid.

I've not had to contend with this but I guess its not uncommon for management to change their mind, and that would be a pain even on a small DB.

Brian

Better to store text or numeric

Registered User.

I cant find the any key..

Registered User.

I cant find the any key..

Retired

I cant find the any key..

Registered User

Retired

I cant find the any key..

split with a cherry atop.

I cant find the any key..

Retired

Something in here

Registered User.

Registered User

split with a cherry atop.

Retired

Super Moderator

Registered Lunatic

Retired

Similar threads

Users who are viewing this thread