Best Practice: String or Number For Auto ID

catbeasy

Registered User.
Local time
Today, 05:28
Joined
Feb 11, 2009
Messages
140
I have seen a number of databases that use strings for their primary key code instead of a number.

Does it really matter? Are there bad consequences for using a string instead of a number for your primary key field when normalising data?

Is it simply an issue of memory in that strings use more of it?

Thanks..
 
Number... and a meaningless number.

Goes to speed/indexing and somesuch technical stuff.... Good practice ... Auto NUMBER, meaningless primary key
 
Number... and a meaningless number.

Goes to speed/indexing and somesuch technical stuff.... Good practice ... Auto NUMBER, meaningless primary key

Thanks, the indexing issue is something I haven't thought of. Probably because there will be so little data in my database that indexing won't be a huge issue. But its good to keep in mind for the future and a good practice I suppose even in a small database in case the amount of data ever increases to a point where an index would start to make a difference..
 
i prefer numeric strings - it is more efficient for access (ie computers) to sort numbers than to sort text strings, but for small datasets it wont matter too much

an other argument is whether a primary key should be meaningful or not. An autonumber is not meaningful, and therefore if you want to add some meaningful identifier (possibly/probably text), and index it, there is an obvious overhead of managing an additional indexed field.

I still prefer to do it this way, using the autonumber as the foreign key in realted tables, as then you can easily change the text string to a different value, or even a different data type, without needing to cascade the changes. It does mean that you cant necessarily find information from the related tables without joining them to the mastertable, which you could do perhaps if you used a meaningful foreign key

one other thing is that if you use a numeric key, its easy to join in query definitions - if you use segmented (ie multi field keys) it can be a pain defining realtionships in queries - and in some cases its unavoidable, as the fk is not avaialble, but the related data is (eg when importing a spreadsheet that relates to existing data)

so there's swings and roundabouts, and its a matter of taste which way you want to go.
 
perhaps if you used a meaningful foreign key
I've found that "meaningful" primary keys will at some point bite you (maybe not today and maybe not tomorrow, but it is rarer to not run into a problem with them than it is to have no problem). Better to let the SYSTEM manage the keys to its relationships and there are other ways of managing what gets put into the tables.
 
i prefer numeric strings
SAY WHAT??? Number strings??? LOL, I hope you mean normal Number fields, not numbers entered into text fields !

so there's swings and roundabouts, and its a matter of taste which way you want to go.
I will go to the moon and back before not using a meaningless number PK, I have been 'had' so many times by different things goind "this will never change thus must be our PK", only to have it change a few months later... and then again a year later...

Same thing with VAT rates, some people seem to assume they dont change (a lot). Then when they do, it turns out to be a S**t load of work to change it.
Everything than can be variable, is to be variable. Anything used in multiple places is to be caught in a function or otherwize centralized.
Trust me these rulez of development are here for a reason.... One of the rulez of normalization says the PK should be (by preference) meaningless.... save yourself a headache or two and stick to it...
 
i didnt mean numeric strings, i meant numeric fields by choice

i agree that by choice i would use numeric not text fields as pks - but not everyone does - look at Sage- UKs biggest mass market software house, and text strings as customer account codes, and probably elsewhere too.

does anyone in US use a lookup table for states with a numeric key? or do they just use standard 2 character state abbreviations as the table key?

similar thing with VAT rates - even when they do have a proper lookup table, a lot of systems use letter codes for the looked up VAT rate, not numeric - typically S=Standard Rate, Z=Zero-Rate etc.

And I bet a lot of supermarkets/large scale operations use 3-character strings to identify branches/stores, and dont have an underlying numeric master key.
 
I am sure even know a lot of bigger or smaller systems do not 'comply', still that doesnt change the nature of the good practice.
 
I am sure even know a lot of bigger or smaller systems do not 'comply', still that doesnt change the nature of the good practice.
How very true.

Short text strings however are less inefficient than longer ones simply because they take up less space per item so you get away with short codes. The main disadvantage with text PK is that they nearly always are meaningful and so subject to change as the business expands. Much better IMHO to use meaningless Autonumber PKs that are just used for linking.

I recently had to replicate a DB for use on Laptops that were used away from base. Was I glad that the autonumber Keys were meaningless as replication changes them to random numbers.

This change was not considered when the DB was originally designed but because best practice had been followed the later change was made without problems.
 
Although I am firmly in the surrogate key camp, the decision to avoid text primary keys based on performance is moot given the row counts in most Access databases. It is not until you get into hundreds of thousands of rows before you would find a measurable speed difference. Storage space is a different matter as strings tend to take more space than integers.
 
Still access tends to be a starting point for many people, not the end point. Getting into the right habits early on is important.
 
I've found that "meaningful" primary keys will at some point bite you (maybe not today and maybe not tomorrow, but it is rarer to not run into a problem with them than it is to have no problem). Better to let the SYSTEM manage the keys to its relationships and there are other ways of managing what gets put into the tables.

I agree wholeheartedly - having been bitten very badly by taking on a mature system with a meaningful primary key, upon which many layers of dependency had been imposed, at which point an urgent/mandatory need to change the structure/meaning of the key field arose. It was an absolute nightmare (in fact it was never completed - the company went out of business for other reasons while the work was still in progress)
 
i also think that insisting on an additional autonumber key can be pedantic, and adds another key for maybe no good reason

eg, in a system with orders and order lines

the order has a natural order number, which can be a candidate PK, but instead have lets have an additional (auto)number key if you insist

now the order lines table needs a foreign key relating to the orders table, which can either be the natural order number, or the autonumber key if you have one

BUT the order lines table needs its OWN unique key - which could be the order number plus a product code - or the order number plus a line number etc etc. Now adding an autonumber to the order lines table as a PK might make it easy to manage links to some other table - but it really is artificial - the real important key in THIS table is the unique real world information of order number plus product data, which MUST maintain its integrity - since you HAVE to be able to distinguish between lines on the same order.

so in this scenario, would you always add an autonumber key to the order lines table?

------------
or another example - i am currently doing a warehousing system to store pallets of products in a large warehouse with thousands of pallet sized bays. Each bay is referred to logically by a multi level hierarchy, consisting of zone, aisle, left/right side, storage bay, and stack level - some are alpha - eg left right is AB in their parlance, and stack levels are A to D, but ailses are two digit numeric.

After some thought I am not using an autonumber key, but i am treating left/right as numerically 1-2, stacks A-D as numerically 1-4, and constructing a numeric primary key that looks like

2431244, and also storing the string equivalent of 143A24D
2-43-1-12-4 (1-43-A-12-D)

representing
w/h zone 2
aisle 43
side 1 (a)
bay 12
level 4 (d)

doing it this way makes it easy to search any given location for nearby locations, based on the pattern of the primary key,
it also means they can select form their string key (in look ups) which is easy for them to do
and most importantly, I can easily have a function to go from one to the other.

although I could have used an autonumber key, and just manipulated the string to search bays, it did seem as straightforward in this instance to use a meaningful numeric key.
 
Last edited:
so in this scenario, would you always add an autonumber key to the order lines table?
Probably. Many order entry applications take additional arguments such as color or size or both so you could end up with the same item number multiple times on the order. Blanket, Blue, 1 and Blanket, Green, 1.

Your "smart" key is just the kind of problem waiting to happen. I would use a surrogate key and keep the "smart" key as a unique index. It adds a little overhead but doesn't blow me out of the water if they increase the number of bays to 100.
 
You know and just as a side thought - My job is easier than yours Gemma as I do not have to even try to analyze to look for natural keys. I always use surrogate keys so I can just quickly move forward and then I can just ask myself what can't be duplicates and can add indexes. But, I don't have to try to figure out which key to use if there are multiple candidate keys available because I already know that I am going to use a surrogate. Makes life SOOOOOO much easier.
 
in a system with orders and order lines

the order has a natural order number, which can be a candidate PK, but instead have lets have an additional (auto)number key if you insist

now the order lines table needs a foreign key relating to the orders table, which can either be the natural order number, or the autonumber key if you have one

BUT the order lines table needs its OWN unique key - which could be the order number plus a product code - or the order number plus a line number etc etc. Now adding an autonumber to the order lines table as a PK might make it easy to manage links to some other table - but it really is artificial - the real important key in THIS table is the unique real world information of order number plus product data, which MUST maintain its integrity - since you HAVE to be able to distinguish between lines on the same order.

so in this scenario, would you always add an autonumber key to the order lines table?
Does the OrderLine table really NEED its own unique key? I am not so sure it does. I always give it an Autonumber PK because until the project is complete there might be a need but in practice I have never actually used the this PK for any purpose. The Orderline records seem to come at the end of the relational chain. When I want them I want the orderline records that are related to a specific Order record and not a specific Order line record.

Basically I use Autonumber primary keys because they are simple to implement and easy to use not because they are "Good Practice". That is just a bonus:)
 
thanks for the thoughts -

My overall point is that adding an autonumber to every table guarantees unique records -but still doesnt do the job - with mostly every table, you still need other indexes to TRULY identify the real UNIQUE data. eg with an order entry table, even though you add an autonumber key, surely you will index the actual orderno field with a unique index

And then you have to consider how you work in the real world - you may have a situation where you can grab the next order no, and use that - either using a dmax, or from a central lookup table - but what if you want to enter an order number manually - say the system was down, and a day's orders were typed and sent out with manual order numbers - now you need a way to enter order numbers manually, and guarantee they are not duplicated. (and its almost certainly not safe to assume they were issued in line with the existing sequence, and definitely unreasonable to insist such orders are then re-entered in that sequence when the system comes back up) - and so as well as allowing for the entry of manual order numbers, you also need to consider how such orders may then interfere with your dmax function to get the next order number.

orders/orderlines

i dont mind having a surrogate autonumber key to the orderlines table, (I said its probably very useful later on) but it doesnt get the job done by itself - as follows. You definitely STILL need a different true/working/unique key on the order lines table, assuming you want to be able to match the goods received to the order lines. The (artificial) autonumber key (although its unique) is just no good for matching the goods received, as the supplier wont quote your autonumber - what he will do is quote your order number, and identify the product - so you still need to search on the "true" unique key to be able to relate the items correctly, and only then store the autonumber from the order lines table

and fwiw this is a project in itself, as you either need to do this matching manually, by inspection, annotate the order line in some way to indicate the product has been received etc etc (and then you get into the realm of only partial receipt etc etc) and/or (if the invoice data is received electronically) have a correspondence table that takes the suppliers product id, and maps to our own product id (unless you have dealt with this another way). And (backing up a step) in order to do this you need a way of ensuring the order doesnt allow multiple entry of the same item (assuming this is your enterprise rule) - so you need a "REALWORLD UNIQUE" key that joins orderno/productno/colour etc (as Pat pointed out) to detect any duplicated items, and reject them (or force you to deal with it somehow)

You can circumvent this, if you can find a way of adding your autonumber to the order sent to the supplier, and getting the supplier to quote this back - but now its not a meaningless number - its a MEANINGFUL number.



warehouse system -
Bob/Pat on reflection I think you are right - it probably does make sense to add an autonumber key to the locations table - which I can do easily.

Bob - it wasnt so much trying to find a natural key (your point about your job being easier) - it was a way of finding a representation of the storage location that enabled simple manipulation - andI thought it was easier to do that with a long number, than with a string. And I take your point - if the project is used on a location with more bays, or even worse, if it extends out of the range of a longint, i have problems - with an autonumber i dont - i just need to change how i manage/manipulate the internal representation, to present the location to the users, and to search for contiguous locations to the current one - which are black box functions anyway effectively - so if it came to it, i could manipulate the string instead of the long (or indeed the drop downs that drive the string/number in the first place)


What I am really trying to say (and not in a rigorous way) is that I think that although autonumbers make subsequent table manipulation efficient (eg for generating reports), and guarantee relational integrity, they are not sufficient in themselves to maintain the "TRUE" underlying operational (as opposed to relational) integrity of a system.
 
Last edited:
What I am really trying to say (and not in a rigorous way) is that I think that although autonumbers make subsequent table manipulation efficient (eg for generating reports), and guarantee relational integrity, they are not sufficient in themselves to maintain the "TRUE" underlying operational (as opposed to relational) integrity of a system.
I agree that autonumbers cannot maintain the operational integrity of a system. This is dependant on the actual business rules in a particular case and require tailoring for the system.

I must admit I misunderstood your example as I was thinking of external orders from customers rather than orders raised for incoming goods. Then there is more need to identify orderline items and a need to match them with goods delivered. Still this has produced an interesting discussion on the nature of PKs
 
ive spent too long on this, but its still the same with sales items, isnt it - you still have to relate the ordered items to stock records or despatch records, and the same principles of identification and uniqueness apply
 
or another example - i am currently doing a warehousing system to store pallets of products in a large warehouse with thousands of pallet sized bays. Each bay is referred to logically by a multi level hierarchy, consisting of zone, aisle, left/right side, storage bay, and stack level - some are alpha - eg left right is AB in their parlance, and stack levels are A to D, but ailses are two digit numeric.

After some thought I am not using an autonumber key, but i am treating left/right as numerically 1-2, stacks A-D as numerically 1-4, and constructing a numeric primary key that looks like

2431244, and also storing the string equivalent of 143A24D
2-43-1-12-4 (1-43-A-12-D)

representing
w/h zone 2
aisle 43
side 1 (a)
bay 12
level 4 (d)

doing it this way makes it easy to search any given location for nearby locations, based on the pattern of the primary key,
it also means they can select form their string key (in look ups) which is easy for them to do
and most importantly, I can easily have a function to go from one to the other.

although I could have used an autonumber key, and just manipulated the string to search bays, it did seem as straightforward in this instance to use a meaningful numeric key.

I would still have used an autonumber key, with the meaningful location information as a searchable text field - reason being, anything meaningful is potentially subject to change of meaning.

The warehouse might decide to increase the number of zones, aisles or bays in such a way that their identifying digit group needs another digit.

Or they might install bins inside some of the levels - so the location system would be:
w/h zone 2
aisle 43
side 1 (a)
bay 12
level 4 (d)
Bin 6

Or they might install a new kind of racking that for one reason or another means everything in the warehouse needs renumbering - If the location is just in a text field, the renumbering operation is a bit of a headache, but if the location field is the primary key, referenced as a foreign key in loads of child tables, it could be a complete and almost impossible nightmare.
 

Users who are viewing this thread

Back
Top Bottom