Best Practice: String or Number For Auto ID

I dunno - maybe I'm just getting old, but I just don't trust anyone (not meaning you, Doc Man - I mean the users who want DBs constructed) any more when they specify a meaningful key, with the assurance that it's writ in stone and will never change.

They're just wrong, most of the time (in my experience) - and the potential inconvenience of changing them in an established system far outweighs the cost of the extra storage space for a meaningless PK.

I take your point and agree with you that there probably are meaningful keys that will never change, but I'm not sure we can ever reliably identify them except in hindsight. Even things like your example of state codes could change (it's just quite unlikely).

About the only thing I can think of that I'd be happy to accept as a meaningful PK would be Atomic Numbers from the periodic table of elements (and even then, I'd still harbour a nagging doubt about the decision).
 
OK I'll bite again

all i am saying is that adding an autonumber to a table provides a way of uniquely identifying a row, and relating it to other tables. But it doesn't solve the issue of whether the row should be included in the table in the first place, and in most cases you will need to add another unique key to the table that represents the "real world" unique identifier.

eg, in the US states table - yes the US may change Texas not to be TX (it is TX isnt it) so add an autonumber key - but now you need the autonumber key AND the same state abbreviation key you had previously, as you dont want the users to be able to add Texas again - which you can if you only have the autonumber key.
 
Lets keep this real. There is a sensible case to made for using the state code as a primary key and indeed the same logic can apply to many short lookup tables. However where I think this discussion is meaningful is in defining large data tables where it is convenient to have a simple meaningless autonumber PK. I have noticed in the last few days relationship diagrams posted with upto 4 fields being used to define the link between tables. This is clearly not good design and it appears on a quick examination that a single field(already defined) would be sufficient.

Many of the great and the good on this forum have added their views to this thread and seem to be broadly agreed on the benefits of an autonumber key in most cases as opposed to a text string key especially where the test string is lengthy.
 
OK I'll bite again

all i am saying is that adding an autonumber to a table provides a way of uniquely identifying a row, and relating it to other tables. But it doesn't solve the issue of whether the row should be included in the table in the first place, and in most cases you will need to add another unique key to the table that represents the "real world" unique identifier.

eg, in the US states table - yes the US may change Texas not to be TX (it is TX isnt it) so add an autonumber key - but now you need the autonumber key AND the same state abbreviation key you had previously, as you dont want the users to be able to add Texas again - which you can if you only have the autonumber key.
I agree completely - sorry if I was talking at cross-purposes.
 
rabbie's last post
I have noticed in the last few days relationship diagrams posted with upto 4 fields being used to define the link between tables. This is clearly not good design and it appears on a quick examination that a single field(already defined) would be sufficient.

but this is EXACTLY the point

given this relationship between the 1 and many tables, you most likely STILL have to test AND USE this relationship when/before you add items to the many table - you can't just pick up the autonumber key and use that regardless, because although that may ensure referential integrity, it almost certainly will not assure you are abiding by your own business rules

[edited]
thinking about this again, i probably need to reconsider the above statement, and i think a lot of this depends also to some extent on what your system is doing. In a self contained system, there is probably no issue about using an autonumber key as the fk in the many side of any table. however there is an issue about accepting any data into the 1-side table, where the data in that table needs to have its own integrity, and therefore you need to have additional mechanisms to control that.

however, in any non-self-contained system issues most likely will arise -
eg, i am involved in quite a few systems where Access is being used to integrate and manage data introduced from other systems, by way of spreadsheet or csv, or even poorly designed odbc sources (as i am sure a lot of developers do). In these cases, even though I am loading this data to tables with autonumber keys which are then useful to me, the autonumber keys are not available in the source data, and therefore I have to analyse this data for integrity/duplicates/new items etc based on the underlying information, and in these cases the 1-many relationships have to be based on other data - such as in general the n-field links (as in your example)
 
Last edited:
Gemma in the case I was thinking of the design had A customer table, An Order table and an OrderDetail Table. \So far so good. It then had an FK from Customer stored in Order and an FK from Order stored in OrderDetail - Still no problems. Then it had an FK from Customer stored in OrderDetails - Completely redundant IMHO. Just as bad as storing calculated values. The Relation diagram was full of this type of link which to me at any rate adds nothing to the Data integrity.

I was not complaining about the need for data integrity. I was more complaining about the unnecessary proliferation of multifield links which added nothing positive but were giving the poster problems in writing his update queries.
 
Last edited:
Let's add another consideration.

Old-timers will remember that most early Windows-based machines were small-memory, i.e. < 256 Mbytes of RAM and < 20 Gbytes of disk on the C drive (if your boss liked you), sharing that space among Windows, Office, and whatever data you had. Access 2.0 still had the ability to provide autonumber keys or natural keys as needed. On systems like that, adding a surrogate key for reasons of purity was counter-productive. Space was a premium and you wanted records to be as small as possible. Every index or key was a space-eater and everything that ate up extra memory was anathema. Because the machines were still in the tens of MHz range, it was also an issue that extra manipulation of extra indexes was costly.

Now, a 256 Mbyte system with speeds less than 1 GHz means your boss really doesn't like you. Disks come in units starting at 60 Gb and it is nothing to see 120 or 240 Gb in a system. Speed of the underlying system isn't nearly such an issue as the speed of the disk. Therefore, though there is still a space penalty, it isn't nearly as severe (in relative terms) as it used to be. In this environment, a surrogate key isn't so much of a problem. Therefore, as a pragmatist, at least SOME of the objections to surrogate keys fade away.

This leaves issues of inheritance. (As in, "I inherited this pile of dyspeptic dragon droppings that doesn't have surrogate keys and now I have to fix the mess....") For real-world reasons, it might still be impractical to convert all surrogate keys at once. As I recall in my trial by inheritance, it took me 6 months to convert every table to surrogate keys because we started running into servers with multiple IP addresses and different names in the different domains (as implied by being a gateway). So nothing we had was either unique or immutable any more. Which is why I'm not totally against surrogate keys at all. I just don't require them pro forma, only when actually needed.
 

Users who are viewing this thread

Back
Top Bottom