Best Practice: String or Number For Auto ID

agreed

on reflection i took the view that Pat expressed, and that you reiterated here, and I have modified the design to use an autonumber key for the storage locations.

i said i was concerned more with an efficient representation, and i agree that i can have that representation, while using an autonumber key
 
As one of this forum's resident pragmatists, I feel I should offer a viewpoint just to include diversity.

The first and foremost rule is to know your data. But also, it helps to think about how it is stored, and that information IS available in the help files if you weren't sure.

Meaningful keys often present themselves in a way that makes perfect sense to use them as such. Meaningless keys operating "behind the scenes" strictly as a matter of formality make no sense. ;)

Take the obvious case: State abbreviations. They are two characters long, whereas an autonumber is 4 bytes. Indexing is equally easy for two-letter fields as it is for autonumbers and is equally likely to be unique and invariant once defined. OK, the USA could add another state, but otherwise there is no reason to avoid a meaningful SHORT key.

One contrary example would be something based on, say, social security number, which in most USA locations is forbidden as a formal identifier for lookup purposes. (Not to mention impractical, since it is possible for recent, legal USA immigrants to not yet have one.)

So when DO you use meaningful keys? Agreed that you never really MUST use meaningful keys UNLESS your company has a guideline manual that spells it out as to when/where you use them. Having said that, I have no qualms about using a meaningful key when it makes sense to do it, particularly for short-code or small-table lookups.

Just review what a key is supposed to do: It is supposed to lead you absolutely and unequivocally to one specific row in the appropriate table. When screening for candidate keys, if you have one don't just ignore it pro forma. Skip it for practical reasons.

Could it change in the future? If so, don't use it. But if it could expand and you have the room, that's not a rejection excuse. (I.e. so the USA really DID add another state. We add another abbreviation to the states table and we are done.)

Is the candidate key too long to be useful for efficiency? One LONG integer is your limit there UNLESS you know for an absolute fact that the table will be incredibly sparse, maybe only a handful of entries.

Here's a poser: Even for long candidate keys, is there a significant chance that other indexes on the same table will require you to exceed the allowed number of indexes per table? (Yeah, I know, should be rare... but it happens.) In that case, consider how many JOINs we are talking about for the long, meaningful key. If it is a very small number, and particularly if the child tables are also very sparse (i.e. a smattering of optional data), you MIGHT (I emphasize MIGHT) consider it.

I never thought I would ever see a setup where more than ten indexes would be on a table, but then I went to work in a Dept. of Defense environment. HOLY GUACAMOLE! I no longer make rash assumptions about how many indexes one needs on a table until I actually analyze the requirements statements and consider the performance expected out of the system.

Anyway, the point is that for really complex situations, meaningless keys are really preferred to make it all tie together much easier - but there will always be times when a meaningful key approach still works right.

NOTE to readers who don't remember or never saw the thread: In the Design section, if you do a search for "Meaningful keys" and "Meaningless keys" and if you change the time limit to reach pretty far back, you will see several long threads on the subject that give pot-loads of opinions and viewpoints. One advantage of being the old-timer: You remember a few of the good threads.
 
I dunno - maybe I'm just getting old, but I just don't trust anyone (not meaning you, Doc Man - I mean the users who want DBs constructed) any more when they specify a meaningful key, with the assurance that it's writ in stone and will never change.

They're just wrong, most of the time (in my experience) - and the potential inconvenience of changing them in an established system far outweighs the cost of the extra storage space for a meaningless PK.

I take your point and agree with you that there probably are meaningful keys that will never change, but I'm not sure we can ever reliably identify them except in hindsight. Even things like your example of state codes could change (it's just quite unlikely).

About the only thing I can think of that I'd be happy to accept as a meaningful PK would be Atomic Numbers from the periodic table of elements (and even then, I'd still harbour a nagging doubt about the decision).
 
OK I'll bite again

all i am saying is that adding an autonumber to a table provides a way of uniquely identifying a row, and relating it to other tables. But it doesn't solve the issue of whether the row should be included in the table in the first place, and in most cases you will need to add another unique key to the table that represents the "real world" unique identifier.

eg, in the US states table - yes the US may change Texas not to be TX (it is TX isnt it) so add an autonumber key - but now you need the autonumber key AND the same state abbreviation key you had previously, as you dont want the users to be able to add Texas again - which you can if you only have the autonumber key.
 
Lets keep this real. There is a sensible case to made for using the state code as a primary key and indeed the same logic can apply to many short lookup tables. However where I think this discussion is meaningful is in defining large data tables where it is convenient to have a simple meaningless autonumber PK. I have noticed in the last few days relationship diagrams posted with upto 4 fields being used to define the link between tables. This is clearly not good design and it appears on a quick examination that a single field(already defined) would be sufficient.

Many of the great and the good on this forum have added their views to this thread and seem to be broadly agreed on the benefits of an autonumber key in most cases as opposed to a text string key especially where the test string is lengthy.
 
OK I'll bite again

all i am saying is that adding an autonumber to a table provides a way of uniquely identifying a row, and relating it to other tables. But it doesn't solve the issue of whether the row should be included in the table in the first place, and in most cases you will need to add another unique key to the table that represents the "real world" unique identifier.

eg, in the US states table - yes the US may change Texas not to be TX (it is TX isnt it) so add an autonumber key - but now you need the autonumber key AND the same state abbreviation key you had previously, as you dont want the users to be able to add Texas again - which you can if you only have the autonumber key.
I agree completely - sorry if I was talking at cross-purposes.
 
rabbie's last post
I have noticed in the last few days relationship diagrams posted with upto 4 fields being used to define the link between tables. This is clearly not good design and it appears on a quick examination that a single field(already defined) would be sufficient.

but this is EXACTLY the point

given this relationship between the 1 and many tables, you most likely STILL have to test AND USE this relationship when/before you add items to the many table - you can't just pick up the autonumber key and use that regardless, because although that may ensure referential integrity, it almost certainly will not assure you are abiding by your own business rules

[edited]
thinking about this again, i probably need to reconsider the above statement, and i think a lot of this depends also to some extent on what your system is doing. In a self contained system, there is probably no issue about using an autonumber key as the fk in the many side of any table. however there is an issue about accepting any data into the 1-side table, where the data in that table needs to have its own integrity, and therefore you need to have additional mechanisms to control that.

however, in any non-self-contained system issues most likely will arise -
eg, i am involved in quite a few systems where Access is being used to integrate and manage data introduced from other systems, by way of spreadsheet or csv, or even poorly designed odbc sources (as i am sure a lot of developers do). In these cases, even though I am loading this data to tables with autonumber keys which are then useful to me, the autonumber keys are not available in the source data, and therefore I have to analyse this data for integrity/duplicates/new items etc based on the underlying information, and in these cases the 1-many relationships have to be based on other data - such as in general the n-field links (as in your example)
 
Last edited:
Gemma in the case I was thinking of the design had A customer table, An Order table and an OrderDetail Table. \So far so good. It then had an FK from Customer stored in Order and an FK from Order stored in OrderDetail - Still no problems. Then it had an FK from Customer stored in OrderDetails - Completely redundant IMHO. Just as bad as storing calculated values. The Relation diagram was full of this type of link which to me at any rate adds nothing to the Data integrity.

I was not complaining about the need for data integrity. I was more complaining about the unnecessary proliferation of multifield links which added nothing positive but were giving the poster problems in writing his update queries.
 
Last edited:
Let's add another consideration.

Old-timers will remember that most early Windows-based machines were small-memory, i.e. < 256 Mbytes of RAM and < 20 Gbytes of disk on the C drive (if your boss liked you), sharing that space among Windows, Office, and whatever data you had. Access 2.0 still had the ability to provide autonumber keys or natural keys as needed. On systems like that, adding a surrogate key for reasons of purity was counter-productive. Space was a premium and you wanted records to be as small as possible. Every index or key was a space-eater and everything that ate up extra memory was anathema. Because the machines were still in the tens of MHz range, it was also an issue that extra manipulation of extra indexes was costly.

Now, a 256 Mbyte system with speeds less than 1 GHz means your boss really doesn't like you. Disks come in units starting at 60 Gb and it is nothing to see 120 or 240 Gb in a system. Speed of the underlying system isn't nearly such an issue as the speed of the disk. Therefore, though there is still a space penalty, it isn't nearly as severe (in relative terms) as it used to be. In this environment, a surrogate key isn't so much of a problem. Therefore, as a pragmatist, at least SOME of the objections to surrogate keys fade away.

This leaves issues of inheritance. (As in, "I inherited this pile of dyspeptic dragon droppings that doesn't have surrogate keys and now I have to fix the mess....") For real-world reasons, it might still be impractical to convert all surrogate keys at once. As I recall in my trial by inheritance, it took me 6 months to convert every table to surrogate keys because we started running into servers with multiple IP addresses and different names in the different domains (as implied by being a gateway). So nothing we had was either unique or immutable any more. Which is why I'm not totally against surrogate keys at all. I just don't require them pro forma, only when actually needed.
 

Users who are viewing this thread

Back
Top Bottom