Unique ID field

imperator

Archaeologist etc.
Local time
Today, 22:39
Joined
Feb 28, 2004
Messages
38
Hi all
I have been discussing DB table design with a colleague and we have a difference of opinion with regards to primary key ID fields .

My argument is that they should not be actual data but separate, that is an unique record identifier typically an autonumber type. I'm sure I picked this up from reading Access manuals.

My colleague's argument is: if the data contains a unique field, e.g. a unique project number in a table of projects (no project should be listed twice) then use the project number as the primary key. Her second argument is the byte size of an autonumber is 8 bytes whereas you can use a smaller data type to hold the project number thus saving space.

My opinion is that her first reason is flawed as it assumes no change to how projects are identified and that her second argument is irrelevant in today's Gb and Tb systems.

Without trawling through my Access books and the internet for opinions, I would like to know what your opinions are on this subject. I would appreciate good reasons which ever side of the fence you are on. I'm quite willing to be proved wrong and change my ways.
Cheers
Imperator
 
Last edited:
You are clearly correct. Artificial keys are the norm. I believe your colleague is confused about the difference between a natural primary key, which is typically implemented via a unique key, and an artificial key, which is usually implemented as a primary key (frequently system generated, like an autonumber).

I had an employee (supposedly an advanced Oracle DBA) who insisted on using the natural primary key as the table's primary key. Early on in the life of the software, a user wanted to change the value that had been used as the primary key. Since this value had been propogated to multiple tables via the foreign key(s), she had to manually make the changes to the foreign key data in all the tables it had propagated to (think about it a minute, I said tables with an "s"). I fired her shortly after that.

The database I'm managing now was also built by junior developers who didn't understand the surrogate/primary/artificial key issue. Same thing...just nobody to fire. When the user wants to change the unique key (which was designed as the primary key), I have to go into the affected tables and manually change the value.

Additionally, if the natural key is a multiple-column key and you do not define a surrogate key, the multiple columns of the key must become a foreign key set for any tables with referential integrity constraints to the original table. Further, if those foreign key columns become part of the natural key of the new table, along with more candidate columns, foreign keys in tables related will be even bigger/bulkier.

Using the natural key as the primary key is fraught with problems like this. Best to have a surrogate key on every table, along with the appropriate unique keys to define the natural key.

This is one of my favorite pet peeves.

Her second arguement is the byte size of an autonumber is 8 bytes whereas you can use a smaller data type to hold the project number thus saving space.

So what (whether it's accurate or not)? The days of needing to save space in relational databases are over. We are buying multiple hundred Giga-byte disk drives cheaply now, processors are 32 or 64 bits wide, memory is cheap and optimized for 4 or 8 byte pieces of data. If you were writing a video game for a 16 bit machine, this might be a concern (not the space but the size of the data being able to neatly fit within the bus), but most people are not doing that.

This is thinking of an old mainframe programmer. Also, this type of thinking had led to some of the biggest computer snafu's in history, including the so called Y2K bug, which was just bad thinking by a lot of programmers who thought they'd save a couple of bytes. I'm frankly glad those days are gone.
 
Thanks Bob and George, good posts. Anyone else want to jump in the ring. I'll pull up this thread and show my colleague so we can carry on the debate.
 
The gravity of this scenario has just hit me. Without doubt, someone at some point will type in the wrong value and it will have to be change. Oh, boy!

I had an employee (supposedly an advanced Oracle DBA) who insisted on using the natural primary key as the table's primary key. Early on in the life of the software, a user wanted to change the value that had been used as the primary key. Since this value had been propogated to multiple tables via the foreign key(s), she had to manually make the changes to the foreign key data in all the tables it had propagated to (think about it a minute, I said tables with an "s"). I fired her shortly after that.
 
The gravity of this scenario has just hit me. Without doubt, someone at some point will type in the wrong value and it will have to be change.
That is another good argument for the surrogate (via Autonumber) keys. They aren't entered by anyone but the computer, so there's no chance of mistyping it.
 
i dont think it matters too much personally

if you have a legitimate candidate primary key, that wont change, there is no reason for an autonumber key.

One benefit of an autonumber is that you dont need to establish a value for it, either manually or via some lookup mechanism

You want to avoid the need for cascading updates, so definitely dont pick a key that will change but eg, in an invoice table, the invoice number will be unique and would be a suitable candidate primary key (although it may not be ideal forother reasons) In a US States lookup table presumably you would only ever have the state abbreviation as the key - you wouldnt bother with an autonumber there, would you?

I think a numeric key is more efficient than a text field key, or a compound key (and easier to use than the latter)
 
In a US States lookup table presumably you would only ever have the state abbreviation as the key - you wouldnt bother with an autonumber there, would you?)

No, nor would I use a foreign key. Address elements like city, state (region), zip code (mailing code), and country are all incredibly subjective and frequently change. I have searched the world over for a database that brings order to this complex problem and can't find anything uniform. Add that to the fact that post offices change things, political boundaries frequently change throughout the world, and databases that are supposed to be national suddenly become international.

I think a numeric key is more efficient than a text field key, or a compound key (and easier to use than the latter)

Agreed. And compound (primary) keys cause huge amounts of grief to the next poor developer who has to take it over. Compound unique keys are to be expected.
 
If the PK was in a simple look up table I don't think I would be worried as much, but it's not. The database being built will record archaeological data. One of the primary keys will be a context number (a unique number assigned to an archaeological event on a site, e.g. the cut of a ditch, a wall or a fill of a pit). There will be related tables to this context number including tables for finds, photographic records, environmental sampling, site drawings, etc.. The list goes on. The context number PK will propagate to all of these tables, but I know from personal experience that people will occassionally write the context number down incorrectly on the original paper site record and will occassionally mistype that number into a database, it has happened, human nature. If it is the PK, these errors cannot be changed in one field to correct it, but will need update queries, probably for each affected table. Nightmare.

An autonumber key frees you from any problems both now and a changeable future.

if you have a legitimate candidate primary key, that wont change, there is no reason for an autonumber key.
You want to avoid the need for cascading updates, so definitely dont pick a key that will change but eg, in an invoice table, the invoice number will be unique and would be a suitable candidate primary key (although it may not be ideal forother reasons) In a US States lookup table presumably you would only ever have the state abbreviation as the key - you wouldnt bother with an autonumber there, would you?
 
I always assume somebody is gonna screw up and there will be changes to the system. That's why I want a unique, system assigned key. I'd prefer a GUID to an autonumber but I'm too lazy to implement a GUID every time. It's one button press in Access to create an autonumber and it save so many headaches later on.
 
I always assume somebody is gonna screw up and there will be changes to the system.

Yes, plus the staff selected to input data will be on the whole ignorant of Access, especially how the data are structured and related. Corrected errors will orphan related data but the inputter may not realise and leave the related tables unchanged. Such holes may not be discovered until after the project has been published, possibly with an erroneous interpretation.

Thanks to all who contributed, I have the argument clear in my head now.
Cheers.
 
Last edited:
I presented my arguments but they weren't enough to convince my colleague, she is happy to rely on cascading updates to sort out errors.
 
Just tell her the Newcastle squad of the autonumber police will be paying her a visit, any day soon!
 

Users who are viewing this thread

Back
Top Bottom