Text as Primary Key vs Auto Number

I know where you work so it is difficult to argue against you. Are you refering to Sate or Federal laws. North of the Border I have not seen any of the rules you refer to. Namely the average employer keeps a full copy of the Driver's License. National mining companies for one.
The TFN is always kept for payroll purposes. etc.

I only know what our internal auditor tells me.

Personally I find it rather bizarre that the Organ Donor status is considered so controversial. Moreover the donor status is blank if no consent is given but this need not be hidden. By inference then a mark covering the status is an indication of consent.

However I am not going to point this out because it would make more work if the auditor decides it need to be fixed. Similarly we have no need to know what kind of vehicles they are licensed to drive but I won't mention that.

An employer holding a TFN is considered legitimate.

So why are we at odds here. Surely your State is not that different to hours. Or are these rules comming down from you own Company Lawers.

I suspect it is the lawyer's prerogative.

BTW. I hope you enjoyed the footy tonight. Do you Class yourself as a Queenslander or a New South Welch Man.

Grew up in Queensland but lived far longer in NSW. Better to back Queensland for obvious reasons.

I was more fascinated by yesterday's "match" in Canberra.
 
Back to the subject at hand:

Using text for a PK works but can get long. There are efficiency issues with overly long keys. People's names tend to be unusually long, ugly keys, particularly if your ethnicity comes from Uzbekistan or Kurdistan or Chechnya where your name has entirely too many consonants when transliterated to the Roman alphabets.

But sometimes it doesn't matter so much. For example, in the USA, consider the designated abbreviations of the state names. LA for Louisiana, MO for Missouri, NY for New York, etc. These are fixed, stable, and SHORT. Therefore, it might make sense to use these natural keys as opposed to a synthetic or surrogate key.

On the other hand, there is the question of whether the key needs to be recognizable. If the answer is NO, then a synthetic key works just as well as anything else. For instance, when you do an online transaction and you get back a confirmation number, do you have ANY doubt that the number is the moral equivalent of a very long autonumber key? At least for the bank I use, I've examined that number many times and have never yet seen a pattern to it. It doesn't appear to be encoded. It is just a big, glorified autonumber PK.
 
This has been an interesting thread.

I don't know enough about US Law so I can't comment too much. I have long thought that using Autonumber to be the better PK, regardless of the contents of the the Table.

Pat I did have another read of your post. I would need more time to gain a proper understanding.

I don't believe I have much more to add to this discussion. TTFN.
 
I have long thought that using Autonumber to be the better PK, regardless of the contents of the the Table.

Sure, but the example of SSN and its privacy implications under discussion here was about its use as a key for identifying people in the business domain outside the database and not just as an arbitrary key internal to the database: in other words, SSN as a domain key (AKA business key, natural key). You stated earlier:

There is one other rule I apply with Primary Keys and that is that it is never seen by the end user.

So clearly you would not and could not use the autonumber as an alternative for SSN as the key actually in use outside the database. You would presumably use something else (and I would do too). What you designate as the primary key of that table is therefore of no great importance and has no privacy implications. What really matters is what identifiers are exposed to the outside world and how they are used.
 
Last edited:
So clearly you would not and could not use the autonumber as an alternative for SSN as the key actually in use outside the database. You would presumably use something else (and I would do too). What you designate as the primary key of that table is therefore of no great importance and has no privacy implications. What really matters is what identifiers are exposed to the outside world and how they are used.

All I said was that the Primary Keys should not be seen by the end user.
AutoNumber is good because it has no real life meaning and It can't be duplicated.
It would make no sense to use the SSN as it would be required to be displayed.
The original question here was to do with the use of Autonumber V Text as Primary Key. The discussion expanded to SSN and other Government issued identifiers. The problem with these is more a case of privacy laws, so one needs to decide what is a breach of privacy. Once it is decided that privacy is not an issues then it could be considered for use as a primary key.

I believe AutoNumber to be the best and should be used as the default.
 
All I said was that the Primary Keys should not be seen by the end user.
AutoNumber is good because it has no real life meaning and It can't be duplicated.
It would make no sense to use the SSN as it would be required to be displayed.
The original question here was to do with the use of Autonumber V Text as Primary Key.

QED. The mention of primary key in the thread subject line is certainly wrong/irrelevant (a case could be made that the phrase "primary key" is nearly always irrelevant in such discussions...) What matters to jk42 and millions of others is a key that can be relied upon and understood by users to identify information uniquely and accurately in the database. The "primary" key becomes irrelevant to that discussion once you assume it will never be seen by the people who need to use an identifier and relate it to physical, practical things in the outside world. A surrogate key that is hidden from users is clearly no substitute for the key used in the business domain - it is something altogether different and orthogonal to it. And since having a natural key is presumably still just as important ever, the advice to "use an autonumber" is IMHO not really much help (however soundly based and well-intentioned that advice may be).
 
Button Moon

What is it that makes you believe that the primary key must have some meaning to the outside world, as you like to call it.

I thought the purpose of a primary key was to uniquely identify a Record within a table.

You cannot always find a unique Natural key for every table. This forces you to consider a compensate key made up of one or more fields.

The other alternative is to use a surrogate key. AutoNumber is a surrogate key which is suitable in this situation.

I believe you are attempting to apply your knowledge gleaned from other Database designs to Access. Well intentioned but square pegs don’t always fit into round holes.
 
Another problem with using a Human readable keys is that data, which would normally be anonymous, in any other table becomes identifiable.
 
Button Moon

What is it that makes you believe that the primary key must have some meaning to the outside world, as you like to call it.

I thought the purpose of a primary key was to uniquely identify a Record within a table.

You cannot always find a unique Natural key for every table. This forces you to consider a compensate key made up of one or more fields.

The other alternative is to use a surrogate key. AutoNumber is a surrogate key which is suitable in this situation.

I believe you are attempting to apply your knowledge gleaned from other Database designs to Access. Well intentioned but square pegs don’t always fit into round holes.

The purpose of any key is indeed to identify rows uniquely. Assuming the database is intended to model some portion of reality then the users of the database will need some reliable way of relating facts recorded in the database to the real objects, events or concepts that the database describes. If the people who use and update a table can't uniquely identify the information in it or can't accurately map the facts it contains to the real world then there's probably not much chance they will be able to use the database reliably or update it accurately.

Natural keys (the key attributes that will be used in the business domain outside the database) are the things that users depend upon to identify information uniquely. If a database table fails to enforce any natural keys at all then the user is left in the unenviable position of looking at a collection of data potentially containing duplicate rows. From a data integrity perspective such databases are a failure - we've probably all encountered a few of them and had to deal with their consequences.

So what's at issue is the enforcement of uniqueness, not which candidate key of a table you choose as the "primary" one. One key per table is designated a primary key and the others are known as secondary keys or alternate keys. That doesn't make them any less important from the perspective of uniqueness and data integrity however. A surrogate key is no replacement for a natural key because if it is hidden then obviously people can't use it as an identifier and it alone won't prevent duplicate information being entered into the table.
 
Last edited:
@buttonmoon - You aren't going to convince any of us that using SSN (or similar any similar dangerous personal information) as a primary key is a good idea so if this thread is still about that it is time to agree to disagree.

Maybe I didn't express myself well enough because you are still reading into my posts things that I didn't say. I have no intention to convince anyone to use SSN as a primary key. I agree with what you've said except on one point: you claimed that using SSN as a primary key is in itself a violation of privacy. It is not. The choice of primary key really has nothing to do with privacy. Consider a table with an SSN in it. Is the presence of SSN a violation of privacy? Maybe it is or maybe it isn't depending on who has access to the data. Now suppose I tell you that SSN in that table happens to be the primary key. Does it make any difference to whether privacy is being violated? No it does not.
 
But what if the SSN then becomes used as the foreign key in another table, where it has no business to be?

What if the person record relates to a foreign national, who doesn't have an SSN, you can't have a Null value in the Primary Key?

People without a valid American SSN do exist, in fact there are billions of us. :D
 
Last edited:
But what if the SSN then becomes used as the foreign key in another table, where it has no business to be?

Then maybe the presence of SSN in that other table would be a violation of privacy. If the SSN attribute is a foreign key but not a primary key in the table then that just doubly underlines what I've already said: the violation of privacy has nothing to do with whether an attribute is a primary key or not.

What if the person record relates to a foreign national, who doesn't have an SSN, you can't have a Null value in the Primary Key?

You can't have a null in any column that doesn't permit nulls, key or not. The question of whether you should permit nulls in a column is a design decision you need to make for every attribute, not just prime attributes. If your database was designed to record information about people who didn't have SSNs then why would you choose to put them in a table with a non-nullable SSN column?
 
But if it's a table about real (including non-American) people then logically SSN must be capable of holding multiple Null entries and therefore can not be used as the Primary key.
 
All I said was that the Primary Keys should not be seen by the end user.

I can see no reason why this should be so.

If a natural candidate key is both concise and stable then it is a perfectly appropriate primary key.

Adding a surrogate key increases the indexing overheads.
 
I can see no reason why this should be so.

If a natural candidate key is both concise and stable then it is a perfectly appropriate primary key.

Adding a surrogate key increases the indexing overheads.


Galaxiom, what I wrote (Post # 17) was the Rules I apply to Myself. However after reading it again I could replace all those Rules with just one.

There is one rule I apply with Primary Keys is that it is never seen by the end users.

The same would apply to Foreign Keys.

This does not leave much of a choice does it.

With regard to Indexing Overheads, which fields would you normally index.

Would not the Primary and Foreign keys be sufficent in most cases..
 
Last edited:
With regard to Indexing Overheads, which fields would you normally index.

Would not the Primary and Foreign keys be sufficent in most cases..

No, not in most cases. Usually it makes sense to index the natural key column(s) as well because those are the columns against which users' searches and lookups will most likely be performed. Also in many products (including Jet and Microsoft SQL Server) creating an index on those columns is normally the only way to enforce the uniqueness of the natural key (in Jet and SQL Server an index is automatically created when you create a UNIQUE key constraint).

Adding a surrogate key therefore does usually increase the indexing overhead and of course it also usually increases the number of joins required in queries. I'm not saying surrogate keys are "bad" - they have advantages and disadvantages. It pays to be flexible about the use of surrogate keys and to weigh up the pros and cons in each specific case where you think a surrogate key might be advantageous.
 
Adding a surrogate key therefore does usually increase the indexing overhead and of course it also usually increases the number of joins required in queries. I'm not saying surrogate keys are "bad" - they have advantages and disadvantages. It pays to be flexible about the use of surrogate keys and to weigh up the pros and cons in each specific case where you think a surrogate key might be advantageous.
How can you say that a surrogate key which has no real meaning to the end user is required to have more than one join. Maybe I am missing something but I have never seen or used more than one Field when the Primary is Autonumber. I could only understand your statement if you were not familiar with a "Composite Index" or some call it a "Compound Index"
 
How can you say that a surrogate key which has no real meaning to the end user is required to have more than one join. Maybe I am missing something but I have never seen or used more than one Field when the Primary is Autonumber. I could only understand your statement if you were not familiar with a "Composite Index" or some call it a "Compound Index"

Because using a surrogate key sometimes requires a join where none would have been required if you'd used the natural key instead. For example a query using a surrogate key might look like this:

SELECT A.foo, B.bar
FROM foo A
INNER JOIN bar B
ON A.bar_id = B.bar_id;

but if you'd used bar as the natural key in the bar table and a foreign key in foo then you could have written it without a join at all:

SELECT foo, bar
FROM foo;

The bigger potential disadvantage already mentioned by Galaxiom is the need for extra indexes when you use surrogates. In the above example without the surrogate key the Bar table should have an index on

bar (the natural key)

But if you add a surrogate then in the bar table you would typically require at least two indexes:

bar (the natural key)
bar_id (the surrogate)

Making them a composite index instead wouldn't solve anything because that would not enforce the uniqueness constraints of either of the two attributes - a composite index would only enforce the composite superkey. In any case, whatever index structures you use, the indexing overhead of the surrogate key version is going to be larger than the version without the surrogate - on a table by table basis that is. Whether the surrogate key approach makes a more efficient indexing strategy overall (taking into account key sizes and indexes for foreign keys as well as candidate keys) is highly dependent on the population of each table and needs to be evaluated on a case-by-case basis.
 
Galaxiom, what I wrote (Post # 17) was the Rules I apply to Myself. However after reading it again I could replace all those Rules with just one.

There is one rule I apply with Primary Keys is that it is never seen by the end users.

The same would apply to Foreign Keys.

This does not leave much of a choice does it.

Simply repeating your unsubstantiated assertion in no way lends the slightest support to that assertion. Indeed it further calls into question its validity since its claimant, when pressed, has been unable to offer a single advancement on what is essentially, "I said so".

With regard to Indexing Overheads, which fields would you normally index.

Would not the Primary and Foreign keys be sufficent in most cases..

The PK, by definition, is indexed. I also indexed fields that need to be searched. There are often several.

In the case of my work environment (a small financial institution), the Member field (Long) is pervasive in almost every transaction. It is an excellent natural key that uniquely identifies the client.

If I included a surrogate key as the PK of the Member table the Member field in the Members table would still have to be indexed to provide rapid search facilities by Member. Moreover I would have to translate the the Member field from the text reports that are parsed to obtain the original data and convert them to the utterly pointless surrogate key.

All queries by Member would have to include a join to the Members table.

There is no advantage whatsoever joining to the member table by an artificial key. Hence a redundant field and its index is being maintained for no purpose. I believe your case is shot to pieces.
 
Primary key values and control characters

Hopefully not off-topic …

Where the primary key value is not auto-numbered, is it recommended to disallow control characters?

I'm particularly interested in leading control characters e.g. tab. with primary keys in MS Access 2010.

Parallel to this post, in Stack Overflow:

For a primary key value in a Microsoft Access database, should a leading control character be prevented?

Thanks for any advice.
 

Users who are viewing this thread

Back
Top Bottom