Text as Primary Key vs Auto Number

All I said was that the Primary Keys should not be seen by the end user.

I can see no reason why this should be so.

If a natural candidate key is both concise and stable then it is a perfectly appropriate primary key.

Adding a surrogate key increases the indexing overheads.
 
I can see no reason why this should be so.

If a natural candidate key is both concise and stable then it is a perfectly appropriate primary key.

Adding a surrogate key increases the indexing overheads.


Galaxiom, what I wrote (Post # 17) was the Rules I apply to Myself. However after reading it again I could replace all those Rules with just one.

There is one rule I apply with Primary Keys is that it is never seen by the end users.

The same would apply to Foreign Keys.

This does not leave much of a choice does it.

With regard to Indexing Overheads, which fields would you normally index.

Would not the Primary and Foreign keys be sufficent in most cases..
 
Last edited:
With regard to Indexing Overheads, which fields would you normally index.

Would not the Primary and Foreign keys be sufficent in most cases..

No, not in most cases. Usually it makes sense to index the natural key column(s) as well because those are the columns against which users' searches and lookups will most likely be performed. Also in many products (including Jet and Microsoft SQL Server) creating an index on those columns is normally the only way to enforce the uniqueness of the natural key (in Jet and SQL Server an index is automatically created when you create a UNIQUE key constraint).

Adding a surrogate key therefore does usually increase the indexing overhead and of course it also usually increases the number of joins required in queries. I'm not saying surrogate keys are "bad" - they have advantages and disadvantages. It pays to be flexible about the use of surrogate keys and to weigh up the pros and cons in each specific case where you think a surrogate key might be advantageous.
 
Adding a surrogate key therefore does usually increase the indexing overhead and of course it also usually increases the number of joins required in queries. I'm not saying surrogate keys are "bad" - they have advantages and disadvantages. It pays to be flexible about the use of surrogate keys and to weigh up the pros and cons in each specific case where you think a surrogate key might be advantageous.
How can you say that a surrogate key which has no real meaning to the end user is required to have more than one join. Maybe I am missing something but I have never seen or used more than one Field when the Primary is Autonumber. I could only understand your statement if you were not familiar with a "Composite Index" or some call it a "Compound Index"
 
How can you say that a surrogate key which has no real meaning to the end user is required to have more than one join. Maybe I am missing something but I have never seen or used more than one Field when the Primary is Autonumber. I could only understand your statement if you were not familiar with a "Composite Index" or some call it a "Compound Index"

Because using a surrogate key sometimes requires a join where none would have been required if you'd used the natural key instead. For example a query using a surrogate key might look like this:

SELECT A.foo, B.bar
FROM foo A
INNER JOIN bar B
ON A.bar_id = B.bar_id;

but if you'd used bar as the natural key in the bar table and a foreign key in foo then you could have written it without a join at all:

SELECT foo, bar
FROM foo;

The bigger potential disadvantage already mentioned by Galaxiom is the need for extra indexes when you use surrogates. In the above example without the surrogate key the Bar table should have an index on

bar (the natural key)

But if you add a surrogate then in the bar table you would typically require at least two indexes:

bar (the natural key)
bar_id (the surrogate)

Making them a composite index instead wouldn't solve anything because that would not enforce the uniqueness constraints of either of the two attributes - a composite index would only enforce the composite superkey. In any case, whatever index structures you use, the indexing overhead of the surrogate key version is going to be larger than the version without the surrogate - on a table by table basis that is. Whether the surrogate key approach makes a more efficient indexing strategy overall (taking into account key sizes and indexes for foreign keys as well as candidate keys) is highly dependent on the population of each table and needs to be evaluated on a case-by-case basis.
 
Galaxiom, what I wrote (Post # 17) was the Rules I apply to Myself. However after reading it again I could replace all those Rules with just one.

There is one rule I apply with Primary Keys is that it is never seen by the end users.

The same would apply to Foreign Keys.

This does not leave much of a choice does it.

Simply repeating your unsubstantiated assertion in no way lends the slightest support to that assertion. Indeed it further calls into question its validity since its claimant, when pressed, has been unable to offer a single advancement on what is essentially, "I said so".

With regard to Indexing Overheads, which fields would you normally index.

Would not the Primary and Foreign keys be sufficent in most cases..

The PK, by definition, is indexed. I also indexed fields that need to be searched. There are often several.

In the case of my work environment (a small financial institution), the Member field (Long) is pervasive in almost every transaction. It is an excellent natural key that uniquely identifies the client.

If I included a surrogate key as the PK of the Member table the Member field in the Members table would still have to be indexed to provide rapid search facilities by Member. Moreover I would have to translate the the Member field from the text reports that are parsed to obtain the original data and convert them to the utterly pointless surrogate key.

All queries by Member would have to include a join to the Members table.

There is no advantage whatsoever joining to the member table by an artificial key. Hence a redundant field and its index is being maintained for no purpose. I believe your case is shot to pieces.
 
Primary key values and control characters

Hopefully not off-topic …

Where the primary key value is not auto-numbered, is it recommended to disallow control characters?

I'm particularly interested in leading control characters e.g. tab. with primary keys in MS Access 2010.

Parallel to this post, in Stack Overflow:

For a primary key value in a Microsoft Access database, should a leading control character be prevented?

Thanks for any advice.
 
Re: Primary key values and control characters

Where the primary key value is not auto-numbered, is it recommended to disallow control characters?

The design of keys should be driven by business requirements. If you wanted a purely machine-readable key then it might be an option to use arbitrary binary data, including special characters. If human beings are expected to read and enter key values in a text box then non-displayable characters clearly aren't going to be very user-friendly or practical.
 
Re: Primary key values and control characters

… If human beings are expected to read and enter key values in a text box then non-displayable characters clearly aren't going to be very user-friendly or practical.

Agreed and understood – thanks. For now I'll use the word undesirable.

… it might be an option to use arbitrary binary data, including special characters. …

I may be taking the following statement out of context, but here goes. From Search (While You Type) in a ComboBox (harfang, Experts Exchange):

"… the Tab character, which cannot be entered in any field in Access. …"

– should that be, more accurately, something like "… the Tab character, which the Tab key alone cannot enter in any field in Access. …"?

----

As background – without going into detail: I'm troubleshooting causes and possible effects of leading and/or trailing control characters (some might say, 'special characters' or 'white space') … particularly where multiple PK values have the same lead; and where multiple entries in the same row have the same lead. I know at least one user-friendly way for users to inadvertently make entries that are not very user-friendly. Hmm.

(I'm left wondering whether in our environment, the effects are simply undesirable … or whether the effects can be more troublesome, e.g. insidious corruption. But that's not a question for this forum; it's to be discussed with a colleague, next week.)

Thanks again
 
Last edited:
Let me chime in on another part of the discussion of SSN as a key. I am now speaking regarding Federal Law and guidelines that I get from the U.S. Department of Defense plus yearly refresher courses that I have to take (and have recently taken) regarding records handling, HIPAA, the USA Privacy Act of 1974 (and as subsequently amended):

It is now forbidden to use the SSN as a computer-based primary key in any record that includes any one of about a dozen different fields, the most prominent of which is the Personal Name, but there is also Personal Address, Personal Phone, specific family information (including Mother's Maiden Name), etc. We were required to redact all repeat ALL records that used SSN as a key. We have a DoD field called the EDIPI - Electronic Data Information Personal Identifier (or something close to that). It is NOT the SSN. We are not allowed to use the SSN because it is now recognized by USA courts of law as nearly an automatic violation of privacy. So ButtonMoon, as of the latest updates in Federal Law, the SSN usage IS a privacy issue.

Now, there is a distinct question - whether SSN should be treated as a prime key? Here, my opinion is no because of the possibility of identity theft forcing changes in the SSN. There is also the fact - and this IS a fact - that some folks will need to be entered into a system while their SSN is still in the "applied for" state. Here, the real consideration is that you suddenly allowed data to drive a process rather than allowing a process to drive the data. If you used SSN for people who don't have one yet and that fact blocks their entry, then the tail is wagging the dog.

There is also the issue of whether someone should ever see a primary key value. My response there is, It depends on the key and whether knowing it gets you to something that you could use unscrupulously. Let's take a trivial case - where people use a USA-only location indicator that includes state abbreviations. You can use the two-character state abbreviation as a literal primary key (in text mode) because it fits in a fixed size slot, is stable, and pretty much doesn't matter whether someone sees it or not. Because there, the information is already widely known and easily knowable.

You face the problem again and again that the government needs a way to identify you but that information then becomes a key to allow perpetrators to ... well, PERPETRATE bad things on you. The problem is that you are stuck on the information security CIA triad - confidentiality, integrity, availability. Without using that unique identifier, you can't assure that a transaction applies only to you (confidentiality). That means that the integrity of your data can be questionable; also, gaining access to that data (availability) is an issue. So the question is, how many identifiers are enough to assure your data cannot be compromised, yet how many identifiers are too many to manage?

In the end analysis, back to the topic of keys... I've seen so many different answers to this question because opinions of "security experts" change over time and the ingenuity of the malicious hackers increases as time passes. Is it OK to use text vs. numeric keys? Whatever floats your boat works for me. Is it OK to use natural vs. surrogate keys? Whichever is better for your applications? Is it mandatory to hide primary keys from users? Depends, because first you have to decide whether that PK is used for something else. (And that question is where SSN shines as an example of the conflict of hiding vs. revealing the PK.)
 
We were required to redact all repeat ALL records that used SSN as a key. We have a DoD field called the EDIPI - Electronic Data Information Personal Identifier (or something close to that). It is NOT the SSN. We are not allowed to use the SSN because it is now recognized by USA courts of law as nearly an automatic violation of privacy. So ButtonMoon, as of the latest updates in Federal Law, the SSN usage IS a privacy issue.

You mean you must remove the SSN altogether? That's a step forward in terms of privacy I agree. Would it have made a difference if you left the SSN in the table but it was not a key? (i.e. you allowed duplicates). For my benefit, please explain in what way it is a violation of privacy to make a database column unique but no violation of privacy if it isn't unique?

To be clear, what I'm saying is NOT that you should use SSN as a key (I never said that) but that simply making a column a key doesn't make it any more of a privacy violation than if it isn't a key. If the law you refer to says otherwise then I suggest that the law is actually harming privacy by giving regulated bodies a lame excuse for using SSN unwisely when they probably shouldn't be ("Hey, your SSN was hacked, but we're in the clear because it was a non-key column in our database!").

There is also the fact - and this IS a fact - that some folks will need to be entered into a system while their SSN is still in the "applied for" state. Here, the real consideration is that you suddenly allowed data to drive a process rather than allowing a process to drive the data. If you used SSN for people who don't have one yet and that fact blocks their entry, then the tail is wagging the dog.

That does not follow at all. You seem to be assuming that all of a person's details must be entered into the same table as the SSN. In the situation you describe that would obviously be a useless design, so it is a straw man argument.
 
Doc Man and ButtonMoon

You both appear to be intelligent, experienced, educated and whatever else you would like to describe yourselves as. I mean this in a good way. In simple terms you are not idiots.

Why is so much discussion going into the use of Natural Keys like the SSN, when these keys can run into problems.

Why don't we just accept AutoNumber as a Primary and end the discussion.

I have seen lots of problems with the use misuse of Natural Keys but not with a Surrogate key.

Do you have examples where a Number is used for a Primary Key and things go wrong. I don't know of one. Now this is totally different where some would argue that you cannot do other things with the Key. The answer is simply Use an AutoNumber as the Primary and for nothing else. And don't forget to hide it so no one can see it.
 
Why don't we just accept AutoNumber as a Primary and end the discussion.

I have seen lots of problems with the use misuse of Natural Keys but not with a Surrogate key.

Do you have examples where a Number is used for a Primary Key and things go wrong.

While a surrogate key does provide the maximum potential versatility (at some cost as already discussed) I would have to disagree about Autonumber itself.

A surrogate key generated by whatever way other than Autonumber that you care to use can be easily edited or records inserted without any problems.

Access Autonumber in Increment mode is notorious for reseeding when a whole record is inserted. This can be worked around but it does expose a weakness in Autonumber. We do not have absolute control over the mechanisms controlling it.

Explicitly generating your own next number can be a safer option and it is certainly a more versatile option.
 
Why is so much discussion going into the use of Natural Keys like the SSN, when these keys can run into problems.

Why don't we just accept AutoNumber as a Primary and end the discussion.

I have seen lots of problems with the use misuse of Natural Keys but not with a Surrogate key.

The topic here is natural keys, i.e. the real world identifiers to be used for things in a database table. Adding a surrogate to a table doesn't eliminate the need for a natural key in that table. A surrogate doesn't solve any problems around the "misuse" of natural keys. Useful as they may be, surrogates are therefore irrelevant to the problem at hand.
 
RainLover, come on down to New Orleans in April. If you REALLY love rain, you'll be in your own personal brand of Heaven.

The discussion infringed on using SSN and I pointed out that as a number, it is a poor choice to be a PK for many reasons including the "tail wagging the dog" argument, which (sorry, Button, I don't see it as a straw man) has never worked well when a design required you to have something you didn't have in order to be entered into a system. I am afraid I see it at least two or three times a month in one or the other of our government systems. Not specifically SSN, but usually something you should have to gain access to some facility. I have literally seen new hires sit around doing nothing but reading manuals for a solid month while stuff gets processed, even though there would have been a way to auto-gen some temporary, limited-access account to let them in.

Also, ButtonMoon, in the case where we had to do redaction, the answer as decided above my pay grade was that leaving the SSN in place WAS a privacy violation. We had a very firm GDA (Government Directed Action) to remove the SSN where we had it. In all my code, I put the text "<REDACTED>" which takes up one less character than the punctuated form of the SSN as nnn-nn-nnnn, so it fit.

I agree with Galaxiom that Access autonumber can be a pain in the toches, but if the number can be made to fit and nobody has any expectations involving the word "continuous" or "contiguous" then it isn't a problem.
 
Also, ButtonMoon, in the case where we had to do redaction, the answer as decided above my pay grade was that leaving the SSN in place WAS a privacy violation.
That's what I'd hoped. Your superiors applied common sense and they seem to agree with what I said before: A privacy violation is caused by misusing, storing or needlessly exposing the SSN. It has nothing to do with whether it happens to be a primary key - the privacy violation is the same whether it is a key or not. Therefore raising the privacy argument (again!) in this thread is irrelevant.
 
I take it that the US gov are saying that the ssn can't be used as an index, neither primary nor secondary.

If so, then so be it. There are other data fields available. We don't have this issue in the uk, but when I have done personnel things, I have never thought of using the equivalent as a pk, a anyway.

I would probably use an index on the Ssn to ensure no duplicates, although I imagine there is a built in validity check.

however, if they allow you to wrap the ssn in a decrypt/encrypt wrapper then the problem is solved, isn't it?
 
I take it that the US gov are saying that the ssn can't be used as an index, neither primary nor secondary.
I hope not. The idea that the law should tell me what indexes I can create seems extraordinary. I can't see why that would be any concern of privacy laws.

however, if they allow you to wrap the ssn in a decrypt/encrypt wrapper then the problem is solved, isn't it?
I think the legitimate concern being expressed (perhaps in a flawed way) by legislators is really about the risks to individuals due to proliferation of their personal information - SSN in this case. Even if database security is never breached, risks are created by the fact that commercial relationships involving trust are being created based on SSN and that individuals are asked/expected to disclose their SSN for purposes for which it was never intended.
 
We are having the same type of discussion in Denmark.
For many years this number was considered a secret but in reality it was extremely easy to find out what the number was if you had a little information about the person.

As of today, it is possible to buy stuff and borrow money if you just know the number.
By now, most politicians has grasped that it is not sustainable to consider knowledge of the number the same as identifying the person however the laws have not been changed.

Looking at the whole purpose of the number it makes perfect sense to have it since it uniquely identifies a person. The problem is when it is used as the sole identification method.
If the number was public you could not rely on it as the only method of identification however you could rely on it to identify an exact person since only one person will have that number.

Businesses will complain as will many public systems but there would not be a problem in continuing to use it to single out a particular person for whatever.
 
The topic here is natural keys, i.e. the real world identifiers to be used for things in a database table. Adding a surrogate to a table doesn't eliminate the need for a natural key in that table. A surrogate doesn't solve any problems around the "misuse" of natural keys. Useful as they may be, surrogates are therefore irrelevant to the problem at hand.

Button

I thought the topic was Text V Auto Number as a Primary Key.
I always use AutoNumber and that is my choice. The reason I do so is based upon my experiences with other Access Databases where Natural Keys were the preference. In many cases a second and third field was added and then as a last resort an AutoNumber was added to form a Composite Key in order to gain some form of uniqueness. Can you imagine using 4 or more fields in order to create a unique reference. I have seen it, worked with it, and hate it to bits. If that is what you propose then that is up to you.
 
Last edited:

Users who are viewing this thread

Back
Top Bottom