Text as Primary Key vs Auto Number

Re: Primary key values and control characters

Where the primary key value is not auto-numbered, is it recommended to disallow control characters?

The design of keys should be driven by business requirements. If you wanted a purely machine-readable key then it might be an option to use arbitrary binary data, including special characters. If human beings are expected to read and enter key values in a text box then non-displayable characters clearly aren't going to be very user-friendly or practical.
 
Re: Primary key values and control characters

… If human beings are expected to read and enter key values in a text box then non-displayable characters clearly aren't going to be very user-friendly or practical.

Agreed and understood – thanks. For now I'll use the word undesirable.

… it might be an option to use arbitrary binary data, including special characters. …

I may be taking the following statement out of context, but here goes. From Search (While You Type) in a ComboBox (harfang, Experts Exchange):

"… the Tab character, which cannot be entered in any field in Access. …"

– should that be, more accurately, something like "… the Tab character, which the Tab key alone cannot enter in any field in Access. …"?

----

As background – without going into detail: I'm troubleshooting causes and possible effects of leading and/or trailing control characters (some might say, 'special characters' or 'white space') … particularly where multiple PK values have the same lead; and where multiple entries in the same row have the same lead. I know at least one user-friendly way for users to inadvertently make entries that are not very user-friendly. Hmm.

(I'm left wondering whether in our environment, the effects are simply undesirable … or whether the effects can be more troublesome, e.g. insidious corruption. But that's not a question for this forum; it's to be discussed with a colleague, next week.)

Thanks again
 
Last edited:
Let me chime in on another part of the discussion of SSN as a key. I am now speaking regarding Federal Law and guidelines that I get from the U.S. Department of Defense plus yearly refresher courses that I have to take (and have recently taken) regarding records handling, HIPAA, the USA Privacy Act of 1974 (and as subsequently amended):

It is now forbidden to use the SSN as a computer-based primary key in any record that includes any one of about a dozen different fields, the most prominent of which is the Personal Name, but there is also Personal Address, Personal Phone, specific family information (including Mother's Maiden Name), etc. We were required to redact all repeat ALL records that used SSN as a key. We have a DoD field called the EDIPI - Electronic Data Information Personal Identifier (or something close to that). It is NOT the SSN. We are not allowed to use the SSN because it is now recognized by USA courts of law as nearly an automatic violation of privacy. So ButtonMoon, as of the latest updates in Federal Law, the SSN usage IS a privacy issue.

Now, there is a distinct question - whether SSN should be treated as a prime key? Here, my opinion is no because of the possibility of identity theft forcing changes in the SSN. There is also the fact - and this IS a fact - that some folks will need to be entered into a system while their SSN is still in the "applied for" state. Here, the real consideration is that you suddenly allowed data to drive a process rather than allowing a process to drive the data. If you used SSN for people who don't have one yet and that fact blocks their entry, then the tail is wagging the dog.

There is also the issue of whether someone should ever see a primary key value. My response there is, It depends on the key and whether knowing it gets you to something that you could use unscrupulously. Let's take a trivial case - where people use a USA-only location indicator that includes state abbreviations. You can use the two-character state abbreviation as a literal primary key (in text mode) because it fits in a fixed size slot, is stable, and pretty much doesn't matter whether someone sees it or not. Because there, the information is already widely known and easily knowable.

You face the problem again and again that the government needs a way to identify you but that information then becomes a key to allow perpetrators to ... well, PERPETRATE bad things on you. The problem is that you are stuck on the information security CIA triad - confidentiality, integrity, availability. Without using that unique identifier, you can't assure that a transaction applies only to you (confidentiality). That means that the integrity of your data can be questionable; also, gaining access to that data (availability) is an issue. So the question is, how many identifiers are enough to assure your data cannot be compromised, yet how many identifiers are too many to manage?

In the end analysis, back to the topic of keys... I've seen so many different answers to this question because opinions of "security experts" change over time and the ingenuity of the malicious hackers increases as time passes. Is it OK to use text vs. numeric keys? Whatever floats your boat works for me. Is it OK to use natural vs. surrogate keys? Whichever is better for your applications? Is it mandatory to hide primary keys from users? Depends, because first you have to decide whether that PK is used for something else. (And that question is where SSN shines as an example of the conflict of hiding vs. revealing the PK.)
 
We were required to redact all repeat ALL records that used SSN as a key. We have a DoD field called the EDIPI - Electronic Data Information Personal Identifier (or something close to that). It is NOT the SSN. We are not allowed to use the SSN because it is now recognized by USA courts of law as nearly an automatic violation of privacy. So ButtonMoon, as of the latest updates in Federal Law, the SSN usage IS a privacy issue.

You mean you must remove the SSN altogether? That's a step forward in terms of privacy I agree. Would it have made a difference if you left the SSN in the table but it was not a key? (i.e. you allowed duplicates). For my benefit, please explain in what way it is a violation of privacy to make a database column unique but no violation of privacy if it isn't unique?

To be clear, what I'm saying is NOT that you should use SSN as a key (I never said that) but that simply making a column a key doesn't make it any more of a privacy violation than if it isn't a key. If the law you refer to says otherwise then I suggest that the law is actually harming privacy by giving regulated bodies a lame excuse for using SSN unwisely when they probably shouldn't be ("Hey, your SSN was hacked, but we're in the clear because it was a non-key column in our database!").

There is also the fact - and this IS a fact - that some folks will need to be entered into a system while their SSN is still in the "applied for" state. Here, the real consideration is that you suddenly allowed data to drive a process rather than allowing a process to drive the data. If you used SSN for people who don't have one yet and that fact blocks their entry, then the tail is wagging the dog.

That does not follow at all. You seem to be assuming that all of a person's details must be entered into the same table as the SSN. In the situation you describe that would obviously be a useless design, so it is a straw man argument.
 
Doc Man and ButtonMoon

You both appear to be intelligent, experienced, educated and whatever else you would like to describe yourselves as. I mean this in a good way. In simple terms you are not idiots.

Why is so much discussion going into the use of Natural Keys like the SSN, when these keys can run into problems.

Why don't we just accept AutoNumber as a Primary and end the discussion.

I have seen lots of problems with the use misuse of Natural Keys but not with a Surrogate key.

Do you have examples where a Number is used for a Primary Key and things go wrong. I don't know of one. Now this is totally different where some would argue that you cannot do other things with the Key. The answer is simply Use an AutoNumber as the Primary and for nothing else. And don't forget to hide it so no one can see it.
 
Why don't we just accept AutoNumber as a Primary and end the discussion.

I have seen lots of problems with the use misuse of Natural Keys but not with a Surrogate key.

Do you have examples where a Number is used for a Primary Key and things go wrong.

While a surrogate key does provide the maximum potential versatility (at some cost as already discussed) I would have to disagree about Autonumber itself.

A surrogate key generated by whatever way other than Autonumber that you care to use can be easily edited or records inserted without any problems.

Access Autonumber in Increment mode is notorious for reseeding when a whole record is inserted. This can be worked around but it does expose a weakness in Autonumber. We do not have absolute control over the mechanisms controlling it.

Explicitly generating your own next number can be a safer option and it is certainly a more versatile option.
 
Why is so much discussion going into the use of Natural Keys like the SSN, when these keys can run into problems.

Why don't we just accept AutoNumber as a Primary and end the discussion.

I have seen lots of problems with the use misuse of Natural Keys but not with a Surrogate key.

The topic here is natural keys, i.e. the real world identifiers to be used for things in a database table. Adding a surrogate to a table doesn't eliminate the need for a natural key in that table. A surrogate doesn't solve any problems around the "misuse" of natural keys. Useful as they may be, surrogates are therefore irrelevant to the problem at hand.
 
RainLover, come on down to New Orleans in April. If you REALLY love rain, you'll be in your own personal brand of Heaven.

The discussion infringed on using SSN and I pointed out that as a number, it is a poor choice to be a PK for many reasons including the "tail wagging the dog" argument, which (sorry, Button, I don't see it as a straw man) has never worked well when a design required you to have something you didn't have in order to be entered into a system. I am afraid I see it at least two or three times a month in one or the other of our government systems. Not specifically SSN, but usually something you should have to gain access to some facility. I have literally seen new hires sit around doing nothing but reading manuals for a solid month while stuff gets processed, even though there would have been a way to auto-gen some temporary, limited-access account to let them in.

Also, ButtonMoon, in the case where we had to do redaction, the answer as decided above my pay grade was that leaving the SSN in place WAS a privacy violation. We had a very firm GDA (Government Directed Action) to remove the SSN where we had it. In all my code, I put the text "<REDACTED>" which takes up one less character than the punctuated form of the SSN as nnn-nn-nnnn, so it fit.

I agree with Galaxiom that Access autonumber can be a pain in the toches, but if the number can be made to fit and nobody has any expectations involving the word "continuous" or "contiguous" then it isn't a problem.
 
Also, ButtonMoon, in the case where we had to do redaction, the answer as decided above my pay grade was that leaving the SSN in place WAS a privacy violation.
That's what I'd hoped. Your superiors applied common sense and they seem to agree with what I said before: A privacy violation is caused by misusing, storing or needlessly exposing the SSN. It has nothing to do with whether it happens to be a primary key - the privacy violation is the same whether it is a key or not. Therefore raising the privacy argument (again!) in this thread is irrelevant.
 
I take it that the US gov are saying that the ssn can't be used as an index, neither primary nor secondary.

If so, then so be it. There are other data fields available. We don't have this issue in the uk, but when I have done personnel things, I have never thought of using the equivalent as a pk, a anyway.

I would probably use an index on the Ssn to ensure no duplicates, although I imagine there is a built in validity check.

however, if they allow you to wrap the ssn in a decrypt/encrypt wrapper then the problem is solved, isn't it?
 
I take it that the US gov are saying that the ssn can't be used as an index, neither primary nor secondary.
I hope not. The idea that the law should tell me what indexes I can create seems extraordinary. I can't see why that would be any concern of privacy laws.

however, if they allow you to wrap the ssn in a decrypt/encrypt wrapper then the problem is solved, isn't it?
I think the legitimate concern being expressed (perhaps in a flawed way) by legislators is really about the risks to individuals due to proliferation of their personal information - SSN in this case. Even if database security is never breached, risks are created by the fact that commercial relationships involving trust are being created based on SSN and that individuals are asked/expected to disclose their SSN for purposes for which it was never intended.
 
We are having the same type of discussion in Denmark.
For many years this number was considered a secret but in reality it was extremely easy to find out what the number was if you had a little information about the person.

As of today, it is possible to buy stuff and borrow money if you just know the number.
By now, most politicians has grasped that it is not sustainable to consider knowledge of the number the same as identifying the person however the laws have not been changed.

Looking at the whole purpose of the number it makes perfect sense to have it since it uniquely identifies a person. The problem is when it is used as the sole identification method.
If the number was public you could not rely on it as the only method of identification however you could rely on it to identify an exact person since only one person will have that number.

Businesses will complain as will many public systems but there would not be a problem in continuing to use it to single out a particular person for whatever.
 
The topic here is natural keys, i.e. the real world identifiers to be used for things in a database table. Adding a surrogate to a table doesn't eliminate the need for a natural key in that table. A surrogate doesn't solve any problems around the "misuse" of natural keys. Useful as they may be, surrogates are therefore irrelevant to the problem at hand.

Button

I thought the topic was Text V Auto Number as a Primary Key.
I always use AutoNumber and that is my choice. The reason I do so is based upon my experiences with other Access Databases where Natural Keys were the preference. In many cases a second and third field was added and then as a last resort an AutoNumber was added to form a Composite Key in order to gain some form of uniqueness. Can you imagine using 4 or more fields in order to create a unique reference. I have seen it, worked with it, and hate it to bits. If that is what you propose then that is up to you.
 
Last edited:
Access Autonumber in Increment mode is notorious for reseeding when a whole record is inserted. This can be worked around but it does expose a weakness in Autonumber. We do not have absolute control over the mechanisms controlling it.

Galaxiom

Are you suggesting that the insertion of a new record or should I say records can cause the AutoNumber to go out of control when this is done pragmatically.

Can you clarify this just a little.
 
Button

I thought the topic was Text V Auto Number as a Primary Key. I always use AutoNumber and that is my choice. The reason I do so is based upon my experiences with other Access Databases where Natural Keys were the preference. In many cases a second and third field was added and then as a last resort an AutoNumber was added to form a Composite Key in order to gain some form of uniqueness. Can you imagine using 4 or more fields in order to create a unique reference. I have seen it, worked with it, and hate it to bits. If that is what you propose then that is up to you.

So how do you expect your end users to identify the information in such a table? Your auto-number key won't help them uniquely identify anything because you've already told us you intend to hide it from them. Yet you don't appear to be suggesting any alternative. That's why I'm saying your point is irrelevant to the discussion. By introducing an arbitrary and therefore irrelevant surrogate key you aren't addressing the critically important issue of unique identification.

If a table lacks any natural key at all then users are left in the abysmal situation of working with duplicate data. Many of us are familiar with the messes created by database systems that don't support data integrity. Even when you use a surrogate key a natural key is still just as important, so introducing a surrogate doesn't in any way relieve the database designer from the responsibility of identifying and implementing the natural key for that table - i.e. you would still need to create key constraints and indexes for the natural key as well as the surrogate.
 
Well, I basically agree with Rainlover.

If there is a need for some other candidate data within that table to be functionally unique, and in my experience it tends to be a compound series of columns, then I constrain it with a unique index. I have no problem with doing this nor see any real downside to it. It's not unknown for these indexes to require tweaking from time to time as assumptions or the underlying data change and in some cases I have multiple unique indexes on a table, they're rare, but they do sometimes occur.


Users don't have to worry about duplicate data because the application handles it for them and where duplicates should not be able to occur the relevant indexes or constraints are applied to the table and ensure that it isn't an issue.

Ultimately I have no real issue with natural Primary keys, but I do think that surrogate values should always be used as foreign keys. It simplifies the design as they'll never be modified and it makes the foreign key explicit.

All my tables use a tablenameId column [as primary keys] and all foreign Key columns are named for the primary key to which they refer.

e.g. An Account table has the following columns:
AccountId
CustomerId
Date
SortCode
AccountNumber
Balance
CompanyId

Knowing nothing else about the database I'd suggest that you can infer a lot about the immediate structure of that database based on those columns and that one sentence description.

If you think that you shouldn't be able to add the same date, sortcode and Accountnumber combination to the table, try it and see what happens.

The surrogate keys from a user point of view are meaningless, they exist purely to simplify maintenance of the database and improve performance. I don't care if they're sequential, have gaps etc and, like Rainlover, I rarely expose my users to these columns, certainly not in the application layer so that there is no chance that someone will try to assign meaning to them.
 
tehNellie,

Well said. I agree with what you have said. For the reasons you have given I agree that the choice of "primary" key is essentially irrelevant and uninteresting. What really matter are the natural keys and how they are used. I.e. how is information in the database uniquely identifiable by users of that database? Are the natural keys an accurate representation of the reality the database is supposed to model?

As you point out, the choice of foreign keys is important too. Foreign keys don't have to reference the primary key of a table, nor do they necessarily have to reference a natural key. It is not implicit that using a surrogate key always improves performance however. The overuse of surrogates can have a disastrous effect on performance. Surrogates don't necessarily simplify maintenance either because they increase the number of constraints and indexes that have to be managed (because as you rightly say, the natural key usually still has to be indexed as well). Surrogates also introduce certain unique problems and complexities: e.g. the extra logic and processor cycles necessary to lookup the natural key from the surrogate or vice versa.

I'm not opposed to the use of surrogates. I just think they should be used prudently and selectively. I see no value at all in being dogmatic about "always" using a surrogate. Do what is right for each particular situation.
 
The only thing I would add (regarding tehNellie's comments) is that if the PK is going to be used elsewhere as an FK, then DEFINITELY I would make the PK a surrogate in order to simplify the implementation of Access requirements for parent/child table relationships. I.e. you must have a PK on a parent in order to link a child to its parent.

I still have no objections to using natural keys when available and appropriate i.e. stating the obvious, if it ain't a candidate key based on the rules for selecting candidate keys, don't try to make it one. I have to bang on stuff to make it fit all too often anyway. I don't need to compound my problems by mucking about with funky keys.

As far as letting users see the PK even when it is a surrogate: The question is still "So what could they do with this PK if they knew it?" If the answer is "nothing" then I don't care and will often put the PK in a write-locked text box just so they could report it to me if there was a problem.
 
Are you suggesting that the insertion of a new record or should I say records can cause the AutoNumber to go out of control when this is done pragmatically.

I have found that inserting a complete record including the autonumber field (eg restoring a deleted record from a backup) is a reliable way to cause the autonumber to reseed at the inserted record.
 
The only thing I would add (regarding tehNellie's comments) is that if the PK is going to be used elsewhere as an FK, then DEFINITELY I would make the PK a surrogate in order to simplify the implementation of Access requirements for parent/child table relationships. I.e. you must have a PK on a parent in order to link a child to its parent.

I still have no objections to using natural keys when available and appropriate i.e. stating the obvious, if it ain't a candidate key based on the rules for selecting candidate keys, don't try to make it one. I have to bang on stuff to make it fit all too often anyway. I don't need to compound my problems by mucking about with funky keys.

As far as letting users see the PK even when it is a surrogate: The question is still "So what could they do with this PK if they knew it?" If the answer is "nothing" then I don't care and will often put the PK in a write-locked text box just so they could report it to me if there was a problem.

it's not experienced developers that have the problem, is it, really.

It's inexperienced developers, and the clients themselves, who expect to see meaningful data in the tables. Probably compounded by MS who give us lookup fields to facilitate such bad practice.


bit in bold. Yes, I often find it useful to know what the "ID" value actually is. Sometimes I hide it, but sometimes, I just grey it out. Depends on the app, really
 

Users who are viewing this thread

Back
Top Bottom