difference between candidate key and primary key?

99shassan

New member
Local time
Today, 11:51
Joined
Jun 16, 2006
Messages
6
Hey guys I'm a noob as you can tell lol.

I tried googling the answer but the results are really confusing and when I try and understand it just seems like they are the same thing.

Can someone please explain this in easier to understand terms?

Also, a superkey is basically a column that can be used to recognise a certain row right? when you identify 1 superkey, its called a minimal superkey and that is also the candidate key right?
 
A normalized dataset has tables that hold data about a particular entity relevant to the purpose of the database. So, e.g., a company with an employee database has a table of employees and might have another table of employee action history (promotions, reviews, salary adjustments, etc.)

A key is a candidate key if it qualifies to uniquely select a specific row in the table such that the data in that table depend completely on that key and nothing else is required.

Suppose the employee table is USA based. It might have two candidates - one might be an employee number, the other the employee's social security number. If a company requires employees to have SSNs..., this could happen.

OK, two candidates - employee number and SSN.

You must pick one to be the primary key, in which case the other becomes just another data field dependent on the choice of key.

In the USA, based on two factors, employee number would probably be picked as the PK leaving SSN as a secondary field. First, there is a legal restriction on use of an SSN because it is personally-identifying information that promotes identity theft. Second, it is possible that a person might have applied for an SSN but not yet gotten it. So SSN can't be relied upon, thus invalidating it as a candidate.

Therefore, the difference between primary keys and candidate keys is that the other candidates lost the race.

A SUPERKEY is essentially an overdetermined key, one that is guaranteed to be a unique selector - but that isn't the smallest it could be. In my employee table example, the combination of (employee number, SSN) would be a superkey. Note that the superkey will often include the primary key.

There is more to it than that. The Wikipedia definition speaks of a superkey that has the same cardinality as that of the whole record as a key. (The ultimate key, totally impractical since you have to know the whole record beforehand to use it as a key...)

"Cardinality" simply says, when I query a table on a single key value, what is the average number of returned records. For a proper prime key, this is always and only 1 for all values that exist in the table at all. (Obviously, it is zero for numbers that don't exist in the table.) Cardinality will be greater than one for non-unique keys. The usual example for the other end of the spectrum is for a table of size N, using "Gender" as a key will give you a cardinality of N/2. Remember, since it is the AVERAGE return size, the table's distribution doesn't have to be perfect.

Hope that helps.
 
so basically a candidate key is a unique identifier that could be used as a primary key? which when you choose the primary key, the rest will then just be data fields.

and a superkey is pretty much all the unique indentifiers?
 
Last edited:
and a superkey is pretty much all the unique indentifiers?

Not quite. A superkey can contain non-unique identifiers. But like a compound primary key, the combination must be unique.
 
A further fine point.

Technically, ALL fields in a table start out as candidate keys. At some point you realize, "Oh, darn it, this key isn't unique. Not a candidate any more." So in SOME viewpoints, a candidate key isn't that much of a distinction.

The purists will say that the key has to make it past the uniqueness test to be a viable candidate. But where a compound primary key is involved, there might be NO viable candidates for that test, in which case the fields that might be part of the compound key are candidates again.

The purists and theorists of the world will disagree with me on this one - but in practice, I think of a candidate key as merely a field you thought about as a possible key. I know that there are definitions out there that say this is not a valid viewpoint. But to me, if I thought about it as a candidate for being my PK for more than a couple of minutes, it is a candidate key.
 
A bit late, but...
A primary key is a candidate key. There is no difference. By common convention one candidate key is designated as a "primary" one and that key is used for any foreign key references. In practice however, that's really a point of convenience for database developers and it doesn't make a primary key fundamentally different from any other candidate key.

A further fine point.

Technically, ALL fields in a table start out as candidate keys. At some point you realize, "Oh, darn it, this key isn't unique. Not a candidate any more." So in SOME viewpoints, a candidate key isn't that much of a distinction.

No, because a candidate key is a set of attributes that is both unique and irreducible within a relation, i.e. a minimal superkey. Minimal/irreducible means that if you remove any one attribute then what's left would no longer be a superkey. It's true that the entire set of attributes of a relation is always a superkey because relations have distinct tuples. That doesn't necessarily mean the entire set of attributes is a minimal superkey (although it certainly must include at least one minimal superkey).
 
dp -
are you realy new here, or just a very infrequent poster. I recall you contributing most extensively to previous discussions on natural/autonumber keys - or was that on UA?
 
Dave,

I registered here in April 2009 and have 49 posts according to my profile. I think that makes me an infrequent poster. I reopened this thread only because it appears second from the top when I Google for Candidate Key. So I thought it deserved a better explanation.
 
A primary key is a candidate key.

Dave, a primary key IS a candidate key - that won the election. Unless you synthesized a surrogate key of some kind, all but one of the "natural" candidate keys is NOT a primary key - though it might still be a unique (NODUPS) key.

To me, a non-minimal superkey contains more data than needed to select a record uniquely. A minimal superkey is just a candidate key. And the prime key is just the candidate key that you picked for the job.
 

Users who are viewing this thread

Back
Top Bottom