New programming standard for Access VBA

"No articles of faith here, but you do need to rationally state the advantage of using a meaningful primary key."

No, I have no need to do that. That debate has versed through all the newsgroups and webfora, in addition to other sources. All the pros and cons, both the real and the imaginary, of surrogates and natural keys has been reiterated ad nauseum. If you need this information, then a web search should provide ample possibilities of spending time reading hotheads from either side totally failing to communicate with each other ;)

In addition to Monikers explanations, for the Access community, if you search for instance CDMA and/or the microsoft.public.access... NGs, Tom Ellison is one who favours natural keys, and have given very good and elaborate explanations in support of his view.

Here are som very few threads by a quick search
http://groups.google.com/group/micr...sdbdesign/browse_frm/thread/27157ee5b706d59a/
http://www.dbforums.com/showthread.php?t=515216
http://www.pcreview.co.uk/forums/thread-2595065.php

Me I use both, and attack anyone trying to deny me that with vigour ;)

"I know that this subject is controversial so that's an even better reason to put it to bed."

So, you intend to settle the surrogate vs natural key debate once and for all <grin>

There often isn't a "right" and "wrong" answer to many issues. My aim would be to state the pro's and con's of each approach in a very clear and understandable way that everybody (even non-technical people) can instantly understand. I'm very clear on the advantages of meaningless primary keys and need to understand why others pervceive them as a disadvantage.

If you don't want to express an opinion that's fine, and if you do want to logically argue the point I'd always appreciate your input.

Best Regards


Mike Smart
 
Often times, a meaningful primary key is the best decision to make in that the meaningless primary key doesn't identify anything about the record's content. For example, where I work, one of our driver tables contains a composite key that consists of a provider's ID (the PIN), a Market_ID, and a Measure_ID. While the Market_ID and Measure_ID were arbitrarily assigned, the PIN uniquely identifies the provider and it is part of the composite key used in every other related table. (Additional tables add additional identifiers, such as an Address_ID to identify specific providers with multiple addresses.)

By using a meaningful primary key, we can deduce a lot of information. For example, most of our constituents do not know that we have assigned Market and Measure ID values, and they only provide us with the provider's PIN. Since that PIN is part of the primary key, we can drill down to the data they are looking for. By this, I mean that if we know we got a phone call from someone in one of the New York offices and they wanted information on PIN 12345, we can look up PIN 12345, narrow it down to that region, and proceed from there. If we were just using an AutoNumber and they called asking about "Smith Hospital", we could potentially have 100 Smith hospitals with little way to identify them apart.

Keep in mind that the vast majority of our users have no idea that the PIN is a major component of the primary key (much less what a primary key even is), and that's fine. However, had we not made the design decision early on to make the primary key meaningful, our jobs would be a lot more difficult for what is basically an arbitrary reason.


Dear Moniker

Thanks for taking the trouble to make these points.

With your example of Provider, Market and Measure tables with PK's ProviderID, MarketID and MeasureID.

Using the meaningful method (as in your system) the MarketID and MeasureID are autonumbers and the ProviderID is a meaningful ID number (in the example PIN 12345, Let's say this is the Provider ID for Smith Hospital for the example).

The Driver table has the composite primary key ProviderID + MarketID + MeasureID.

To find all of the Market and Measure records for PIN 12345 is a simple query filtering by ProviderID (a foreign key in the Driver table).

Now let's consider the "meaningless method".

The Provider table has a meaningless Primary key of ProviderID. Another field in the Provider table is called ProviderNumber and it has a unique index constraint.

To find all of the Market and Measure records for PIN 12345 is a simple query filtering on ProviderNumber (in the Provider table) rather than the foreign key ProviderID in the driver table.

The second solution requires one more join that the first. Is this the only advantage... or are there others I am not understanding?

Here's why I feel that meaningless keys are an advantage:-

1/ A primary key should never change and a key that contains meaningful data will always be subject to potential change.

2/ A primary key may never be Null so it will be impossible to add a record where the value of the field chosen for the primary key is not known. For example, the SSN is chosen as a primary key. The table is later used for both prospective and current employees. Dummy SSN's must be input to allow potential employee records to be added when the SSN is not yet known. Setting an AK (unique constrained index) would enforce a unique business rule but still allow the flexibility of null values (that could be needed to address a future business requirement even if they aren't now).

If I've understood your comments correctly here is the advantage described above in using a meaningful primary key. If I've misunderstood do let me know :-

1/ Reporting is easier as a join is avoided when the only information needed from another table is the value of the foreign key.

And here's a second advantage from an earlier comment in this thread:-

2/ The table will be slightly more compact because it will avoid the need for a meaningless primary key field.

****
Is that the current state of play? Any feedback (from others too) on other advantages and disadvantages (based upon Moniker's example database) would be greatly appreciated and I'll add them to the list of advantages/disadvantages. But I do request that comments are kept within the context of a real-world business system rather than computer science (as they have so far).

Best Regards


Mike Smart
 
It is my impression the function of Primary Key is one of normalisation; and as that a 'key' needing to be understood by the database management tool and important to efficiency of the system; and given that then, meaningless to the inquisitor; but critically meaningful to the underlying system.
Additionally, is it not the primary purpose of relational systems to store essential data components in order that we (as humans) might derive some form of meaningful information? IE. If one set of datum produces another would not that original set be meta data? :rolleyes:
Interestingly, what I am seeing in the real world is a store of core components (tables, entities) that all have the very same field (EG. [lrsn] is found in every table of one db I work with); and then every attribute (field) within that entity becomes a candidate key; and then I suppose, if join is made between tables on multiple fields (indeed required to get desired results), would become a composite key. Granted, I have not seen it all. But this is what I am seeing with the databases I have been around in my work. Not necessarilly what I have read in computer books. Seemingly, every relationship is potentially M:N...:eek:
 
"No articles of faith here, but you do need to rationally state the advantage of using a meaningful primary key."

No, I have no need to do that. That debate has versed through all the newsgroups and webfora, in addition to other sources. All the pros and cons, both the real and the imaginary, of surrogates and natural keys has been reiterated ad nauseum. If you need this information, then a web search should provide ample possibilities of spending time reading hotheads from either side totally failing to communicate with each other ;)

In addition to Monikers explanations, for the Access community, if you search for instance CDMA and/or the microsoft.public.access... NGs, Tom Ellison is one who favours natural keys, and have given very good and elaborate explanations in support of his view.

Here are som very few threads by a quick search
http://groups.google.com/group/micr...sdbdesign/browse_frm/thread/27157ee5b706d59a/
http://www.dbforums.com/showthread.php?t=515216
http://www.pcreview.co.uk/forums/thread-2595065.php

Dear Roy

I've read the threads you kindly provided. I don't find anything there that explains the business advantage of using meaningful keys. People are saying "use meaningless" and "use meaningful" but nobody is actually logically justifiying their standpoint. I would really like to add more advantages/disadvantages to the list in this thread but first we need to clearly identify and define what the advantages actually are. Just because a large number of people do something it doesn't neccessarily mean that it is a good thing to do.

Mike
 
While I applaud anyone's idea to help others understand how to go about designs, there are only three or four REAL rules regarding standards.

1. Determine if your company has any standards.

2. If so, follow them.

3. When in doubt, see rule 2.

4. When no rules exist, THEN you get to make some choices.

I have not tried to download your rules because, to be blunt, anyone's rules are wrong sometimes. I work with the government a lot and I can tell you right now, even their best data naming standards are trash most of the time. They make it impossible to program without the ability to touch-type.

I'll also state categorically that primary keys SHOULD NOT be autonumber if there is ever any chance that they will take on meaning outside of the guarantee of uniqueness. Because, you see, autonumbered keys take up space with no relevant meaning. If another key adds meaning, I'll take the meaningful key every time.

Let's take a parts inventory in which we have an absolute guarantee that the SKU number is unique. Autonumber? I don't need no steenkin' autonumber. (With apologies to lovers of old Humphrey Bogart movies.). If two different parts come in with the same SKU, send back the parts for corrections to their labels. In this case, the uniqueness of the SKU lets you catch someone else's error.

Let's take a compound-key situation in which the prime key is a date and, say, a store number. (OK, it's contrived.) If I make that table with an autonumber, I still need indexes on store and date. PLUS the implied index for the autonumber. Waste of space.

You use keys when it makes sense to do so. You CHOOSE keys when they support your functions. If the above date/store table is something that is referenced a lot via foreign keys, then OK an autonumber might be useful. But if this table is making the references itself i.e. IT contains a lot of foreign keys and no tables reference it via foreign key, why waste the space? Or the time?

Don't get me wrong. Standards exist precisely because without them, folks tend to go off on tangents. OK, worthy cause here. Keeping everyone on the same page. Good thing to do for large teams of programmers.

BUT, standards also tend to stifle creativity when they go too far. So my advice to you is, put standards in their proper place. They are great for beginners, great for companies that have lots of staff turnover, lousy for individuals, and often cumbersome when they are extensive enough. Don't forget that when you have externally imposed standards, for a large enough shop you just created a new overhead job called the "standards compliance" officer.

OK, if you check my reputation here, you'll find that I'm a pragmatist. I've been in the programming business 35+ years. Standards come and go. I can't get worked up about them.
 
Dear Doc Man

While I applaud anyone's idea to help others understand how to go about designs, there are only three or four REAL rules regarding standards.

1. Determine if your company has any standards.

2. If so, follow them.

3. When in doubt, see rule 2.

4. When no rules exist, THEN you get to make some choices.

I think that I'm addressing case 4 as my students need rules to give them a quality standard that will get them started. If you do read my rules, however, you'll see that the introduction broadly agrees with the above.

I would add, however, that just as all useful applications always need to be improved this is also true for most rules.

> I have not tried to download your rules because, to be blunt, anyone's rules are wrong sometimes. I work with the government a lot and I can tell you right now, even their best data naming standards are trash most of the time. They make it impossible to program without the ability to touch-type.

If you had read my guidelines though you'd know that they are not just naming standards (though suggested naming standards are included).

> I'll also state categorically that primary keys SHOULD NOT be autonumber if there is ever any chance that they will take on meaning outside of the guarantee of uniqueness.

Thank you for this comment. This is exactly the type of feedback I need.

> Because, you see, autonumbered keys take up space with no relevant meaning. If another key adds meaning, I'll take the meaningful key every time.

This is the advantage referred to in :-

2/ The table will be slightly more compact because it will avoid the need for a meaningless primary key field.

> Let's take a parts inventory in which we have an absolute guarantee that the SKU number is unique. Autonumber? I don't need no steenkin' autonumber. (With apologies to lovers of old Humphrey Bogart movies.). If two different parts come in with the same SKU, send back the parts for corrections to their labels. In this case, the uniqueness of the SKU lets you catch someone else's error.

Yes. The primary key is doubling as a unique constraint in this case. The advantages of using a unique constraint are detailed earlier in this thread as Advantage 2 (which I'll re-state at the end of this message). But I think you've identified a new argument for meaningful primary keys.

"If a unique constraint is required the overhead of an extra index is avoided as the primary key unique index doubles as a unique constraint for the meaningful key". I've added a new advantage for meaningful keys as "3" below.

> Let's take a compound-key situation in which the prime key is a date and, say, a store number. (OK, it's contrived.) If I make that table with an autonumber, I still need indexes on store and date. PLUS the implied index for the autonumber. Waste of space.

Another point that I see falling out of this is that since you cannot create a composite alternate key (only a composite primary key) the task of maintaining a unique constraint composed of two attributes becomes more difficult.

I'll include this one in advantage "3" at the end of this message.

> You use keys when it makes sense to do so. You CHOOSE keys when they support your functions. If the above date/store table is something that is referenced a lot via foreign keys, then OK an autonumber might be useful. But if this table is making the references itself i.e. IT contains a lot of foreign keys and no tables reference it via foreign key, why waste the space? Or the time?

I think you are making three points here.

* Advantage 2 of meaningful keys once more (compactness).

* A new advantage for meaningless primary keys as they will potentially execute lookups faster (in your example).

But I'm not sure that there is a time advantage, would appreciate if you can elaborate upon that.

> Don't get me wrong. Standards exist precisely because without them, folks tend to go off on tangents. OK, worthy cause here. Keeping everyone on the same page. Good thing to do for large teams of programmers.

I completely agree with you here. I'd even say that standards are essential when programming in a team environment. But once again, this isn't just a naming convention but a complete set of quality design rules and this type of standard has another use - the education of people learning to develop applications so that they start their career with a set of "good habits".

> BUT, standards also tend to stifle creativity when they go too far. So my advice to you is, put standards in their proper place. They are great for beginners, great for companies that have lots of staff turnover, lousy for individuals, and often cumbersome when they are extensive enough. Don't forget that when you have externally imposed standards, for a large enough shop you just created a new overhead job called the "standards compliance" officer.

I hate to be regulated in anything but I do realise that in society we'd have anarchy without laws. And we all hate the policeman (until he saves us from something terrible). I've seen lots of terrible databases!

> OK, if you check my reputation here, you'll find that I'm a pragmatist. I've been in the programming business 35+ years. Standards come and go. I can't get worked up about them.

I greatly appreciate you sharing your experience. I find it hard to get worked up about anything these days (except the congestion charge in London)!

So here's the advantages/disadvantages of meaningful/meaningless primary keys. If you (or anybody else) can add to the list I would be most grateful. This may not be a "right and wrong" issue but with the advantages and disadvantages clearly stated it is possible for each to choose their path. This issue seems to be clouded with faith issues though it is great to get some logical mileage out of this discussion.

Here we go then with the advantages/disadvantages so far:-

Arguments for Meaningless keys
========================
1/ A primary key should never change and a key that contains meaningful data will always be subject to potential change. While most RDBMS support cascading updates a major change in the primary key can result in the need for extensive system re-design.
2/ A primary key may never be Null so it will be impossible to add a record if the field chosen for the primary key is not known at the time of data entry. In this respect it is less flexible than an alternate key (uniquely constrained index).
3/ Because the primary key is meaningless it is possible to choose a data type for it that will optimise indexed lookups (eg the Autonumber data type in a Jet database).

Arguments for Meaningful keys
=======================
1/ Reporting is easier as a join is avoided when the only information needed from another table is the value of the foreign key.
2/ The table will be more compact because it will avoid the need for a meaningless primary key field and its index.
3/ It isn't possible to create a composite alternate key (only a composite primary key). For this reason the task of maintaining a unique constraint composed of two attributes becomes more difficult and involves the maintanance of three indexes rather than one.

Any more suggestions for this list would be gratefully appreciated. I intend to publish these advantages/disadvantages in the next version of the standard.

** I'm moving the discussion of meaningless vs meaningful primary keys to: "Theory and practice of Database Design" as I think that's a more appropriate forum for this particular issue. VBA coding related issues can stay here **

Mike Smart
 
Last edited:

Users who are viewing this thread

Back
Top Bottom