I'll throw in my opinion on one point or so. For more study, simply google "surrogate vs. natural keys". Personally, I'd suggest googling a wider population than just AWF, because by including larger RDBMS you get a lot of viewpoints and input that is valuable and more than just the Access community--but still useful points that will relate to doing things in Access.
Is it better to use a data field for the primary key, provided there are no duplicates in the field, rather than use the autonumber ID that Access adds for you if you don't identify a primary key?
No. Use a key with no business meaning whatsoever. Access's AutoNumber is perfect. Doing anything but this just drags you down a hundred rabbit holes. No matter how much you
think a value will be unique and never be edited, the reality is that in some % of those times, you really don't know what the future holds or aren't foreseeing all the possibilities. Imagine somebody in HR designing a table for employees. He feels positive that he will never enter a record with the same employee ID twice. Then someone leaves and comes back later. Someone from another system or higher up says to use the same employee ID #, but change other attributes. Maybe this creates a problem. Probably it would have been best to allow for this previously unforeseen possibility. Oops, should have used a surrogate PK. Often times what drives the "I think this will always be unique" is a perspective limited by the inability to see a bigger picture and upstream or downstream systems. No matter how unlikely you think it is, why take the chance at all...ever? Imagine the stress you'll feel when someday the unique paradigm is blown up by new information, and suddenly a fundamental re-design of large proportions is your weekend rush job. With the ability to create your design in such a way that relationships are enforced through KEY values, and uniqueness is enforced through indexes (and/or other methods), there is no such thing as a "need" for a natural key, so why create a risk where none needs to exist.
Sample scenario I actually saw happen with a colleague when working for a gov't taxing agency:
- John is told to design tables to import tax returns on a daily basis as they come down from an upstream corporate system.
- John considers using SSN as a natural key, but thinks about it a while and realizes he'll need SSN + TaxYear.
- John has some import problems. Upon troubleshooting, he realizes he forgot about amended returns...Oh yes, of course! I'll change the natural key to SSN + TaxYear + Form number [1040, 1040x, etc]. John spends the weekend working overtime to correct everything - which involves a lot of remediation involving removing and replacing indexes, and makes several data mistakes in the process.
- A few weeks later, John has more import problems. John realizes that occasionally a taxpayer actually files more than one amended return! He wouldn't have 'guessed' this, but another weekend of emergency work fixes the problem. The natural key is now SSN + TaxYear + Formnumber + DateReceived. At this point John is questioning everything, but the solution still isn't apparent to him.
- Later on, more import problems. What could it possibly be this time?? He finds out that in certain "exception" circumstances, the same tax return will be processed twice - once by the system, and again manually to correct certain flagged errors. In these unique circumstances, the upstream application from him will actually send him the exact same return, twice, on the same date. This was something he theoretically could have foreseen with enough investigation up-front, but his time in requirements gathering had limits, and unfortunately he missed this. Turns out there is a unique attribute "Corrected", which takes a 1 or a 0.
- John considers once again re-designing his natural key to include the "Corrected" column, but soon realizes that the safest course of action is to create a surrogate key with no embedded meaning. He spends two long weekends redesigning indexes and keys, updating records on primary and foreign key relationships, creating the necessary constraints and removing others and makes a number of mistakes in the process. In the future, after conferring with colleagues to explain all the negative impacts from these incidents, he decides to use surrogate keys from now on and not tempt fate.
...and I haven't even mentioned how long it took John to find out about the process of assigning non-SSN taxpayers
TID's, which started out as a dummy number (inside this particular processing scenario), BEFORE getting a unique one.
Your business users don't have to know anything about these surrogate PK and FK keys which are being used behind the scenes to do most of the joins. Further, joins on a datatype used by AutoNumber are the most performant kind, while most creatively derived natural keys simply won't be able to use a numeric datatype.
Some common "but then I can't" 's:
- Yes, you can still give business users something they like to call a "key" if you want, made up with some embedded business meaning. It will be additional to your actual, database-level, PK. Just don't leave the meeting guaranteeing them on your part that it will "always be unique", unless you actually HAVE created a constraint that proves this, whether on the table side or through input programming. Never, ever, agree with business users who swear that such-and-such creative combination of things will always be unique on the sole basis of data components alone. I say this in the context of, for the purpose of creating a key.
- You can still enforce uniqueness by way of indexes on column(s)
However, I'm now leaning towards not storing rate nor hours * rate. Instead I'll maintain start and end dates in the rate table and use them to derive the applicable rate when calculating cost of the hours in a query
That seems like a good idea. I've seen a lot of this kind of thing done in corporate warehouses when it comes to healthcare eligibility segments. It will cost just a bit more effort on the querying side, but probably worth it to avoid storing unnecessary and potentially change-able data as if it were static.