Numerical ID vs Alphanumerical ID: practical advice needed on best way forward from where I am

adam.arca

New member
Local time
Today, 07:50
Joined
Sep 8, 2017
Messages
9
Firstly, please let me apologise if what I have written below is too much chapter’n’verse: I don’t want to fall into the trap of providing too little info and then you guys having to extract teeth to get what you need…but, in doing so, I may well have fallen into the trap of being too verbose.

Where I am
I have my own database which helps me run my micro business. It’s not very pretty and it’s missing a lot of functionality but it works for me. Nevertheless, I recently decided (I have no idea why now) to start pushing the development along.

The key data area is my customers: the main display form is bound to tblCustomers although that form does also drag in related data via various sub forms. The thing is that my customer IDs have always been alphanumeric because that mimics the customer references we use for our hard copy files.

As part of this re-development and based on what I have read here and elsewhere, I decided to create numeric IDs for the customer records (whilst retaining the alphanumeric codes but removing this field as the primary key).

So far, that hasn’t been a problem and I have happily imported tblCustomers and various smaller tables into the new back end. These are all fairly small (<500 records each typically) and I have done this via Excel where I have been able to add the new numeric codes using lookups so it’s been a painless exercise so far.

The problem
I now have 3 much bigger tables to import, all well over 10k records and all of course currently linked to the alphanumeric ID. I know that volume probably doesn’t seem that much in itself but the added complication is that each of these tables includes Memo / Long Text fields, which (I think) means that I can’t repeat the Excel trick as the narratives in those fields would be truncated doing that?

So I think I would need to import directly into the new back end and then amend the foreign key fields within Access?

Possible solutions
#1: Abandon the numeric ID and revert to using an alphanumeric one. I’ve used the database with alphanumeric IDs for 10 or 15 years without issue, so I am wondering why I’m now creating this pain for myself. What I am hoping you guys can tell me is whether or not I should grit my teeth on numeric IDs and see it through or whether, practically speaking for a low volume, low intensity database like mine, the real life answer is to live with the seemingly slight (as far as I can tell) issues caused by non-numeric IDs?

#2: Continue with numeric IDs, which would mean direct import into the new back end and then running an update query for every customer in each of the 3 tables to add the new numeric ID to every record. That would be a lot of work even for my small database but I would be happy to do that if the argument for numerical IDs is a slam dunk. Just for reference, I estimate this would be something like 600 iterations of the update query (3 tables x say 200 customers past and present), so my worry is this is very likely to go wrong at some point.

#3: As a variation on #2, how easy would it be to write a procedure in VBA which a) reads all the alphanumeric and related numeric IDs into an array, and then b) loops through the array and automatically runs the SQL for each iteration? If someone could confirm that is feasible for a lower end intermediate like me to code, then I might well crack on with conversion to numeric IDs anyway even if the argument for them isn’t that convincing.

#4: Anything else I haven’t thought about? (I imagine that could be a long list).



I do hope you’ve been able to read to the end of my War and Peace and not die of terminal boredom in the meantime. Any and all answers would be very gratefully received!
 
Before delving into anything else, you CAN and SHOULD have a separate "Customer ID" from the tblCustomer's ID field. Customer ID (your alphanumeric) is what users see. The ID in tblCustomers is an unchanging system assigned number used ONLY internally to connect your files together.

Think of it as "My customer, CB114, has a hidden "ID" inside that is just a number. That means if CB114 gets changes for any reason, all of the related records stay linked".

Your separate question would then be "I have a bunch of records I need to import from Excel. How do I add them and make sure they link up to the correct parent record?" For that question, you would want to include the table definitions for each table that you want to import, what other table(s) it relates to, and how they are related. This way we can help make sure they get put into the right spots, the right spots are set up properly, and all of it attaches together in a way that makes sense.
 
Wow, that was quick! Thanks for replying, Mark.

So you're in the numeric ID camp? I get what you're saying and it was arguments like that which made me decide to start using numeric IDs.

Regarding the follow up question, I do have a procedure for importing and which adds the new numeric IDs: it has already worked fine on my smaller tables. I could roll that out to the 3 much larger tables (it would only be an issue of scale not principle) except that I *think* I would have an issue with truncation of the narratives in my Memo data type fields.

Am I right in worrying about that and, if I am, do you have suggestions as to how to tackle that?

Cheers, again.
 
For memo fields, I normally have them in a separate table where each memo is linked to its parent by "Parent Table / Parent ID". Yes, it is a child to multiple tables. Mostly a left over because the database I'm working with now is about 16 years old and Access didn't like multiple memo's per table when it was started.

For the import, you will want to set up an "Import" application. This will be linked to both your excel files and your actual back end. This lets you see what ACCESS thinks will get imported (query on your excel files) so you can make sure you don't loose data.
 
Thanks again, Mark.

Yeah, one of the issues on my shopping list is to handle the memo fields differently; in fact, maybe quite similarly to the way you are describing. But, first, I need to get them into the new BE.

I get the feeling, when you talk about an "Import" application, that we must be straying into new territory for me. I'll go away and do some research on that. Many thanks for the steer.
 
An "Import" application is simply an access database that only links to tables. It is used ONLY to do your code for importing data for one time imports. Most often it is gotten rid of once you've got the data you need, unless you want a good reminder on HOW you did it for next time. Only reason it isn't part of your normal database.

Normal format of it is "Linked source table" and "Linked destination table" as your only tables. Then you have one query on your source (to make sure it looks right) and one query or a bit of VBA code to add to your destination. Can be a little more extensive if you need to attach values that are look up by reference (stores the ID of the lookup rather than the value), but generally you use it to bring in one table at a time.
 
This topic has been discussed many times in these Forums. I took the lazy approach and asked ChatGPT:

1. Performance Benefits
Faster indexing and searching: Numeric keys are more efficient to index and search than alphanumeric keys because numeric comparisons are simpler for the database engine to process.
Smaller index sizes: Numeric data types (like Long Integer) require less storage space than alphanumeric types (like Short Text), leading to smaller and faster indexes.

2. Simplicity and Uniformity
Ease of generation: Numeric primary keys (especially AutoNumber) can be automatically generated and incremented by the database. This eliminates the need for manual key management and ensures uniqueness.
Unambiguous values: Numeric keys avoid issues like leading/trailing spaces, case sensitivity, or other formatting inconsistencies that can occur with alphanumeric keys.

3. Referential Integrity
Numeric keys are more straightforward to use as foreign keys in related tables, making it easier to maintain referential integrity.

4. Adaptability
Numeric PKs are independent of the data content. If the alphanumeric value of a field (e.g., a name or ID code) needs to be changed, it won’t affect the relationships or integrity of the database since the numeric PK remains unchanged.

5. Cross-System Compatibility
Numeric keys are more universally compatible with other database systems and software that might consume the data. Many external systems expect numeric IDs for performance and simplicity.

6. Improved Sorting
Sorting numeric fields is faster and more predictable than sorting alphanumeric fields. Alphanumeric keys can result in lexicographical order, which might not align with natural order (e.g., "2", "10", "20" instead of "2", "10", "20").

7. Reduced Risk of Errors
Numeric keys reduce the likelihood of key duplication or human entry errors, especially if an AutoNumber is used.

Drawbacks to Consider
While numeric keys have many advantages, there are scenarios where alphanumeric PKs might be more appropriate:

Meaningful identifiers: In cases where the PK needs to convey information (e.g., an employee ID like "EMP001"), alphanumeric keys can serve a dual purpose.
Legacy systems: If your database integrates with existing systems that rely on alphanumeric keys, using alphanumeric PKs might simplify integration.
In general, if the PK does not need to hold any inherent meaning, numeric keys are the preferred choice for performance, simplicity, and maintainability.
 
I'll throw in a side comment. Depending on the size of the Alphanumeric field and the number of records, an Alphanum key is not the most efficient possible key to use as a prime key. I would recommend that you use an autonumbered long integer key that NEVER is seen by any of the potential customers. In fact, ANY TABLE that has the potential for a dependent table - a.k.a. child table - should have an autonumbered LONG PK. The indexes used by Access for parent/child links (or other cross-reference links) are most efficient in that format.

The only space-grabber about child tables is that they have to duplicate the PK of their parent as a foreign key (FK for short), and the longer the PK, the longer the FK. BUT it is also possible that the FK would be repeated many times if one parent can have many child records. Therefore, the shortest efficient key would be best for PK/FK situations. And that leads back to 4-byte (LONG) integers.

A GOOD issue is that with the PK as a LONG autonumber, the long Alphanum identifier only needs to appear ONCE in the DB. Any time an action based on CHILD tables needs that key, the FK to the parent gives you several options to look it up - in a JOIN query for SQL situations and a DLookup call for VBA situations. (This is one of the beneficial side effects of database normalization.)

At least in the interim, if you decide to regenerate your DB, keep the old keys (for a while) until you reach the point that all of your records have both a former PK and a current space-efficient PK. For any independent (parent) table, auto-gen the new PK while keeping the child tables dependent on the old key. Then another update to the FK fields (identified using the older PK for one last linkage) lets you keep things together.

Sounds complex? Not really. BUT sounds tedious? Absolutely. The good news is that it is a one-off operation. Just make good backups before you do anything to implement your possible restructuring operation.
 
I'll throw in a side comment. Depending on the size of the Alphanumeric field and the number of records, an Alphanum key is not the most efficient possible key to use as a prime key. I would recommend that you use an autonumbered long integer key that NEVER is seen by any of the potential customers. In fact, ANY TABLE that has the potential for a dependent table - a.k.a. child table - should have an autonumbered LONG PK. The indexes used by Access for parent/child links (or other cross-reference links) are most efficient in that format.

The only space-grabber about child tables is that they have to duplicate the PK of their parent as a foreign key (FK for short), and the longer the PK, the longer the FK. BUT it is also possible that the FK would be repeated many times if one parent can have many child records. Therefore, the shortest efficient key would be best for PK/FK situations. And that leads back to 4-byte (LONG) integers.

A GOOD issue is that with the PK as a LONG autonumber, the long Alphanum identifier only needs to appear ONCE in the DB. Any time an action based on CHILD tables needs that key, the FK to the parent gives you several options to look it up - in a JOIN query for SQL situations and a DLookup call for VBA situations. (This is one of the beneficial side effects of database normalization.)

At least in the interim, if you decide to regenerate your DB, keep the old keys (for a while) until you reach the point that all of your records have both a former PK and a current space-efficient PK. For any independent (parent) table, auto-gen the new PK while keeping the child tables dependent on the old key. Then another update to the FK fields (identified using the older PK for one last linkage) lets you keep things together.
Sounds complex? Not really. BUT sounds tedious? Absolutely. The good news is that it is a one-off operation. Just make good backups before you do anything to implement your possible restructuring operation.
I'll second that suggestion. Given the advantages of Surrogate Primary Keys based on the AutoNumber datatype, it's a very good idea to build relationships around them.
However, until the transition is complete, tested and deployed, it is wise to retain the current fields for "oops" events. You can always delete them a year from now when it's clear they are no longer useful.
 
Firstly, please let me apologise if what I have written below is too much chapter’n’verse: I don’t want to fall into the trap of providing too little info and then you guys having to extract teeth to get what you need…but, in doing so, I may well have fallen into the trap of being too verbose.

Where I am
I have my own database which helps me run my micro business. It’s not very pretty and it’s missing a lot of functionality but it works for me. Nevertheless, I recently decided (I have no idea why now) to start pushing the development along.

The key data area is my customers: the main display form is bound to tblCustomers although that form does also drag in related data via various sub forms. The thing is that my customer IDs have always been alphanumeric because that mimics the customer references we use for our hard copy files.

As part of this re-development and based on what I have read here and elsewhere, I decided to create numeric IDs for the customer records (whilst retaining the alphanumeric codes but removing this field as the primary key).

So far, that hasn’t been a problem and I have happily imported tblCustomers and various smaller tables into the new back end. These are all fairly small (<500 records each typically) and I have done this via Excel where I have been able to add the new numeric codes using lookups so it’s been a painless exercise so far.

The problem
I now have 3 much bigger tables to import, all well over 10k records and all of course currently linked to the alphanumeric ID. I know that volume probably doesn’t seem that much in itself but the added complication is that each of these tables includes Memo / Long Text fields, which (I think) means that I can’t repeat the Excel trick as the narratives in those fields would be truncated doing that?

So I think I would need to import directly into the new back end and then amend the foreign key fields within Access?

Possible solutions
#1: Abandon the numeric ID and revert to using an alphanumeric one. I’ve used the database with alphanumeric IDs for 10 or 15 years without issue, so I am wondering why I’m now creating this pain for myself. What I am hoping you guys can tell me is whether or not I should grit my teeth on numeric IDs and see it through or whether, practically speaking for a low volume, low intensity database like mine, the real life answer is to live with the seemingly slight (as far as I can tell) issues caused by non-numeric IDs?

#2: Continue with numeric IDs, which would mean direct import into the new back end and then running an update query for every customer in each of the 3 tables to add the new numeric ID to every record. That would be a lot of work even for my small database but I would be happy to do that if the argument for numerical IDs is a slam dunk. Just for reference, I estimate this would be something like 600 iterations of the update query (3 tables x say 200 customers past and present), so my worry is this is very likely to go wrong at some point.

#3: As a variation on #2, how easy would it be to write a procedure in VBA which a) reads all the alphanumeric and related numeric IDs into an array, and then b) loops through the array and automatically runs the SQL for each iteration? If someone could confirm that is feasible for a lower end intermediate like me to code, then I might well crack on with conversion to numeric IDs anyway even if the argument for them isn’t that convincing.

#4: Anything else I haven’t thought about? (I imagine that could be a long list).



I do hope you’ve been able to read to the end of my War and Peace and not die of terminal boredom in the meantime. Any and all answers would be very gratefully received!
Hi Adam
If you have a field in the related table with the Memo Fields which Uniquely identifies your Customer, then you can run 1 Update Query to insert CustomerID's for all records.
 
Wow, I really wasn't expecting so many replies and with so much great advice. I'm very, very grateful.

But my head hurts now and I really need to go away and have a ponder.

At least in the interim, if you decide to regenerate your DB, keep the old keys (for a while) until you reach the point that all of your records have both a former PK and a current space-efficient PK. For any independent (parent) table, auto-gen the new PK while keeping the child tables dependent on the old key. Then another update to the FK fields (identified using the older PK for one last linkage) lets you keep things together.
That's a really good tip, thanks. Obvious when you think about it but I probably wouldn't have done (and didn't do on the tables I've transferred so far).

I'm pretty sure that your database had much worse problems than an alpha vs numeric PK.
You're so right.

A more important issue would be - can the alpha value ever change? If you have been disciplined and set a rule that the alpha PK cannot change, then you didn't need to take this step to go from alpha to numeric.
As it happens, no, my alpha value never changes. Not because I knew that would be an issue for Access but just because, once I've given a customer file a reference, then I stick with it regardless of future changes. So, as I don't change the customer references, that means I never need to change the IDs on the database. That's pure serendipity and I take the point that others have been making that better practice would be to start with a hidden and meaningless numeric ID.

Given where you are now, I would probably just continue with the numeric PK rather than reworking what you have converted so far. Most people make more mistakes when they change stuff than they make when they originally create stuff so I tend to avoid unnecessary changes and this change is unnecessary. It's done. Live with it and move on.
I'm going to think on it but, actually, I'm tempted to roll back what I've changed so far and stick with my alphanumeric IDs (given that rolling back will be less work than changing the big tables I still need to tackle plus all the consequential changes to the front end). I can really get on board with your explanation and it chimes with the vague notion that I was edging towards. Not that I'm disagreeing for a micro second with any of the other advice I've been given but, for me in the right here / right now, practical is probably going to win out over correct.

An "Import" application is simply an access database that only links to tables.
Thanks again, Mark, for pointing me in the direction of something I knew nothing about. I may not get to use that functionality right now but I'll still be going away to find out more about the whole topic.
 
One other suggestion for creating conversions -- they almost never work the first time and they usually take multiple steps.

@adam.arca - Pat is spot on target with that statement. When starting conversions, do them no more than one or two at a time and make a backup of each major step along the way. When doing that backup, also throw in a Compact & Repair and a second good backup before you go on to the next step.
 
@adam.arca - Pat is spot on target with that statement. When starting conversions, do them no more than one or two at a time and make a backup of each major step along the way. When doing that backup, also throw in a Compact & Repair and a second good backup before you go on to the next step.
There is no such thing as having too many backups.
 
There is no such thing as having too many backups.

99.9% agree. But then, there was the time when my assistant and I were tasked with retiring U.S. Navy personnel-data backup tapes. Not that we were going to re-use the tapes because they were all at least 10 years old at the time and at lower density than what we were currently using. Plus showing their age in other ways.

The Navy has regulations that allow you to retire the physical media holding backups older than an amount dependent on their data classification. What happens is that after seven or ten years, the taped data officially becomes archival and removed from our systems. The oldest records are soft-transferred to other sites. Most of these tapes were SBU (Sensitive but Unclassified), though a few were actually Classified. We didn't have Top Secret or higher at our site. My assistant and I had the requisite clearances and reviewed the detailed and rather exacting procedures, so we were allowed to do this for the Navy.

We had to manually load tapes to test readability where possible (to verify tape identity from the software tape label) and then degauss any tape older than the limit. Took us at least 10 working days because of how we had to do the testing and degaussing. We had to take off our watches and cell phones when near the degausser. We generally locked the cell phones in lockboxes when dealing with the Classified tapes.

We must have retired over 300-400 tapes, which including boxing, re-labeling (to show they were being retired), logging the tape software label AND the tape's serial number on the cartridge, then boxing them up in bundles of 10 and shipping them to an approved incinerator site. There was extra paperwork for the Classified tapes - different and more complex forms, for one thing.

We couldn't transport the tapes ourselves because some were classified and we were cleared to read and manipulate but not to transport that kind of tape. Neither of us were licensed to carry a sidearm, which IS a requirement for transporting Classified or higher clearances. So we had to then sign over custody to the approved couriers.

After we finished that assignment, my assistant and I reached the consensus that we had too many backup tapes.
 
We couldn't transport the tapes ourselves because some were classified and we were cleared to read and manipulate but not to transport that kind of tape. Neither of us were licensed to carry a sidearm, which IS a requirement for transporting Classified or higher clearances. So we had to then sign over custody to the approved couriers.
Just to give you a chuckle, by postal (and military) regulations you can send SECRET document by registered mail. Depending on what is being shipped, yes the carriers will be armed.
 
Yep, @Mark_ - we could have sent things by registered mail, but the size of that class of tape cartridge and the sheer number of tapes to be sent would have cost us more than us getting the tapes to our shipping department and letting them send out the tapes to the disposal site. We already had qualified couriers because of the OTHER requirement for back-ups... that you have a set of off-line tapes at another site for the operational tapes that DID eventually get recycled for re-use later. The tapes we were decommissioning were the ones that got taken out of rotation, yet ANOTHER regulation for tape backups.
 
My two penn’orth on codes.

All of the codes in my systems ended up being linked into accounting systems sooner or later. In the main these, like SAGE, originated back in the days of MS-DOS. So eight alpha/numeric in length was the diktat. They generally were used to sort a customer file alphabetically, so a string was essential.
I advised that the first four should be used to correctly place records in order, with the last four as numeric. Then they could either have a maximum of 9,999 over all, or 9,999 per alpha character. Personally, I couldn’t imagine any of my clients with more than 999 never mind 9,999 but some had an ambitious imagination.
For instance, the Cotswold Company would need to be found in the list using COTS and not THEC and J.C.A.Alderton in ALDE and not JCAA…..etc. In the main three alpha prefix was enough but four was much preferred when the list became larger. Don’t use ASDF1, ZXCV88, QWER167 etc. Use ASDF0001, ZXCV0088 & QWER0167, otherwise, without going into detail you can get unexpected results.

TIP! If you need to amend a coding system that will affect another system by changing codes from one spec to another. Take great care before you proceed. Test, check, test check, repeat and be very sure that both you and the client agree before you do it with live data. If you slip up somewhere, the effect can be startling and never ends well.
 
I also link into other systems each with their own PK - sage, Xero, Amazon, crm, erp, etc

So within the context of my own app I use an autonumber pk and either have a separate column for the pk in the other system or if linking to many other systems a ‘PK’ table typically with 3 columns, my pk, fk to other system detail table and the pk in the other system. The system detail table contains info about the Pk, typically datatype
 
When I started using Ancestry.COM I was able to use their person-ID as my PK. However, at some point they changed internal IDs and I had to diddle my database to generate a new internal key of my own. Partly because their internal key relates to not only individuals, but also families and data sources. Their new internal IDs might make it in a QUAD integer but I have the 32-bit version of Office and for some reason, even though I have a 64-bit system, I can't choose quad integer.

Now I take their ID number as a text string which I keep in a lookup table, but all other links are 32-bit dynamically generated numbers. (I.e. my internal number assignment changes each time I have a new Ancestry download file.) This fits in with a discussion on keys because Ancestry changed the format of their numbers - normally considered a no-no, but I suspect they were faced with overflows as their list of people, families, sources, and events grew almost exponentially. And the ripple effect hit me and caused me considerable consternation. I suspect that they changed other things as well - such as physical servers and web server packages - because they changed from ANSI text to UTF-8 since I started. But that's another story.
 
Due to the nature of the work many of my client’s customers had various depots, locations, or multiple sites. Basically, in my system there could be several customers to one customer in accounts. This meant that my systems’ needed to create separate invoices by depot but post the totals of all depots to one account code. That way the depots had their own invoices to check and approve but when accounts sent out statements all of the various depot invoices were included on just one statement.
Each customer record in my systems had the customer code spec to suit their operation and also an account code to link to the accounts system. It was better this way as often an accounts department would come up with wild and wonderful codes of little use in my systems.
 

Users who are viewing this thread

Back
Top Bottom