Calculated primary key recommendation needed (1 Viewer)

gunslingor · Jan 3, 2013

Ok, I have one main table called cyber assets with fields like OS, MFR, Model, device type, etc, etc... I have a bunch of lookup tables for selecting the values associated with some fields, and I have a few other tables for multivalue things like multiple IP addresses associated with the same cyber asset.

Some of these fields go into making a UNID (unique identifier) that will be placed on the field device for identification and tracking.

The UNID uses these fields in this format:
[company]-[department]-[site]-[device type]-[cabinet].[row]

if it is not in a cabinet, the device UNID will be like this:
[company]-[department]-[site]-[device type]-[identifier]

Now, these UNIDs are intended to uniquely identify a device and provide relevant data when reading them; they must be unique and must be the primary form of device identification. Therefore it follows naturally that the UNID should be the primary key... that way they are gauranteed to be unique and are guaranteed to be the primary way to identify devices.

The problem is obvious to those experienced, this violates normalization and would be a pain to program around.

I want to start a discussion regarding options. I've identfied a few, but all seemlike poor solutions to me:
1. use a calculated field: not reliable according to the interwebs; will the UNID autoupdate if I change one of the fields in either a form or the actual table? Since this calculated field would be my primary key, would child tables be updated or orphaned when the UNID automatically changes?
2. use a query to calculate the UNID based on the cyber assets table, then store this back in the table: don't think that'll update as desired.

What can I do =(?

gunslingor · Jan 3, 2013

I suspect the right thing to do is to just make the UNID a calculated field, then use an autonumber for the primary key to link other tables... the only problem with this now is that the UNIDs may not be unique and there is no way to ensure that they are without searching everytime someone makes a change to one of the child fields. Calculated fields don't appear to have a unique property that can be set in access.... and there is no way that I can determine to ensure that the individual child fields are unique to a given record (for all child fields).

So, if I do the standard normalized solution, I think I'm screwed in regards to maintaining a Unique ID.

Help

plog · Jan 3, 2013

What can I do =(?

Seperate the idea that your unique identifier in your database has to be the same unique identifier you put on your equipment. Within the table, use autonumbers as your unique ids.

Now you have to tell me why you need some hodge-podge identifier on the equipment that isn't the same as the autonumber generated by the database. Why must company, department, site, device type, capinent, row data be on everything? Why not just the autonumber?

gunslingor · Jan 3, 2013

plog said:
Seperate the idea that your unique identifier in your database has to be the same unique identifier you put on your equipment. Within the table, use autonumbers as your unique ids.

I concure, but this does present the problem that the field UNID may not be unique unless its check in programming. I have made an autonumber key for the IP and cyber assets table, I've changed the existing UNID to a calculated field... so it can never really be edited. All those child fields will be selected by unbounded text/combo boxes on a popup form, then a submit button will generate a UNID and check it against all calculated UNIDs before submitting the child fields... or something to that effect.

plog said:
Now you have to tell me why you need some hodge-podge identifier on the equipment that isn't the same as the autonumber generated by the database. Why must company, department, site, device type, capinent, row data be on everything? Why not just the autonumber?

There are a number of reasons to include relevant, instead of random, data in UNIDs. One issue is that if someone sends me an email and says "install yadayada on 34838284" I'd have to check the database to determine what the hell that is... and that's time wasted.

Another reason is database safety... this DB I'm creating is based off an excel spreadsheet in which someone accidentally sorted a single column (the autonumber column) but not the other columns... so the autonumber for every device was incorrect and since they where generated in essence randomly, there was no way to fix the data. If the autonumber had been a variable that associates with other fields, the spreadsheet would have been repairable.

In my opinion, randomness is never good when it comes to DBs and autonumbers are random.... There are a plethoria of scenarios that could cause issues with using a random number, more than I can identify and state here. If I'm an instrument machinic and tell a fresh grad to reboot all servers in the XXX room, It would be a lot easier for him if every server UNID has SVR in it.... many more.

I suppose an alternative would have been to just remove all the child fields and have those variables assigned to the UNID via code and unbound textboxes, but then the data isn't repairable and its nice to have those discrete fields to easily filter on.

Your thinking like a DB developer, I'm thinking like an instrument mechanic forman =)

plog · Jan 3, 2013

if someone sends me an email and says "install yadayada on 34838284" I'd have to check the database to determine what the hell that is... and that's time wasted

But won't you have to go into the database to tell it about the new installation?

Another reason is database safety...

...then you go into a story about how a spreadsheet worked like it was intended to work. This issue you had with excel wouldn't be an issue for a database.

In my opinion, randomness is never good when it comes to DBs and autonumbers are random

Autonumbers aren't technically random--if they were truly random they could have duplicates. Autonumbers are first and foremost unique. And that uniqueness is what helps ensure order when dealing with data.

ButtonMoon · Jan 4, 2013

gunslingor said:
Your thinking like a DB developer, I'm thinking like an instrument mechanic forman =)

You are quite right here. Incrementing numbers can make very poor keys for end users. Incrementing numbers are often used for surrogate keys (i.e. those which are not for human consumption outside the database) but are often best avoided for the identifiers used in the business domain because they are not very user friendly. Also, the engine-level "auto-incrementing" features available in some DBMSs often come with certain limitations. They sometimes can't be updated at all (not a good thing for keys exposed to end users) and can only be used once per table. That means that if you use them as a business key then you can't easily use them for a surrogate in the same table as well. If you do use auto-incrementing keys then I suggest you use them for surrogates only.

There are three sensible criteria commonly used for choosing a good key: familiarity, simplicity, stability. The concatenated key you are proposing may be familiar to you but it isn't really simple and encoding too much information is usually a bad thing from the point-of-view of stability because if the attributes thus encoded ever change then the key is either inaccurate or
must also change.

You are right to give some thought to designing a meaningful key rather than simply assuming a meaningless number will do but I suggest you rethink and come up with a more straightforward encoding scheme. Identifying a device only by its location seems like an odd strategy. Why not come up with a code based on what the device itself actually is.

gunslingor · Jan 4, 2013

ButtonMoon said:
You are quite right here. Incrementing numbers can make very poor keys for end users. Incrementing numbers are often used for surrogate keys (i.e. those which are not for human consumption outside the database) but are often best avoided for the identifiers used in the business domain because they are not very user friendly. Also, the engine-level "auto-incrementing" features available in some DBMSs often come with certain limitations. They sometimes can't be updated at all (not a good thing for keys exposed to end users) and can only be used once per table. That means that if you use them as a business key then you can't easily use them for a surrogate in the same table as well. If you do use auto-incrementing keys then I suggest you use them for surrogates only.

There are three sensible criteria commonly used for choosing a good key: familiarity, simplicity, stability. The concatenated key you are proposing may be familiar to you but it isn't really simple and encoding too much information is usually a bad thing from the point-of-view of stability because if the attributes thus encoded ever change then the key is either inaccurate or
must also change.

You are right to give some thought to designing a meaningful key rather than simply assuming a meaningless number will do but I suggest you rethink and come up with a more straightforward encoding scheme. Identifying a device only by its location seems like an odd strategy. Why not come up with a code based on what the device itself actually is.

Thanks man, I concur... you explained it better than I. I am open to alternatives and that's one of the reasons I started this discussion. If you look back at my first post, you see that there are 2 format options for the UNIDs. I recently changed this a bit so that all the tags are formed using the identifier field, rather than the rack and row field. Now, everything that is in a rack however currently gets [rack].[row] assigned to the identifier field so the existing UNIDs are maintained but in a caluclated field now. This setup, I suspect, allows for easier implementation of your last paragraph suggestion if necessary... but discussion is needed.

Let take another look at the proposed scheme:
[company]-[department]-[Site]-[devicetype]-[idenitfier...usually rack.row]
e.g.
LDP-IT-SHW-SVR-U99.33
AFP-TEL-THW-CNV-U99.33

I've choosen these fields for a reason, excluding the rack.row really. The organization I'm coming into has 3 separate companies that house network equipment in the same rooms/racks... no one really knows who owns what devices and certainly new hires like me can't determine it. That's where the company and department fields come in... i.e. people can't add a new device to the DB without defining who owns it. Plus this would help auditors, spotcheckers, and other manual inspection processes we have in place. One of my tasks is to generate network diagrams via physical walkdown (network sweeps cannot find all devices like fiber converters, so at least one physical walkdown is always recommended), but only for the telecom department of company X. This is very difficult to achieve without a label identifying the company/department, and there are other similar processes in place by others... like once a year someone walks around to ensure no devices have modems installed, which are against policy (I know, there are automated methods to do this and similar tasks... but the company isn't to that point yet)

The site field really comes into play because a lot of our tags will be identical except for this field... like, we have two identifical oracle servers at two different sites, primary and backup site... the tags of devices like these are often likely to be the same without the site field, and they should be near the same since they are identical devices.

The device type field is a natural desire that helps in many instances. Like if you have a patch panel mounted in the cabinet rear in row 30 and a small server mounted in the front at row 30. Also, when walking these devices down in the field (particularly for other less technical people), its not always easy to tell what they are and this field can come into use. Some of these sites may house devices that are 50+ years old, or odd ball devices. There are just so many vendors and implementations out there, it is useful to have this field.

Regarding the identifier field, I agree... I really don't like using the rack row. The host name wouldn't be terrible as the identifier field, but there are security issues with using an identifier that associates with electronically accessing the system (plus NOT ALL devices have a hostname). Specifically, we want to be able to send out a redacted version of the network diagram to vendors that are not security cleared... but I'd have to remove the hostname from the diagram's UNID before we sent them the drawings. So these tags need to present minimal cyber data to the users, physical data is okay per compliance implications. Another option would be to use a 4-10 character description for the Identifier field, for example the primary oracle server discussed earlier could be:
LDP-TEL-RCC-SVR-PRI_ORC

The only real problem with this is that I cannot assign the identifier myself while doing the drawings since most devices aren't tagged in a manor to imply the device function. Then some devices like patchpanels and media converters will be hard to comeup with in some cases and the identifier for these will either end up being redundant (like PNL1 or CNV1) or effectively just a random number to eliminate the redundancy.

What are your thoughts?

Calculated primary key recommendation needed (1 Viewer)

gunslingor

Registered User.

gunslingor

Registered User.

plog

Banishment Pending

gunslingor

Registered User.

plog

Banishment Pending

ButtonMoon

Registered User.

gunslingor

Registered User.

Similar threads

Users who are viewing this thread