Disconnected Data.

Thales750

I have a philosophical question

My customer has 7 locations, and we are going to build and install a Point of Sale system in each one. They will connect to either a MySQL or a SQL Server backend; we haven't decided that yet.

Each store will store its data locally in Jet, SQL Express, or MySQL and will "update" the backend periodically over a VPN. The point of this is that if the VPN is down, store operation will continue, with the exception of credit card processing.

This poses the question of how to engineer the store-specific primary keys so that they remain unduplicated across the network.
I have thought of three schemes. (The system produces about half a million records per year.)

1. Separate the stores by ten million or so; that's the way the previous system worked. But they created only a one-million-record separation, and it soon would have crashed had they not decided to replace it.

2. The second would be to concatenate a store identifier onto the primary keys.

3. The third would be to use a two-field primary key. Normally I would not consider this, but it would actually be very consistent.

What do y'all think?
 
I would not use the concatenated system. This would be a string and hence very slow to process for joins. Furthermore, extracting the location would require reading every record, since the index cannot operate on the partial string.

I expect you will have needs for reporting by location. This would be inefficient for both reasons above.

At least using a range for location would keep the key as an integer, and the branch could be extracted from the index by range. Note, however, that if you use the range as well as a field to indicate the location, you would technically be in breach of normalization.

I think the composite key is probably the most sensible and reliable option. Even though this means many tables will include a field for location, it would allow you to easily query any table by location without requiring a join.

Some developers shun composite keys, but I have never yet heard a critique that provided any reasoning whatsoever for that attitude. Those with this philosophy also often promote the concept that every table must have a primary key, with some going so far as to say that key should be an autonumbered integer. Since these beliefs make no sense to me, I don't lend any support to eschewing composite keys either, though I am always willing to listen to sensible explanations.
 

I agree, and well written, by the way.

I have been racking my brain trying to figure out why I wouldn't use a composite key.

Having said that, the range method does seem to be the least amount of effort, though. Simply run a query to append a record at install, and it auto-counts up from there.
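
As a sketch of that (table, field, and seed value here are my own assumptions), appending one explicit value into a Jet autonumber column at install time should make subsequent autonumbers carry on from there:

' Illustrative only - tblSales, SaleID, and the seed value are assumptions.
' On recent Jet versions, inserting an explicit value into an autonumber
' column resets the seed, so store 3's numbering continues from 30000001.
CurrentDb.Execute _
    "INSERT INTO tblSales (SaleID, Notes) VALUES (30000000, 'Seed record')", _
    dbFailOnError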
 
Simply run a query to append a record at install, and it auto-counts up from there.

Actually, I would be a little nervous about relying on an incrementing autonumber to always behave itself. I have experienced them suddenly deciding to continue on from unexpected starting points.

Wise counsel suggests that one should never rely on an autonumber for anything that affects what the user sees, and technically the location is embedded in that number. I would rather stay in control of something so important.

Why do you fret so much about using composite keys?
 
I guess it has something to do with a performance problem nagging at my head.

If a string as a key slows down queries, then what effect will a double key have? It does keep everything normalized, though, because you must have a store identifier separate from any foreign key considerations.

OMG, my brain is hurting; I just got sidetracked thinking about what data I should download to the stores. That, of course, is another thread.
 
If a string as a key slows down queries, then what effect will a double key have? It does keep everything normalized, though, because you must have a store identifier separate from any foreign key considerations.

There is no reason to separate the Store field from the key. The Store field is a natural partial key. The primary key is a composite natural/synthetic combination.

It only needs to be a Byte datatype, so it doesn't add too much overhead to the index.
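
For illustration only (table and field names are assumptions, not anything agreed in this thread), such a composite key could be declared in Jet DDL along these lines:

' Illustrative DDL only - table and field names are assumptions.
' StoreID plus a per-store sequence number form the composite primary key.
CurrentDb.Execute _
    "CREATE TABLE tblSale (" & _
    "StoreID BYTE NOT NULL, " & _
    "SaleID LONG NOT NULL, " & _
    "SaleDate DATETIME, " & _
    "CONSTRAINT pkSale PRIMARY KEY (StoreID, SaleID))", dbFailOnError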

In fact the presence of the Store field in every related table is not a bad thing for queries.

Note, however, that you won't be able to autonumber the synthetic part of the key, as this would cause a conflict when the data is merged.

Avoid using the popular DMax technique for the numbering, as it is not multiuser safe. The larger the database becomes, the greater the chance of allocating duplicates and the slower the whole process becomes.

Store the next number in a one-record table. Open it as a recordset with a lock to prevent other users from accessing it. Then increment the value and release the lock as the recordset is closed.

You will need to set up a delay and retry so that users who try to access the locked record will not get errors.
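
A minimal DAO sketch of that approach, assuming a one-row table called tblNextID holding a single Long field NextID (all names here are illustrative):

' Minimal sketch of the one-record counter table described above.
' Assumes tblNextID contains exactly one row with a Long field NextID.
Public Function GetNextID() As Long
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim lngAttempts As Long
    Dim sngStart As Single

    Set db = CurrentDb
    On Error GoTo LockedRetry

TryAgain:
    ' Deny other users access to the table while we hold the record.
    Set rs = db.OpenRecordset("tblNextID", dbOpenTable, dbDenyRead + dbDenyWrite)
    rs.Edit
    GetNextID = rs!NextID
    rs!NextID = rs!NextID + 1
    rs.Update
    rs.Close              ' closing the recordset releases the lock
    Exit Function

LockedRetry:
    ' Another user has the table open - wait briefly and retry.
    lngAttempts = lngAttempts + 1
    If lngAttempts > 20 Then
        Err.Raise vbObjectError + 513, "GetNextID", "Could not obtain the next ID."
    End If
    sngStart = Timer
    Do While Timer - sngStart < 0.25
        DoEvents
    Loop
    Resume TryAgain
End Function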
 
This code will create a concatenated key that is unique. I have a field that keeps the number portion separate. The separate number field is handy for sorting; string data wants to sort CAT-1, CAT-11, CAT-110. Consider using a DMax() on the number field and adding one for the next number.
sCatId = "CAT-" & Right(MyNumber, Len(MyNumber) - InStr(1, MyNumber, "-"))
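
For what it's worth, a sketch of that numbering (tblCategories and CatNumber are assumed names, and the earlier warning that DMax is not multiuser safe still applies):

' Illustrative only - tblCategories and CatNumber are assumed names.
Dim lngNext As Long
Dim sCatId As String
lngNext = Nz(DMax("CatNumber", "tblCategories"), 0) + 1
sCatId = "CAT-" & lngNext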
 
Hi

Without disagreeing with the sentiment of the advice, there are just a couple of little things I'd feel better about mentioning.

>> This would be a string and hence very slow to process for joins
Yes, short fixed-length data types (i.e. the 4 bytes of a Long vs. 20+ for a sufficiently long Text) are going to give better join efficiency - but other than the length, "very slow" is perhaps overstating it a bit. Text-type PKs do exist - they just have that slight disadvantage. (I recall reading an article some time ago about how fixed-length text data types are surprisingly not any better.)

>> since the index cannot operate on the partial string
Index optimisation can be performed - as long as the string to be located is a prefix of the stored value, the engine can use an index to match.
e.g. WHERE ConvolutedPKValue Like 'A*'
would be fine.

>> the concept that every table must have a primary key
I'd actually go along with that. By definition, really. If you want a practical reason - for updatability in Jet.
What that absolutely doesn't mean is that it must be a surrogate Autonumber. (Which is what you then rightly discuss as being what is often pushed.)
Do I use a surrogate AN/Identity for the most part? Oh hell ya. Does it make for a more trivial application? Pretty much. Is it a requirement? Nope!
Nothing wrong with the composite solution at all.
It can just make application operations a tad more fiddly (rather than being an issue from a database perspective). But when designing a database you should always try to refrain from considering application influences as much as possible.
(It inevitably occurs - but we should try ;-)

I'd also agree that assuming an AN will increment isn't a good choice in a scenario with a large value gap.
They're also not guaranteed to be unique values (often mistakenly cited as what they do provide in the absence of concurrent values) - it's only the PK's unique index, which is so very often placed upon them, that guarantees that.

The one solution not suggested (that I've seen) would be to fall back on a GUID.
But they too raise their own set of issues to work with. Not least of these is fetching the PK value(s) of a newly inserted row - which would mean you'd be better off assigning the GUID value in code rather than relying on NewID/GenGUID in the table definition (which you'd be doing anyway if you went with the custom autonumber option). That said, with the composite key, is there any reason not to rely on the default functionality? The value is, as it should be, otherwise meaningless.
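
As a sketch of assigning the GUID in code (one common approach; names are illustrative and it is by no means the only way):

' Illustrative sketch of generating a GUID string in code rather than
' relying on a GenGUID/NewID default in the table definition.
Public Function NewGuidString() As String
    Dim objTypeLib As Object
    Set objTypeLib = CreateObject("Scriptlet.TypeLib")
    ' The GUID property returns the braced GUID followed by trailing
    ' null characters, so keep only the first 38 characters.
    NewGuidString = Left$(objTypeLib.GUID, 38)
End Function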

Cheers.
 
I have had a similar situation with gaps in autonumber fields (not chosen by me but by my client); however, I suggested that this method was wrong, as it would inevitably lead to duplication issues.

Site 1 range start 10000
Site 2 range start 20000
Site 3 range start 30000

The host database accepts data from all sites and can add records in its own right.

The host adds records 0 - 5000.

It then appends data from site 1: 10000 - 11000.

The host database then adds 10 new records, which would commence at 11001, as this is the next free autonumber available under the @@IDENTITY rule.

Now site 1 submits its next batch of records, 11001 to 12000.

Here's the problem: numbers 11001 through 11010 have already been created by the host - therefore, duplicate keys.

I also concur that composite keys are the best method; make them both long integers.
 
I also concur that composite keys are the best method; make them both long integers.

I have been a fan of the Byte datatype, which will cover 256 locations. Even Integer would do for 65,000 or so, without resorting to the four billion or so of the Long Integer.

But I guess with 32-bit memory the Long Integer (2^32 values) uses no more RAM or processor time than the Byte.

Is the Byte, or indeed the Integer, just a hangover from eight and sixteen bit systems?
 
Is the Byte, or indeed the Integer, just a hangover from eight and sixteen bit systems?

I guess so.

If you have time, I would recommend the following lecture by Douglas Crockford. It is the first of a series on JavaScript, but it is a nice history of early computing and is completely independent of the title.

http://developer.yahoo.com/yui/theater/video.php?v=crockonjs-1

PS - not as worthy as the other contributors, but I would have an autonumber field and a further static (per-location) field, and then just join them together into the key field on record creation.
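
Something along these lines, perhaps (field names are only placeholders):

' Illustrative only - field names are placeholders.
' Combines a static per-location code with the record's autonumber
' into the key field when a new record is saved.
Private Sub Form_BeforeUpdate(Cancel As Integer)
    If Me.NewRecord Then
        Me!SaleKey = Me!LocationCode & "-" & Me!SaleAutoID
    End If
End Sub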
 
