Lookup Tables and Composite Keys

You certainly won't get rich as a gambler.

That is an understatement. 2^128 = 3.4E38

If everyone on Earth had been entering a billion records a year for the entire history of the Universe, there would still be less than a one in three billion chance of any duplicate.

7E9 * 1E9 * 13.8E9 ~ 1E29

Thales may not get rich as a gambler but Galaxiom won't win any prizes at maths until he learns about the Birthday Problem ;). Even assuming GUID distribution is perfectly random (which it isn't), the probability of a duplicate is greater than 50:50 after fewer than 1E20 GUIDs. Still, for most purposes that risk is negligible.

For replication, though, a composite key or similar "intelligent" key is often superior to GUIDs. Creating your own key generator gives you more control, is more economical, and is more "user-friendly" for ongoing support and maintenance.
 
@ButtonMoon

I have been following your theoretical utterings and, to be honest, I am disappointed. Retreat to your academic ivory tower, or get some common sense and communicate on a level corresponding to that of the readers HERE!

The specific instance now - what kind of BS statement is: "the probability of a duplicate is greater than 50:50 after fewer than 1E20 GUIDs." "Fewer" meaning 1? Or 2? Or 5? Or 1E19?

Remember that you are facing REAL LIFE (a.k.a. IRL) here - no one cares about the theory, but everyone has some issue they need to solve so they can move on to the next task. Most have no clue what they are doing and they don't care, so long as they make progress.
 
Thales may not get rich as a gambler but Galaxiom won't win any prizes at maths until he learns about the Birthday Problem ;).

An elementary mistake. Of course, I calculated the probability of a particular number emerging rather than the probability of a repeat. Long ago I was actually quite good at maths, but I have not studied statistics since 1978.

The specific instance now - what kind of BS statement is: "the probability of a duplicate is greater than 50:50 after fewer than 1E20 GUIDs." "Fewer" meaning 1? Or 2? Or 5? Or 1E19?

Now, if I remember correctly after 35 years...

The probability of a repeated number on a particular instance is the number of potential duplicates (one less than the number of draws so far) divided by the total number of possible outcomes.

Where t is the total number of possible outcomes and n is the number of draws:
(n-1)/t

The chance of getting a duplicate within n instances is found by working out the chance of not getting any duplicates and subtracting that from 1.

The chance of not getting a duplicate on the nth instance is:
1 - (n-1)/t

The chance of not getting a duplicate on any instance up to n is the product of the probabilities of all instances to that point.

(1 - 1/t) * (1 - 2/t) * ... * (1 - (n-1)/t)

Or is it this?
((1 - 1/t) + (1 - 2/t) + ... + (1 - (n-1)/t)) / n

Edit: Perhaps someone with better maths than me could answer this last bit correctly.
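For what it's worth, the product version is the standard birthday-problem form: the "no duplicate" chances for each draw multiply together, and the chance of a duplicate is 1 minus that product. Here is a rough VBA sketch of both the product and the usual shortcut approximation - the function names are made up and everything is done in Doubles, so treat the results as estimates only:

' Exact product form: 1 - (1 - 1/t)(1 - 2/t)...(1 - (n-1)/t)
' Only practical for modest n; for n in the billions the loop takes too long.
Public Function DuplicateChanceExact(n As Double, t As Double) As Double
    Dim i As Double, pNoDup As Double
    pNoDup = 1
    For i = 1 To n - 1
        pNoDup = pNoDup * (1 - i / t)
    Next i
    DuplicateChanceExact = 1 - pNoDup
End Function

' Standard approximation 1 - e^(-n(n-1)/(2t)); usable even when t is 2^128.
Public Function DuplicateChanceApprox(n As Double, t As Double) As Double
    DuplicateChanceApprox = 1 - Exp(-n * (n - 1) / (2 * t))
End Function

In the Immediate window, ?DuplicateChanceApprox(2.2E19, 2 ^ 128) comes out at roughly 0.5, which is where the "fewer than 1E20 GUIDs" figure above comes from.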
 
Guys,
Can I enter a field name into a text box and get it to work in my routine?
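One way, if the routine works through a DAO recordset: read the name out of the text box and go through the Fields collection instead of the bang operator. A minimal sketch, assuming a text box called txtFieldName on the form and a table called tblData (both names made up here):

Dim rs As DAO.Recordset
Dim strField As String

strField = Me.txtFieldName.Value                      ' field name typed into the text box
Set rs = CurrentDb.OpenRecordset("tblData", dbOpenDynaset)
Debug.Print rs.Fields(strField).Value                 ' same as rs!SomeField, but name-driven
rs.Close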
 
It turns out that cutting and pasting field names into the code is not so bad. It's mindless work, something I'm qualified to do. LOL

Enjoy your days, folks.
 
Anyone know how to use variable values after vs! in a DAO recordset?
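Assuming vs is a DAO.Recordset: the bang operator only accepts a literal field name, so when the name sits in a variable go through the Fields collection (which is what the bang is shorthand for anyway). A tiny sketch, with "Amount" standing in for whatever field you mean:

Dim strFieldName As String
strFieldName = "Amount"                    ' example field name held in a variable

Debug.Print vs!Amount                      ' bang: literal name only
Debug.Print vs.Fields(strFieldName).Value  ' same field, driven by the variable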
 
Too technical for me.

All I know is that Autonumber will not be duplicated, so why are we talking composite keys?

Bear in mind that Autonumber is Indexed and does not allow duplicates.
 
Too technical for me.
All I know is that Autonumber will not be duplicated, so why are we talking composite keys?

Bear in mind that Autonumber is Indexed and does not allow duplicates.



This whole discussion is about multiple locations that will be using the web to sync all system data. It will use multiple different types of devices to connect, most of which will be some form of PHP, Android, or i-something.

But the main offices will be using some form of compiled local Windows apps (Access for the first round, since I know that). These local systems will need to operate even if the internet connection is down.

Hence the need for ensuring data integrity. The scheme I have settled on combines an Autonumber field with a location field, converts them to text, pads them with zeros to make it all the same length, and concatenates them into one primary key.
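A rough sketch of that concatenation, assuming a 4-digit location code and a 16-digit record portion - the split, the function name, and the argument names are all made up for illustration:

' Build the text primary key: zero-padded location + zero-padded autonumber
Public Function BuildSyncKey(lngLocationID As Long, lngAutoNum As Long) As String
    BuildSyncKey = Format$(lngLocationID, "0000") & Format$(lngAutoNum, "0000000000000000")
End Function

BuildSyncKey(1, 16) returns "00010000000000000016"; the exact split between location and record digits is whatever the real scheme settles on.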

This method differs from Pat's idea in that this system will store the autonumber and the location data for every record. It's a small amount of overhead, especially since these fields won't be indexed and will only be used to reconstruct the key in the event of corruption.

The reason we are not going to use composite keys is to get the efficiency of a single primary key. There is plenty of support for this scheme, including extremely large systems used by the Federal Government and multinational corporations.

The State Department's American Locations Systems, which tracks the travels of Americans around the globe, uses a similar scheme. The idea is to ensure data integrity even if all systems are offline.
Another aspect of our systems is that no record can ever be deleted or modified, ensuring that all changes to data only move forward and data trails are automatically kept.
Nowadays, with terabytes costing less than $100, it makes no sense to discard any data - just archive it.
 
Your concatenated text key "00010001000000000016" takes up 20 bytes per record / table and gives 9,999,999,999,999,999 records in each of 9,999 locations.

My original suggestion of using 2 Long fields (LocId / DeviceId / UserId and Id) takes 8 bytes per record / table but would only have given you +/- 2,147,483,647 (4,294,967,294) devices (or users), each capable of independently creating +/- 2,147,483,647 (4,294,967,294) records.

You could have created a composite key using a LocationID field (Long) and used the DMax() function, or a table with a bit of incrementing code as in Pat's idea, to give you the next ID (Long) for your current LocationID which would be linked to your current location (office).
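A minimal sketch of the DMax() flavour, with made-up table and field names (tblOrders, LocationID, LocalID):

' Next per-location ID for a composite key (LocationID + LocalID)
Public Function NextLocalID(lngLocationID As Long) As Long
    Dim varMax As Variant
    varMax = DMax("LocalID", "tblOrders", "LocationID = " & lngLocationID)
    NextLocalID = Nz(varMax, 0) + 1     ' start at 1 when the location has no rows yet
End Function

Bear in mind DMax() + 1 can hand the same number to two users who insert at the same moment, which is one of the reasons for the incrementing-table approach in Pat's idea.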

As you say, terabytes are cheap.

Mind you, I did use something similar but on a smaller scale. I was allowing for up to +/- 213 locations, each creating 9,999,999 records per table. That way I could fit the whole thing into a single Long field - LLLRRRRRRR.
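The packing itself is just arithmetic - something like this (names are illustrative):

' Pack a 3-digit location and a 7-digit record number into one Long: LLLRRRRRRR
' Keep lngLocID to 0-213 and lngRecID to 0-9,999,999 so the result stays inside a Long.
Public Function PackKey(lngLocID As Long, lngRecID As Long) As Long
    PackKey = lngLocID * 10000000 + lngRecID    ' 213 * 10,000,000 + 9,999,999 = 2,139,999,999
End Function

Going the other way, LocID = key \ 10000000 and RecID = key Mod 10000000.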



The deleted marker would also mean that no record is ever deleted, just hidden (if required).

deleted (Yes/No) - to mark a record as no longer required. This avoids records magically reappearing from other databases.
dateStamp (Date/Time) - to mark when the record was last updated
 
It's actually stored as text, so in the event that we ever need more than what you see here, we could potentially add 26 to the 19th power by adding alpha characters.

But even using alpha characters in just the location field would bring the number up to about a million and a half locations.
 
