Advice on splitting table and best way to enter data?

jan@BRU · Sep 8, 2009

Hello,
I have the following situation: Consider a table with the following fields:

ID, A, B, C, X, Y, Z

ID is a unique (long) identifier (autonumber), all other are strings. The A, B, C fields are pretty repetitive and contain "long strings", while the X, Y, Z fields are variable, and contain shorter or numbers (consider that all fields have the right data type). The following should illustrate that:

1, Apple, Cube, Red, 998, 234, ABC
2, Apple, Cube, Blue, 233, 234, AR1
3, Apple, Cube, Red, 235, 234, 234
4, Orange, Cube, Yellow, 234, 208, B09
5, Orange, Cube, Red, 92, 540, B10
6, Apple, Cube, Red, 1258,585, YYR
7, Apple, Cube, Blue, 234, 987, ARS

The number of permutations for A, B, C are pretty limited, while X, Y, Z could have any numbers/strings in there. Since the Table has many records (approx. 1E6), the idea was to split the table into two:

Table1: IDX, A, B, C
plus
Table2: ID, IDX, X, Y, Z

The first has very few records (obviously), and the second still has 1E6 records, but doesn't repeat all the space-consuming A, B, C fields all the time.

So far so good.

Now, I still want to comfortably enter data into my tables. I have hence emulated the original table through a query:

SELECT Table2.ID, Table1.A, Table1.B, Table1.C, Table2.X, Table2.Y, Table2.Z
FROM Table1 INNER JOIN Table2 ON Table1.IDX = Table2.IDX;

Let's say, I now want to add a new record:

(Auto), Apple, Cube, Red, 123, 789, ABX

In this case, I would hope that my form (query? table?) would recognize that A=Apple, B=Cube, C=Red is already existing as an entry in Table 1, that it would select the correct IDX and only add a record to Table 2, to accomodate it with the new values for X, Y, Z.

What my query does: it creates a new record in both Table1 and Table2, creating a double entry for A, B, C, which is only differentiated by the autonumber IDX... That's not what I wanted.

However, if I wanted to add:

(Auto), Lemon, Circle, Blue, 435, 234, AZE

it should realiz that A=Lemon, B=Cricle, C=Blue doesn't exist as a combination for A, B, C in Table1 and add - indeed - a new record.

Any help?

Similarly, and probably a little more complex. If I find a record in my query, and I change the value of A from Apple to Banana, then it will of course do so for all records in the query, that have the same entry in Table1. That's not what it should do. It should create create a new entry in Table1 (or check whether a record in Table1 already exists), and then use the newly created (or found) IDX, to only change A from Apple to Banana for this particularly.

Is this making any sense?

Galaxiom · Sep 8, 2009

I think you are making it hard for yourself. By the time you add the extra key and maybe a link table to connect the two tables you will have more data than keeping it in one table.

For ABC type records make a table to tranaslate the fruits and shape strings to a an ID number (byte format if there are less the 255). In your main table fields use these IDs instead of the full string. This will reduce the ABC fields down to very little even though they are often repeated.

Incidentally 1E6 records is not a particularly large database.

neileg · Sep 8, 2009

For all practical purposes you can regard empty fields as consuming no space.
If you want to ensure that the combination of A,B,C is unique you can create a three field index set to no duplicates.

Atomic Shrimp · Sep 8, 2009

It's a rather unusual way of handling duplication of data, and I can't see the benefit.

If the motive is to save space, then you'd be much better off normalising the first table so that repeated instances of 'apple' or 'orange' are actually long integer foreign keys to a table of fruits, the repeated instances of 'cube' are FKs to a table of shapes, and so on.

gemma-the-husky · Sep 8, 2009

surely you should definitely split this into two tables

you have repeating groups in your table, which is a clear sign that the data is not properly normalised.

but how do you distinguish between the multiple exanples of apple/cub/red - is one of the other columns part of the key?

Advice on splitting table and best way to enter data?

jan@BRU

Registered User.

Galaxiom

Super Moderator

neileg

AWF VIP

Atomic Shrimp

Humanoid lifeform

gemma-the-husky

Super Moderator

Similar threads

Users who are viewing this thread