different types of entities in the same table

SunWuKung · Apr 20, 2005

Hi,
I am designing a database.
It is going to store data from questionnaires.
The natural way seems to be to have a table - that stores where the data comes from, what is it about, what is the question and what is the answer, like this:
DataProviderID; FocusID; QuestionID; AnswerValue

My problem is that a Focus could be a eg. a Person, a Company or a Job - meaning that somebody may be giving info on an other person, on a Company or on a Job.
Now Persons, Companies and Jobs would have very different properties.
So how should I solve this shold I have a common table for Peoples, Companies and Jobs and use a very complicated structure of properties for them or should I store these entities in separate tables but than how will I store the results?

Thanks for your thoughts.
SWK

The_Doc_Man · Apr 21, 2005

You need to segregate the structure of the table away from its data.

You bothered to provide ID codes for three of the fields, implying that they are possibly foreign keys linking back to the master tables that define the data provider, subject, and question. Only the answer WASN'T an ID, which means it is the only "raw" datum in the table. All else is derivative from other tables. (This is NOT a bad thing, by the way... it is just an observation and supposition.)

I don't think you need separate tables for your separate focus topics. On the other hand, this table when taken in isolation is not enough to tell us everything we need to know to fully decide. I'll make some suggestions of a more general nature.

By implication, your questions have focus on a subject. There is nothing wrong with the idea of storing all raw answers in the single table that has multiple keys of providerID, subjID, and quesID. Three-field compound as primary key is perfectly reasonable for what you have shown. Then, when you want to focus on a subject, write a QUERY that filters out things that are OFF the subject. What is left is your focal subject. You can write differently structured queries for each subject if you wish. And can base reports on those different queries.

If the way that you derive the providerID, subjID, and quesID differs for each focus but the answer table still only depends on those three fields, you have not necessarily violated any normalization rules. (At least, not the first 3. Not so sure about rules #4 and #5, which SOMETIMES become an issue.) And it is not required to store a separate focusID with the answer if it can be derived or looked up knowing the subjID or the subjID and quesID taken together. In other words, there is no OBVIOUS structural defect here. HOWEVER, that does not mean that it isn't confusing. (Sorry for the double-negative.

)

This MIGHT be a time for my "do it on paper first" approach. Get a dry-erase board, some appropriate markers, and some sticky notes.

Draw table names on your board. Write some sample records for each table on the sticky notes and post them. Now try to build reports, forms, queries, etc. by seeing the relationships visually laid out before you. This will tell you where your design requires more work than might otherwise be normal. When you are satisfied with the layout and your understanding of the data schema, reclaim the board, markers, and remaining notepads for use in other projects.

SunWuKung · Apr 21, 2005

I think I know now what my problem was.
I couldn't give a proper name for a table that contains ID for so different things as persons, jobs or companies - so I couldn't really imagine that table.
Now Subject is good.
And you helped me a lot.
Although I couldn't recite any of the 5 rules. I thought there are only 3 of them - or are they the rules of robotics?
(Just kidding - Are they things like if your database structure is good than your table designs will not allow null values, and thou shall never store calculated values in your tables? Like any good rules they are difficult to live by.)
I will also try the sticky notes thing.
Thanks again.
SWK

The_Doc_Man · Apr 21, 2005

As to the 5 rules (vice the more common 3 rules) ... rules 4 & 5 rarely become significant even with the "big boys" of database management, but every now and then they crop up.

1. Eliminate repeating groups and arrays. All fields become scalars. (e.g. if you have fields Score1, Score2, Score3, etc.... move them ALL to a Score table, one score per record. Key in the score table becomes 1,2,3,etc. compounded with key from original table to link back to origin.) Savings come from reduced space if some records had fewer scores associated with them than other records.

2. Eliminate redundant data. (e.g. if you frequently repeat customer name, address, phone..., make a customer table & make it have customer ID key, then replace ALL redundant fields with single foreign key that leads to the single copy of the unique info.) Savings come from reducing the number of places that store the same identical data.

3. Eliminate columns not dependent on key. (e.g. if you have a list of possible countries of origin in a personnel table, don't store the actual country name and info about the country. Store a key to a separate lookup table. Then it doesn't matter if you don't have someone around from Kuala Lumpur this week, the data is in a separate table that persists despite not being referenced in the other table.) Prevents problems of inserting or deleting valid data that has no other place to live than inside some OTHER record for which the key has no direct relationship.

4. Isolate independent multiple relationships. (e.g. if you have an attributes field but that table that feeds it stores multiple, independent attributes, you can only choose one attribute. Remember, fields have to be single-valued because of 1st normal form issues. So never make one attribute "left-handed" and another one "blue-eyed." Both could be true but you can only store one. Make TWO attribute fields for different CATEGORIES of attribute - like Eye Color and Handedness as TWO distinct fields - and SPLIT the original attribute table according to these categories.)

5. Isolate semantically related multiple relationships. (e.g. when a constraint exists in a contributor table that is 4NF, sometimes it is better to split the elements of the contributor table into two parts in order to better represent the constraint.) This one is VERY tough to picture and I don't have a good example.

Another way of saying these is this. A table is in:

1NF: if it has (1) no duplicate rows; (2) each cell is single-valued, and (3) all entries in any given column always have the same meaning (not necessarily same value).

2NF: if a 1NF table AND all non-key attributes depend on the entire key

3NF: if a 2NF table AND it has no transitive dependencies

4NF: if a 3NF table AND it has no multi-valued dependencies

5NF: (a.k.a. Projection-Join NF) if a 4NF table AND every JOIN dependency is a consequence of the candidate keys.

Other normalizations: Boyce-Codd NF: if a 3NF table AND every determinant is a candidate key

Domain-Key NF: if every constraint on the table is a logical consequence of the definition of the keys and domains.

You can Google "Database Normalization" and pick up merely a gazillion hits on that topic. Pick any articles that seem to come from a reputable university (domain .edu), they tend to be the most scholarly. Download and read in your copious leisure time.

different types of entities in the same table

SunWuKung

Registered User.

The_Doc_Man

Immoderate Moderator

SunWuKung

Registered User.

The_Doc_Man

Immoderate Moderator

Similar threads

Users who are viewing this thread