structure of a table

ducker

I was having a discussion with a friend of mine about this... and I couldn't come up with the best explanation. Hopefully someone here can point me in the right direction.

He has a table

Recipes(Id, name, item1, quantity1, item2, quantity2, item3, quantity3, item4, quantity4)

I don't believe this is the proper way to create a table. What happens if you come across a recipe that needs six items, or more?
After researching it, I believe it violates 2NF. Is this correct?

Instead I think a better structure would be

Recipes(Id, name)

RecipesIngredients(Recipes.ID, Ingredients.ID, quantity)

Ingredients(Id, Name)

The problem my friend sees with this method is that the xref table will get gigantic.

In this example, speed is of utmost importance, as the tables will most likely contain thousands of records and be queried frequently.

Thanks for any input.

-Mike
 
ducker said:
Instead I think a better structure would be

Recipes(Id, name)

RecipesIngredients(Recipes.ID, Ingredients.ID, quantity)

Ingredients(Id, Name)

You are correct, but you shouldn't use ID by itself as the name for your primary keys. Prefix it with the entity type. So it should be:

tblRecipes
RecipeID
RecipeName

tblIngredients
IngredientID
Ingredient

tblRecipesToIngredients
RecipeID
IngredientID
Quantity


Don't use Name as a field name because it is a reserved word in Access.

Speed is of utmost importance, as the tables will most likely contain thousands of records and be queried against frequently.

Use the above; the friend's suggestion is too problematic.
 
I realize the table names aren't ideal; I just threw the example together.

SJ McAbney said:
Use the above; the friend's suggestion is too problematic.

What aspect of this is too problematic? He acknowledges that he might have to add fields to the table in the future, but he is concerned about a HUGE cross-reference table.

Might you have links to any additional information stressing the importance of breaking a table down like this? I don't have much to stand on if he has no problem adding fields to the table whenever he needs them in order to get faster performance.
 
This issue is not up for debate. He is unlikely to get faster performance with a “flat” table and is only making a maintenance nightmare for himself. Repeating groups violate first normal form, and violations of the first three normal forms result in data anomalies, extra code, and more complex queries.

Your friend apparently doesn't know what "huge" is and also doesn't understand how indexes work. "Huge" is millions of rows, and Access actually handles millions of rows quite nicely. Finding a specific record by key or index in a million-row table takes at most 20 reads; the average would be half that!

A "flat" table frequently requires more storage space than a normalized table, since you need to keep "extra" columns just in case some recipe needs more than 6 ingredients. How many is enough? 10 ingredients? 15? You really want to accommodate any recipe, but to do so you will waste large amounts of space, since each ingredient requires at least two columns.

If, horror of horrors, you elected to go with 8 ingredients, then wanted to enter a recipe with 9, and were too lazy or didn't have time "right now" to add the additional column and so crammed a second ingredient into one of your columns - well, that would break ALL your searches and would also violate first normal form. This is the real danger of this type of structure. It probably isn't going to be convenient to add those extra columns when the recipe that requires them gets added. And remember, it is not just adding a column to the table. It is also changing your forms/reports/queries/macros/code!!!

Don't make the mistake of confusing a relational database with a spreadsheet. With a spreadsheet, the data and presentation layer are the same. With a relational database, the data is stored in tables. The forms/reports, etc represent the presentation layer. Anytime you change the structure of a table by adding columns to it, you MUST change the presentation layer or you won't "see" the new columns.

Searching your recipe table to find anything made with chocolate would involve searching 6 columns. If you added a seventh, you would need to remember to modify your search query as well as your forms and reports. Calculating weight or calories, etc. would require more complex queries since the calculations would need to include 6 columns including some that may be null. These queries would also need to change if you add additional columns. None of this is a problem with a properly structured table. You just Sum() a column of data.

The "relation table" which handles the cross reference should contain a unique primary key which is comprised of both the RecipeID and IngredientID and an alternate index on IngredientID to facilitate searches. The PK will handle searches by RecipeID and the index will handle searches by IngredientID. When you select recipe # 49 and join to the relation table and then to the Ingredients table, Jet uses indexes to find the appropriate records so very few rows are actually read. If you didn't use primary keys or indexes, then certainly EVERY row in the relation table would need to be examined but that's not how it is done. That is why properly defined table schemas include appropriate primary keys and indexes.
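To make that indexing point concrete, here is a hedged sketch of the relation table with its composite primary key and alternate index, using Python's sqlite3 purely as a stand-in for Jet/MySQL (the table and column names follow the schema above; the sample data is made up):

```python
import sqlite3

# In-memory database for illustration only; the indexing principle is
# the same in Access/Jet and MySQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE tblRecipes (
    RecipeID   INTEGER PRIMARY KEY,
    RecipeName TEXT NOT NULL
);
CREATE TABLE tblIngredients (
    IngredientID INTEGER PRIMARY KEY,
    Ingredient   TEXT NOT NULL
);
CREATE TABLE tblRecipesToIngredients (
    RecipeID     INTEGER NOT NULL REFERENCES tblRecipes,
    IngredientID INTEGER NOT NULL REFERENCES tblIngredients,
    Quantity     TEXT,
    PRIMARY KEY (RecipeID, IngredientID)  -- handles searches by RecipeID
);
-- alternate index to facilitate searches by IngredientID
CREATE INDEX idxIngredientID ON tblRecipesToIngredients (IngredientID);
""")

# A little sample data.
cur.execute("INSERT INTO tblRecipes VALUES (49, 'Brownies')")
cur.execute("INSERT INTO tblIngredients VALUES (1, 'chocolate')")
cur.execute("INSERT INTO tblRecipesToIngredients VALUES (49, 1, '200 g')")

# Select recipe #49 and join through the relation table to the
# ingredients; the engine uses the indexes, so very few rows are read.
rows = cur.execute("""
    SELECT r.RecipeName, i.Ingredient, ri.Quantity
    FROM tblRecipes r
    JOIN tblRecipesToIngredients ri ON ri.RecipeID = r.RecipeID
    JOIN tblIngredients i ON i.IngredientID = ri.IngredientID
    WHERE r.RecipeID = 49
""").fetchall()
print(rows)  # [('Brownies', 'chocolate', '200 g')]
```

Note that finding every recipe containing chocolate is just a WHERE on IngredientID against that alternate index - one column to search, no matter how many ingredients a recipe has.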

I would suggest a different relation table. Many recipes call for ingredients to be added in a particular order and even occasionally the same ingredient is split into two additions. Given that, I would add a third field to the relation table to enforce sequence. You might also consider a comment field to add more detail specific to this ingredient. Temperature for example or mixing time.

tblRecipesToIngredients
RecipeID
AdditionSequence
IngredientID
Quantity
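A sketch of that revised relation table, again in sqlite3 for illustration (the Comment column and the sample data are my own assumptions, following Pat's suggestion). Because the same ingredient can now appear twice in one recipe, the primary key is (RecipeID, AdditionSequence) rather than (RecipeID, IngredientID):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tblRecipesToIngredients (
    RecipeID         INTEGER NOT NULL,
    AdditionSequence INTEGER NOT NULL,  -- enforces the order of additions
    IngredientID     INTEGER NOT NULL,
    Quantity         TEXT,
    Comment          TEXT,              -- e.g. temperature or mixing time
    PRIMARY KEY (RecipeID, AdditionSequence)
);
""")

# The same ingredient (7) is split into two additions in one recipe.
cur.executemany(
    "INSERT INTO tblRecipesToIngredients VALUES (?, ?, ?, ?, ?)",
    [
        (49, 1, 7, '100 g', 'melt over low heat'),
        (49, 2, 3, '2',     None),
        (49, 3, 7, '50 g',  'stir in off the heat'),
    ],
)

# Retrieve the ingredients in the order they should be added.
steps = cur.execute("""
    SELECT AdditionSequence, IngredientID, Quantity
    FROM tblRecipesToIngredients
    WHERE RecipeID = 49
    ORDER BY AdditionSequence
""").fetchall()
print(steps)  # [(1, 7, '100 g'), (2, 3, '2'), (3, 7, '50 g')]
```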
 
I'm not sure if this comes into play here, but does it matter that this example isn't to be done in Access, but in MySQL, accessed through a PHP script?

The PHP script will essentially build the form (part of the HTML page) as it runs. So in effect the modifications one must make to the form aren't quite as extensive as when utilizing a front end such as MS Access.

Thanks for the great input thus far.

-Mike
 
no, it doesn't matter. :)
 
Pat's post is absolutely correct. (As usual...)

When you debate your friend about normalization, consider this as well as what Pat told you.

The normalization rules have evolved precisely because everybody USED to make the mistakes that these rules now allow us to avoid. Someone went to the trouble of deriving PROOFS of how each of these normalization rules helps in a specific, provable way.

Proofs along these lines generally involve simple concepts: Computability (finiteness of algorithm) and Space minimization (compactness of storage). When you can guarantee normalized tables, you can make mathematically precise statements about those tables to the point of generalization. You can demonstrate that, with a finite normalized table, a query CAN find all answers to a given "question." You can show that a normalized table structure is always equal to or smaller than the equivalent unnormalized version of the table. (The "or smaller" is USUALLY the case; it is quite rare to normalize and see absolutely no space improvements anywhere.)

If you are more technically minded and really want to follow up on the topic, you can try web searches on "Interrogatory Logic" - the question-oriented complement of "Assertory Logic." The latter is the subset of logic that includes, but is not limited to, the more traditional material you studied in your introductory logic courses: syllogisms, De Morgan's Theorem, logical implication, that sort of stuff.

Without this Interrogatory Logic, no one could prove that a query is even possible, because without it there is no formal method of defining the result set of a query. That is: does the answer produced by the query actually answer the question that was asked? The normalization "rules" have been derived as ways to reduce tables to the point where such proofs and determinations become possible.

Stated another way, the normalization rules are a short-hand form of DB experience. You know what experience is, don't you? With apologies to Ambrose Bierce, "Experience" is that attribute which allows you to recognize a mistake when you make it again.

;)
 
After continuing my discussion with my friend, it seems I have slightly misrepresented the situation.

In this case... there are never more than 4 ingredients. In addition, those 4 columns of the table are indexed and held in memory, giving him faster access to the matching records.

I've been developing in Access for quite some time, and feel I have a fairly good handle on how to create and organize databases. Yet in this case I'm still confused.

Again, this data isn't in an Access database but on a webserver; it contains thousands of records and will be queried millions of times a day. Having all of those columns indexed gives him faster results. Users are not inputting any data, simply querying against it and displaying it.

My skills in PHP and MySQL data manipulation aren't up to par yet, so I can't reconfigure his data to try it the normalized way. But does any of this additional information change the base question? Are there circumstances where 3NF might not be the best route to go?
 
Again, this data isn't in an Access database but on a webserver
Access (actually Jet) has nothing to do with anything we have said here. The "rules" of normalization were derived and proved at least 20 years before the first version of Access was published. The original writings go back to the late 60's and early 70's, and nothing has supplanted them. Perhaps there will be some massive theoretical breakthrough that brings forth some new "miracle" database to replace the relational database, but until that happens, your friend is just plain wrong. When you have more than 1 of something, you have many. It doesn't matter whether you have 2 or 4 or a million: a relational database dictates a separate table to hold the many-side tuples.

Remember I told you that you could find a single record out of a million with a maximum of 20 reads and an average of 10. (Look up "binary search" to understand how indexes are searched.) Well, assume that instead of 1 million rows in a normalized table, you have 250,000 rows with 4 IDs in each row representing the same data. Your friend thinks that table will be faster to search because it has so many fewer rows. Well, he's wrong. The reduction is only to a maximum of 18 reads and an average of 9. BUT, and here's the kicker, you now have FOUR indexes of 250,000 rows to search, so the search is actually a maximum of 72 reads and an average of 36 to find any particular record!
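Those read counts come straight from the binary-search bound of roughly ceil(log2(n)) comparisons over a sorted index. A quick sketch of the arithmetic:

```python
import math

def max_reads(rows):
    """Worst-case reads for a binary search over a sorted index of `rows` entries."""
    return math.ceil(math.log2(rows))

normalized = max_reads(1_000_000)    # one index over 1,000,000 rows
flat_one_index = max_reads(250_000)  # one of the four 250,000-row indexes
flat_all = 4 * flat_one_index        # searching all four indexes

print(normalized, flat_one_index, flat_all)  # 20 18 72
```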
Are there circumstances where 3NF might not be the best route to go?
- Perhaps in a data warehouse, but NOT if you ever have to search on the repeating group.
my skills in php and mysql data manipulation aren't up to par yet so I can't reconfigure his data
Reconfiguring the data is easy; it just takes a union query. Import the table into Access, then use a union query to turn the 4 sets of columns into one. Then create a make-table query based on the union query to build the normalized table. Create the correct indexes on both tables and time your queries. If the table is large enough, you will be able to measure the difference.
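The union-query trick might look like this sketch (sqlite3 as a stand-in, with CREATE TABLE ... AS playing the role of the make-table query; the flat column names item1/quantity1 etc. come from the friend's original table, and the sample recipe is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
CREATE TABLE FlatRecipes (
    Id INTEGER PRIMARY KEY, name TEXT,
    item1 TEXT, quantity1 TEXT,
    item2 TEXT, quantity2 TEXT,
    item3 TEXT, quantity3 TEXT,
    item4 TEXT, quantity4 TEXT
)
""")
cur.execute(
    "INSERT INTO FlatRecipes VALUES "
    "(1, 'Toast', 'bread', '2 slices', 'butter', '1 tbsp', NULL, NULL, NULL, NULL)"
)

# The union query stacks the four item/quantity column pairs into rows;
# the make-table step materialises the normalized result.
cur.execute("""
CREATE TABLE NormalizedIngredients AS
    SELECT Id AS RecipeID, item1 AS Item, quantity1 AS Quantity
        FROM FlatRecipes WHERE item1 IS NOT NULL
    UNION ALL
    SELECT Id, item2, quantity2 FROM FlatRecipes WHERE item2 IS NOT NULL
    UNION ALL
    SELECT Id, item3, quantity3 FROM FlatRecipes WHERE item3 IS NOT NULL
    UNION ALL
    SELECT Id, item4, quantity4 FROM FlatRecipes WHERE item4 IS NOT NULL
""")

rows = cur.execute(
    "SELECT RecipeID, Item, Quantity FROM NormalizedIngredients ORDER BY Item"
).fetchall()
print(rows)  # [(1, 'bread', '2 slices'), (1, 'butter', '1 tbsp')]
```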
 
