database normalization - question on practices of this

Maximus Primal · Aug 17, 2010

I have read a couple of website that explain now to normalize a database structure into the "perfect" model which reduces redunacy and duplication.

They all state that for a database to be right you have to take it to the third level of normlization (3ND?).

Question is, what if you cannot get a database to the third level, say you can only get it to the second level - does that mean your model is flawed and wrong or just that is is not possible to achieve a "perfect" model for it, and does it really matter if you cannot find a way to get it to this third level?

I have my own model for a magazine collection database at the second level currently, and being the first one I have designed I am pleased I got it this far. But I cannot see a way of achieving the requirements of the next level and as yet no one on this board has suggested a possible way or idea on how I can do this - so I am assuming it cannot be done.

So with this mind, if my database is only at the second level am I likely to have any possible issues or would it be safe to go ahead at this point and start the database creation?

I would like some thoughts as I am a non-IT person, in a non-IT job trying to understand concepts and ideas which (at the moment) confuse me and I have no direct personal resources other than the Internet to do this.

Advice and thoughts would be appreicated so I can move forward and not consign this to the "good idea, shame I had no idea how to do it" bin.

Thank you in advance
Max

namliam · Aug 17, 2010

Maximus Primal said:
Question is, what if you cannot get a database to the third level, say you can only get it to the second level - does that mean your model is flawed and wrong or just that is is not possible to achieve a "perfect" model for it, and does it really matter if you cannot find a way to get it to this third level?

It is possible though unlikely that the second normal form == third. I do not remember the details of the steps of normalization, it comes like second nature now, but each step is there for a reason.

Maximus Primal said:
I have my own model for a magazine collection database at the second level currently, ...

OK, you have a base design... so show it to us, perhaps we can critique it.
The most important thing of desinging your first database is, do not be afraid to start, fail and restart from scratch and rince and repeat that 2 or 3 times. It happens to the most experienced developers though it happens less and less as you gain more and more experience.

The most important step is (and you seem to have done this), to leave behind the old world of what you knew (i.e. Excel). Abandon it completely, Access and databases in general are a new beast that you must learn from scratch, that includes normalization.

Maximus Primal said:
So with this mind, if my database is only at the second level am I likely to have any possible issues or would it be safe to go ahead at this point and start the database creation?

Yes if it truely is in 2nf you will run into problems, again, do not be afraid to fail/fall.
If you didnt fail you would still be crawling now instead of walking/running.

Most important in addition to a base design is to work with a naming convention.
That is
Do not use Spaces or special characters in any column/form/query/table that is inside the database. If you need them on the forms for users thats OK, inside the database big no no... ie. not Magazine Number, but MagazineNumber

Do differentiate between objects in the database.
Do not create a table Magazine, then a query Magazine, then a form Magazine
Instead use tblMagazine, qryMagazine, frmMagazine

Do use sensible names EVERYWHERE at all times.
That includes tables/columns/forms/ even buttons and such.
i.e. IssueDate, would contain a real date and the date of issue. Not the number of the issue or the date it was added into the collection.

A lot of this seems logical, but you would be surprised.

Maximus Primal said:
Advice and thoughts would be appreicated so I can move forward and not consign this to the "good idea, shame I had no idea how to do it" bin.

Thank you in advance
Max

Well max, I hope all this helps even just a small bit. I wish you good luck on your project and hope to see you around with more mind boggling questions for us to answer.

Maximus Primal · Aug 17, 2010

namliam said:
OK, you have a base design... so show it to us, perhaps we can critique it.

The design can be seen in my post : http://www.access-programmers.co.uk/forums/showthread.php?t=197572

But to make life easier I have copied it below:

Code:

2nd Normalization (my attempt)

Magazine Table
• Magazine Number  Primary Key
• Magazine Title
• Country ID
• Issue Number (if any)
• Publication Date
• Publication Year
• MagType ID

Magazine Type Table
• Magtype ID
• Magazine Types / Categories

Country Table
• Country ID
• Country of Publication

Format Table
• Format ID
• Format Type (Magazine / Digital / Both)

Publisher Table
• Publisher ID
• Publisher Name

Condition Table
• Condition ID
• Condition (20 options, 10-0 in .5 steps)

Models Table
• Cover Models ID
• Name of Cover Model

Magazine Details Table
• Magazine Number
• Cover Price
• Pages
• Barcode Number
• Publisher ID
• Cover Model ID
• Cover Scan
• Format ID
• Condition ID
• Condition Notes
• Obtained From
• Obtained On
• Obtained Price
• Keywords for referencing articles etc

I know the tables are not named correctly, these are just my wordings for what they are, I will condense/choose sensible names when I come to try and make the database.

Some things to bear in mind:

I catalog magazines by Magazine Title/Country then the issue in question. So each magazine can have multiple countries and multiple issues for a given country if that makes sense.

The publisher is specific to the ISSUE, not the magazine or country (although a magazines publisher will vary depending on the county if the title is published under license as Vogue, Elle etc are around the world). A magazine title can also change publishers if one sells it to another, so each issue must have a publisher which relates purely to that issue.

MagType is the type of magazine and needs to have multiple values (at least 5) to help categorise the magazine (eg: Fashion, health, cars, IT etc).

Cover model/Stars also needs a multiple value.

When you look at an issue, you have 2 distinct areas. Items which relate to ALL copies of that issue (pages, cover price etc) and specific items which relate to the item in hand (condition for example).

This is what I came up with and as best I understand, it falls into the 2nf category but I cannot see how to make it into the 3nf category. If it is a case of suck it and see then I will take a stab and see what I get. But understanding the underlying principles would really help me out so I can at least spot the signs of problems before I get in too deep and realise I have wasted x weeks work.

I will look at the other points you mentioned after I get some food for my brain.

Max

namliam · Aug 17, 2010

OK, Magazine number, Number 1 of playboy, Number 1 of Playgirl, Number 1 of Girls and Bikes, number 1 of .... does that make sence? Though I guess that might be more an 'issue' rather than number... If your making it a autonumber then thats ok, but otherwize it could be a problem since keys must be unique.

• Publication Date
• Publication Year
Year is part of the date, Year is considered a calculated value from date, remove it

Magazine Table
Magazine Details Table
I think these two tables hold the same type of data and maybe can be merged.

ALL copies of that issue (pages, cover price etc) and specific items which relate to the item in hand (condition for example).
Seems like to me the # of pages, the price etc, can all vary just as much as the condition...

underlying principles would really help me out
When you have a good feel for this type of thing, there is no real distinction between the 1nf and 2nf, there is just 0nf - what data do I need ultimately - and 3nf -how do I store my source data -
Then as you gain more experience and start working bigger you can worry about 4nf and bcnf (or 5nf).

I think (bar the year) your pretty good to go. Though keep in mind a database is like a child and can grow quickly... No matter how much prep you do, you always have a chance of going -one month from now- "oh well, nice try for a nice piece of crap".... Now lets start over and do it right.

Final thought, its not very common to see tables of ID + 1 field.. Ie.
Country Table
• Country ID
• Country of Publication
Perhaps add stuff like language, or perhaps thats a seperate table again?
Or ISO code/Country code
ie. Netherlands, NL, 31

or
Publisher Table
• Publisher ID
• Publisher Name
Perhaps location, address, phone number, Date of founding, Date of bankrupt,etc...

As a last note, Keywords ... there can be 1 key word -Cars- or -Cars Chicks Bikes Dudes 500cc 250cc Lowrider-
The point is, it can be 1, it can be limitless. Keywords "generaly" do go into a related table. That will also allow you much easier access to "how many magazines do I have that have a keyword 500cc"

The_Doc_Man · Aug 17, 2010

Based on what you described, you are still in the "thinking" stage for a lot of this project. That is good. Think before you write. ALWAYS a good rule.

In general, you need to normalize (as you say) to minimize space and eliminate redundancy where possible. But also it becomes easier to make fast adjustments to reports and such later, because Access is sensitive to defined relationships. I.e. if you build a set of tables and define their relationships, then when you have to build a form that references a table (say, for a value lookup), that relationship is already known to Access. It won't have to ask you where your lookup value originates and won't have to ask you other things about the lookup either.

When your tables are not normalized, you can still build relationships but they might not work like you expected them. Therefore, normalization ALSO helps you to help yourself down the road.

Let me toss a couple of observations to show you how this works in practical terms.

Look at your Mag Details table. You have three "Obtained From" fields related to where, when, and from whom something was obtained. Does it ever occur that you obtain more than one magazine from the same person, perhaps a dealer or a friend with whom you share this hobby? When you can reference that person more than one time, maybe you want to split the "Obtained from" field to show details about the person from whom you got that magazine. (If you don't worry about anything except the name of that person, this doesn't apply.) Even if you don't do this, the concept is instructive because of something it reveals about normalization.

Let's say that you have lots of people from whom you obtain various magazines. You want to keep their names, phone numbers, addresses, and some other information about them. Where do you keep that?

Answer: If they are worth keeping, they are worth keeping separately. You would create a table of contacts (for lack of a better name). In that table you would put things like phone, address, name, comments, etc. Then just store a pointer from the magazine table to the contact table. Follow-up question: Why would you not just put the person's name and address in the magazine details table? Here is the important concept: The key of the magazine detail table has nothing to do with the person from whom you obtained the magazine. That is another way of saying one of the big rules in normalization. NOTHING goes into a table unless it ONLY depends on the prime key of the table. If you kept a phone number in the magazine table AND you had good odds that you would get more than one magazine from this person, the phone number in the details table would not depend on the magazine's assigned key but on the person's name, which is a NON-KEY field in the same table. This is one of the tests for having a denormalized database. On the other hand, the Obtained ON field doesn't depend on the person but on the specific magazine. Therefore, this field DOES NOT get split over to a secondary table detailing your source's personal information. See the difference?

Now as to another question you asked/implied: There is no way to guarantee that any database will have the features necessary to normalize it to a given form. The normalization rules REALLY tell you that if you have condition X, then you should split tables in such a way as to retain that data separately and eliminate condition X. But if your database doesn't have condition X (whatever it is), that particular normalization might not apply (or equally, you might already be there and just didn't realize it).

Normalization goes up to (at least) 5th Normal Form, plus there are some very special cases regarding named normalizations - Boyce-Codd (or is it Bryce-Codd... I can never remember) comes to mind fleetingly. But not a one of them applies if the situation they address doesn't occur in what you've got.

Therefore, don't obsess over getting to a specific normalization. Look at the things those normalization rules are meant to eliminate. Do THAT - and you'll be fine. In the final analysis, a number is just a number. But clean data is much easier to use if it is organized correctly.

I've rambled on a bit, but your question included the implication of questions on best practices. So I thought I would toss in some ideas for you. Good luck with your project.

Maximus Primal · Aug 17, 2010

namliam said:
ALL copies of that issue (pages, cover price etc) and specific items which relate to the item in hand (condition for example).
Seems like to me the # of pages, the price etc, can all vary just as much as the condition...

They can from an issue point of view, but my distiction was that EVERY copy of say the July 2010 issue of Vogue in the UK will have the same cover price, the same amount of pages etc.

But my copy may have a tear in the cover, where as yours may be undamaged.

You are right in that issue to issue these items can change, hence they are specific to a given issue.

Year is part of the date, Year is considered a calculated value from date, remove it

Okay I understand this, the reason I separated this and (sadly it probably a throwback to using Excel before) was that I sometimes want to see all issues for a given year and thought it best to have a separate entry for it. But I am guessing with the correct search query I would be able to do this and have Access only look at the year and not the whole date, am I right?

I think (bar the year) your pretty good to go. Though keep in mind a database is like a child and can grow quickly... No matter how much prep you do, you always have a chance of going -one month from now- "oh well, nice try for a nice piece of crap".... Now lets start over and do it right.

Thank you, that is sound advice. I understand why you say it as well and will bear this in mind as I attempt this. That sadly it may be a case of suck it and see and if I am wrong, so be it.

It seems that I have made a good and sensible first step which is good to know. This is a very daunting thing and trying to grasp concepts, ideas with no formal training can be hard. First steps make the next ones a little easier and more bearable when the mistakes (inevitably) come later on.

Thanks for the advice Naliam.

Max

Maximus Primal · Aug 17, 2010

The_Doc_Man:

Thank you rambling on. That actually is EXACTLY the sort of stuff I need to know and understand. Reading on the Internet tells you what to do and how to do it, but not always when (and just as importantly) when NOT to apply the rules.

I see that I still have a few things to tinker with, which is fine. This is my first attempt at grasping it and I knew it would not be perfect. I see things thanks to you and Naliam that I did not which again is perfect. I lack the ability to speak to someone in real life who understands Access and can give me 5-10 mins of sensible and helpful guidance. You both have done just that.

My questions/comments where worded not only to try and envoke an answer but also to get the reason for the answer. An answer is after all no good if you do not understand the original question and method to gain the answer.

I will spend another day or two and look at this, tinker - see what could be expanded and what cannot. I can and will use the information you both provided to try and adapt the table structure and understand the way the tables will interlink now.

However one thought does occur to me and perhaps one of you (or someone else) can clarify it for me.

A given record is located by it's primary key as best I understand. This is in effect the record number, it's location (like an address) of where the record and from it any inter-related data can be found.

In my example, would I be best in having the magazine title with the primary key (via the auto number facility)and then having this number fed to things like the country and issue so they all share the same primary key and relate to that exact magazine or do I assign the key a different way?

I think this would allow for multiple magazines in multiple countries as each issue effectively is give it's own unique identifying number.

Did I understand that right?

Thank you again so much for the excellent and valuable help so far.

Maximus Primal · Aug 17, 2010

Sorry you have both got me thinking, and I have another question. Can I relate a table to two separate fields whereby the result of both derives an answer.

My thought was to have a list of the magazine titles and their launch dates, and where applicable the date of the last issue. But this is a country specific thing, as (for example) CosmoGirl in the USA launched before the UK edition and continued long after ours ceased publication.

This is good information to know when looking for first or last issues which are often the hardest to obtain, but as I said it is country specific, so I need only to see information relating to X magazine in Z country. I can create a table with the information in it, but would I have to have two primary keys (can I even do this?), where the magazine title and country ID go to create the entry id or is this simply something a search query deals with?

Does that make any sense?

Max

Steve R. · Aug 17, 2010

Maximus Primal said:
I can create a table with the information in it, but would I have to have two primary keys (can I even do this?), where the magazine title and country ID go to create the entry id or is this simply something a search query deals with?

Each table has a primary key. In the case of the "Country" table this would be the "Country ID". However, in your main table - "Magazine Table" - you would have a long integer that is identical to the value of the primary key in the "Country" table.

Also look at your "Country Table" and "Magazine Table". Both tables have a field "Country ID". This can become very confusing when trying to code. I would suggest changing the name of the "Country ID" field in the main table to something like "MTCountryID". Keep all field names unique. Also be careful of using any field names that could be the same as an Access reserved word.

dportas · Aug 17, 2010

The_Doc_Man said:
Now as to another question you asked/implied: There is no way to guarantee that any database will have the features necessary to normalize it to a given form. The normalization rules REALLY tell you that if you have condition X, then you should split tables in such a way as to retain that data separately and eliminate condition X. But if your database doesn't have condition X (whatever it is), that particular normalization might not apply (or equally, you might already be there and just didn't realize it).

Normalization goes up to (at least) 5th Normal Form, plus there are some very special cases regarding named normalizations - Boyce-Codd (or is it Bryce-Codd... I can never remember) comes to mind fleetingly. But not a one of them applies if the situation they address doesn't occur in what you've got.

I think I can guess what you are trying to say but what you actually said is a bit misleading. Any schema with any given set of dependencies can be normalized into Boyce Codd, 5th or any other Normal Form. It's true that in some cases you may have to lose some dependencies from the final model but there's no reason why BCNF or 5NF cannot or should not apply to any database design.

Unfortunately normalization is too often thought to be synonymous with normalization by decomposition. The decomposition approach can give the misleading impression that BCNF and 5NF somehow "do not apply" to some cases (for example if a schema has only simple candidate keys) and this seems to be the root cause of the confusion behind this question. Decomposition is also not a very practical approach outside an academic setting.

BCNF / 5NF are in most cases more important and useful than 3NF and BCNF is generally easier to understand, apply and verify than 3NF. My general recommendation would be that you try to start with a schema in BCNF or 5NF, which is always achievable in principle. Then possibly denormalize in the special cases where BCNF is not dependency preserving (3NF is always dependency preserving whereas in a few cases BCNF and higher NFs are not).

Maximus Primal · Aug 17, 2010

I have tried to design a Data Model based on ones I can find at http://www.databaseanswers.org/data_models/

My idea was to get the tables and try and understand what where connections between tables will need to be made.

I have attached my effort below along with a copy of the model I used for reference from DatabaseAnswers.

I am a little concerned with my linking the main entry primary key to 5 other place to give me many-to-many options (publishers for example, they may publish 20-30 magazines, but not always the same magazines and I have done this as best I can currently understand. Again I am trying to get the priciples right before I jump in and sink badly.

Thought as ever are appreciatedD, good bad and otherwise.

DPortas:

When I read your comment my mind said "That's good, what does it mean?". This is a sign that I am tired (past 11pm here) and I need to sleep. I look at your comment in the morning and reply to it correctly then. I will also look up 4nf, 5nf and Bcnf criteria tomorrow and try and understand what you are getting at.

Max

The_Doc_Man · Aug 17, 2010

DPortas, my point was that some models are more complex than others. Sometimes it just happens that the issue addressed by 4NF doesn't apply to a given database because that kind of oddball dependency doesn't occur. If that sounded more complex than it needed to be, it was perhaps because I decided to ramble on about the process because that, too, was of interest to Maximus Primal.

Maximus, you asked a couple of questions for which I need to just point you elsewhere so you can do reading. In this forum, you can do a search of topics.

Regarding your question

I think this would allow for multiple magazines in multiple countries as each issue effectively is give it's own unique identifying number.

I recommend you search for the topic "Synthetic Keys" and "Surrogate Keys." In the Theory section, we have beat that topic half to death. Choosing proper keys is not quite yet a science, but we know some good guidelines. I'm thinking that if you could have multiple copies of the same magazine from the same country but in different conditions, you would instantly need a surrogate key because anything else would start to become excessively long. Take that as opinion, not necessarily a fact cast in stone.

two primary keys (can I even do this?),

No. One PK per table. (That's why it is called "Primary".)

The larger question surrounding that one is answered best by saying that you can relate tables with many relationships that ease your queries - but sometimes an ordinary query is efficient enough.

Too many relationships can also be an issue because of a side effect thereof. You must have a PK on ever table that is ever the "One" side of a One/Many or Many/One relationship. That is expected. But lots of people have other indexes (No Duplicates in many cases) as part of a relationship where they really didn't need such a thing.

Indexes cost you time and space because (for large databases) you must update each participating index on any affected table update, insert, or delete. For smaller databases, this is not any huge issue, but you said you wanted to hear discussion on why something is or isn't done.

There is a balance to be found between fewer or greater numbers of indexes on tables. In general, I look for how often I would perform a particular search involving a particular field. Then, if it is in the top 10% to 25% of my most likely searches across the entire database, that field - in whatever table I use it for searching - is a good candidate for indexing. If less than one search in 1000 would use that index, why bother? There are limits on indexes, and you can look up "Specifications" to see the limit on your version of Access. You never want to exceed that number, but in truth you really don't even want to get close, because of the overhead associated with indexes.

Of course, if you have a few thousand magazines, it is easier to keep up indexes than if you have a hundred thousand, or a million magazines. It's all a matter of scale.

When you read the comments made by DPortas, you should also consider something else. Each of us approaches database design differently. Good methods exist but no single method is guaranteed to be right or wrong in the abstract. Specific methods might be wrong for specific models. But in general, as we say here in the USA, there are many ways to skin a cat. All of them require that you start with a firm grasp of the cat. And so it is with databases. You require a firm grasp of your problem before it is really safe to start. Hence my earlier comments about thinking before writing.

Maximus Primal · Aug 18, 2010

The_Doc_Man said:
I decided to ramble on about the process because that, too, was of interest to Maximus Primal.

I am very grateful for these "ramblings" as they help me understand the background behind how to do things and what I need to be aware of. As I said before, I have no access to a real person who can sit down beside me and explain things to me. Websites while great at giving information appear to have an assumption you know what they are talking about and you just need clarification of it - which is not always the case.

The replies so far have helped me gain some understanding and show where I was still holding misconceptions or incorrect ideas partially based on what I knew before (Excel and Purebasic), and where I still needed to adapt to - for want of a better phrase - The Access Way.

I recommend you search for the topic "Synthetic Keys" and "Surrogate Keys." In the Theory section, we have beat that topic half to death. Choosing proper keys is not quite yet a science, but we know some good guidelines. I'm thinking that if you could have multiple copies of the same magazine from the same country but in different conditions, you would instantly need a surrogate key because anything else would start to become excessively long. Take that as opinion, not necessarily a fact cast in stone.

That is actually a good point and you would also be correct. I have multiple copies of some first issues. Some in perfect condition, others not quite so. I assumed that as each record would get a unique indenting number (via the autonumber facility) that would be enough. I will look up surrogate keys when I come home from work tonight and see what I can gain from that and how it may be of use to me.

I have also come to learn that everything regarding database design is more opinion than fact. And that not everything will apply to every database as you have pointed out. Again a misconception I had, that certain rules must be followed, when it is more a case of they should be applied where appropriate and where possible. Something I was not able to grasp from the websites I have looked at previously.

No. One PK per table. (That's why it is called "Primary".)

Thank you, again, I assumed primary was just referring to the main way of looking something up and you could look from two directions as it where.

There are limits on indexes, and you can look up "Specifications" to see the limit on your version of Access. You never want to exceed that number, but in truth you really don't even want to get close, because of the overhead associated with indexes.

According to Access 2007, I have a limit of 32 indexes per table and 10 fields in an index.

Presumably this means if I had a table with 50 items, only 32 of them can be used for indexing of the data and any given index (say country lookup) can only relate to 10 given fields across the whole database.

In reality you are saying however, indexes should be minimal - one or two ideally per table to keep both memory overheads and complexity to a minimum. I.e do not waste resources and time on something which will be used once in a blue moon, concentrate on stuff which will be used at least 75% of the time or more frequently. These are the items to index as they are the important parts of the data you are looking to store and the bits you will be looking for later on when searching for stuff.

Of course, if you have a few thousand magazines, it is easier to keep up indexes than if you have a hundred thousand, or a million magazines. It's all a matter of scale.

True but at the same time, it is better to plan for more data than you may need to allow for expansion and also because it will teach you good practices moving forward with regards to not over extending resources and complexity where it is not needed.

When you read the comments made by DPortas, you should also consider something else. Each of us approaches database design differently. Good methods exist but no single method is guaranteed to be right or wrong in the abstract. Specific methods might be wrong for specific models. But in general, as we say here in the USA, there are many ways to skin a cat. All of them require that you start with a firm grasp of the cat. And so it is with databases. You require a firm grasp of your problem before it is really safe to start. Hence my earlier comments about thinking before writing.

I think that is true of all things in life. But this is exactly why asking questions and getting both answers and reasons for answers is important. We use the same phrase in the UK and I totally agree with it in almost everything we do in life.

In my head I know exactly what I want from this database. I can both see and picture it. But achieving it in Access I know will be a challenge as I am forced to approach things differently and in new ways. Listening to people like yourself, DPortas, Steve R all help me gain information which I can then assimilate into my own ways and try and use as a basis for what I want to do.

I am a person who never simply accepts something is the way it is, simply because I am told it is. I always seek reasons, understanding and clarity even if it means my brain feeling frazzled. Because then as I go forward I am able to see things I did not before and also grasp new concepts because of things 2 or 3 people said to me previously which suddenly make sense.

Sorry if I ramble on (I do this in real life too), but it is my way of making certain that I understand what is being said. I like to rephrase everything as it helps me get it clear.

I will look up the Synthetic Keys when I get home. Thanks for the tip

Max

namliam · Aug 18, 2010

Maximus Primal said:
Okay I understand this, the reason I separated this and (sadly it probably a throwback to using Excel before) was that I sometimes want to see all issues for a given year and thought it best to have a separate entry for it. But I am guessing with the correct search query I would be able to do this and have Access only look at the year and not the whole date, am I right?

Yes and it is quite easy to do as well, something like 1+1 =

namliam · Aug 18, 2010

Note:
I purposely replayed top down to give my view on things, and you will find a lot of duplication and simularities between this post and others

Maximus Primal said:
However one thought does occur to me and perhaps one of you (or someone else) can clarify it for me.

A given record is located by it's primary key as best I understand. This is in effect the record number, it's location (like an address) of where the record and from it any inter-related data can be found.

In my example, would I be best in having the magazine title with the primary key (via the auto number facility)and then having this number fed to things like the country and issue so they all share the same primary key and relate to that exact magazine or do I assign the key a different way?

This is the discussion database key vs surogate key vs user key.

Inside the database your better off using an autonumber as your primary key, I am a solid believer in that, because I have seen so many times that 'real' keys DO tend to change. For a database part of the definition of a Primary key is "unchanging", its a real bugger to change a primary key inside an existing application I can tell you.

The way you then display/use the actual fields/records is up to you. I.e. the short answer IMHO, Yes.

As such though one can argue about the design, as things like (original) number of pages, price, country, etc are bound to the unique to the issue. Things like condition, comments, etc can be specific to the one issue in hand.
Thus you would not have a 1:1 relationship but a 1:N relationship between the two tables. 1 issue 10 copies each with its own condition, etc...

Maximus Primal said:
My thought was to have a list of the magazine titles and their launch dates, and where applicable the date of the last issue. But this is a country specific thing, as (for example) CosmoGirl in the USA launched before the UK edition and continued long after ours ceased publication.

This depends, in a lot of situations the "last" is a calculated value from the database and should not be stored. However if your collecting things the last in your collection may not be the last issue thus it may not be calculatable.

No you can not have multiple primary keys, you can have how ever a Compound key which is difficult enough to work with and is often times replaced by a autonumber. Also you can have as many Foreign keys as you like in any table.

Perhaps you need a table that is "GlobalInfo", which will contain the name, original printer, etc and then have a table "CountryVersion" or something that will contain the country specifics when it started there etc... But that will not hold the name, because that would be part of the global, --that is--
Unless it is possible / desired to have a local name as well.
I.e. Cars Magazine, in dutch can still be Cars Magazine... But just as easily be "Auto gids" or something simular dutch... Dutch now are not that keen on making their own version, but french, german, spanish and I think Italians can be a little more 'attached' to their own version.

Perhaps the reverse? if a magazine originates in a non-english country, it then gets exported into an English country, it will rarely keep its original name.

dportas said:
BCNF / 5NF are in most cases more important and useful than 3NF and BCNF is generally easier to understand, apply and verify than 3NF. My general recommendation would be that you try to start with a schema in BCNF or 5NF, which is always achievable in principle. Then possibly denormalize in the special cases where BCNF is not dependency preserving (3NF is always dependency preserving whereas in a few cases BCNF and higher NFs are not).

the highest form of Normalization IMHO is 3nf, optimal usage of space. 4/5(BCNF) are denormalizations to make for better performance where it is at an 'overhead' cost of more storage and a bit more 'one off' handling of data.

In my extensive -though limited to the use of BCNF etc- experience the use of BCNF is limited in the world of access, simply because the average database size doesnt work well with the trade off. Also the duplication of data is something that a lot of people are strictly opposed to. Yet in bigger AR/billing systems for example this is quite common practice.

As for starters, even pretty advanced databases, most can live very well and should be designed in 3nf. anything beyond that -certainly for beginners- is just creating hazards, pitfalls, confusion and what not.

One of the 'confusing' statements is to the discussion do we store calculated values, the base answer "No not ever"
However.... 'de-normalized' versions like 4nf and 5nf then say things like...
If a calculation rarely changes and is very cumbersom/complex/time dependant/time consuming or if every single millisecond counts in returning the value(s) then it is acceptable to store this data. This can be / is confusing, in particular to the beginner. Also a lot of those rules / assumptions will not apply to access, simply because of its own practicalities and limitations.

The Access Way.

LOL, The access way.... welcome to the marvel world of databases....
This though is KEY to learning /entering the twilight world of databases, it IS a world of its own.
It really truelly is... anything you consider you think about anything else you may be able to apply to this, forget it... You ARE starting from SCRATCH in a new world of marvel and amazement, sadly though like in many a world there are pittfalls, traps and things that will try to eat your database and things like that...

dportas · Aug 18, 2010

namliam said:
the highest form of Normalization IMHO is 3nf, optimal usage of space. 4/5(BCNF) are denormalizations to make for better performance where it is at an 'overhead' cost of more storage and a bit more 'one off' handling of data.

In my extensive -though limited to the use of BCNF etc- experience the use of BCNF is limited in the world of access, simply because the average database size doesnt work well with the trade off. Also the duplication of data is something that a lot of people are strictly opposed to. Yet in bigger AR/billing systems for example this is quite common practice.

As for starters, even pretty advanced databases, most can live very well and should be designed in 3nf. anything beyond that -certainly for beginners- is just creating hazards, pitfalls, confusion and what not.

namliam,

I think you may be confusing 5NF and BCNF with something else. BCNF and 5NF are by definition a "stronger" form of normalization than 3NF. In other words if a database is in 5NF or BCNF it is already in 3NF. This is implicit in their definitions. BCNF was a further refinement of 3NF by Codd and Boyce when it was realised that 3NF wasn't quite strong enough. The problem being that 3NF only eliminates non-key dependencies on non-key attributes whereas BCNF abolishes non-key dependencies on all attributes.

BCNF and 5NF have nothing to do with "denormalization" compared to 3NF because they are about removing join dependencies not implied by superkeys where denormalization means adding new non-key dependencies to the design.

Maximus Primal · Aug 18, 2010

Thank you all for the very valuable input on all of my questions and the assoicated issues I raised out of them. It makes for very interesting (if complicated at times) reading and shows really how everything is so different to what I have done before. You all show why planning, more planning, scrapping an idea and starting again are also vital elements of the design.

Sadly it appears not to be a case of bolt on, replace or bin an existing routine to sort of issues or add features due to the way the tables inter-relate to each other and the forms. I am also now consciously aware that what I will go on to create soon, will probably not be what I need and it will under go a change akin to putting a new body shell on a car then replacing the engine - to use a bizarre metaphor(?)

While I admit I do not fully grasp some of what you all say and there are bits which are as about as clear as wading through a bog on a misty night with sunglasses on - I know this is part of the "fun" of what I am going to attempt.

I think for now it is time to sit down with my trusty copy of word and plod through the design again... and again... and again before I think about laying down the foundations.

One last thought, I see there is software (both free and paid for) which are meant to help with the design and planning of a database. Are they worth getting or are they more a "look good, does nothing" type thing? Just wondering if I should take a look at them or not.

Thanks again people. Your help and thoughts has been.... enlightening!

Max

The_Doc_Man · Aug 19, 2010

I think it is time for me to haul out the "sticky-note" lecture. {other readers groan...}

It might not be practical for you but I find this method helps sometimes. Before you start getting into the weeds with your DB, you need to know a couple of "Old Programmer" rules.

1. Access cannot do anything via automation you could not have done yourself by hand. Practical translation: If you cannot do it by hand, how will you ever tell Access how to do this thing?

2. Access cannot tell you anything you didn't tell it first. Practical translation: When planning a database, think about what you wanted to see when it was done. Work backwards from that to assure that you capture the data needed to answer the end questions. For example, let's just say you wanted to track the name of the editor for a given issue of a given magazine. That means you need to STORE that name in order to be able to recall it later. You would be surprised at how often that information is not remembered during early design only to have to be retro-fitted later.

So... find yourself a white board. Invest in a box (not a pad... a PACK or a BOX) of sticky notes. Post-it or a knock-off brand, doesn't matter, and some dry markers. On the white board, write your table names. Now put some sticky notes under each name to represent the records that will be in the table. When you are done, here is where you cramp your brain.

For each result you wanted to see, be it a datasheet, report, form, or generated flat-file, go through the mental exercise of deciding how you would build that thing. This is where you will "discover" relationships you had not previously considered. This is also where you might realize that you were not tracking something you needed for one of your reports - see rule #2 above.

Take notes on each process - because remember rule #1 above.

When you have finished this process, you are ready to implement and have put yourself in the proper mindset to think about DATA FLOW issues. You will thank yourself for the work you have done.

Just for clarification: DPortas and I sometimes get into discussions about nuts & bolts issues. He's more of a theoretician than I am. I'm a pragmatist who didn't start on Access, but the product where I did my first real databases is no longer available, having morphed into something else many moons ago. I respect his viewpoint, I just don't always see it the same way he does. So if some of the discussion seems to have a little disagreement sometimes, don't worry. We try to keep it civil and often end up agreeing to disagree. But then, as a newbie, you need to know the controversies as well as the agreements.

Good luck on your project. I'm not stepping away from you, but I've pretty much beat this topic to death and I have a project out at work that will consume some of my time. If I don't check as often, it's one of those old silly ideas that the people who pay me expect to do something for the money. You know how that goes...

database normalization - question on practices of this

Maximus Primal

Learning and struggling..

namliam

The Mailman - AWF VIP

Maximus Primal

Learning and struggling..

namliam

The Mailman - AWF VIP

The_Doc_Man

Immoderate Moderator

Maximus Primal

Learning and struggling..

Maximus Primal

Learning and struggling..

Maximus Primal

Learning and struggling..

Steve R.

Retired

dportas

Registered User.

Maximus Primal

Learning and struggling..

Attachments

The_Doc_Man

Immoderate Moderator

Maximus Primal

Learning and struggling..

namliam

The Mailman - AWF VIP

namliam

The Mailman - AWF VIP

dportas

Registered User.

Maximus Primal

Learning and struggling..

The_Doc_Man

Immoderate Moderator

Similar threads

Users who are viewing this thread