1NF, 2NF, 3NF - Normalisation

Navyguy

Registered User.
Local time
Today, 10:42
Joined
Jan 21, 2004
Messages
194
1NF, 2NF, Etc

After reading several posts and articles, I think that I am actually getting a slight grip on this subject.

My question(s) is/are...

If I have a field that requires a tick for either yes or no, does that violate the normalization rules?

Example: A student report card must show whether they participated in Track and Field (and say a bunch of other activities). My tick boxes are TFP and TFDNP for Track and Field "Participated" and Track and Field "Did Not Participate" respectively. The issue here is the result of the information is printed on a "report card" and must be indicated either way.

My next question is...

What about the same idea as above but where you select "Levels of Participation".

Example: Student class participation might be recorded simple as tick boxes for "Full", "Some", "None".

What I see here is many tick boxes not being used in all the areas because in the Participated/Did Not Participate only one box would be ticked and the same with the Full/Some/None where only one third would be ticked.

I would think that by only using tick boxes that would make the DB smaller and more managable instead of using say a field where the information was inputed/selected by the teacher.

By the way, this is not a real issue, but I am trying to understand the 1NF, 2NF idea so I stay out of trouble from the start on future projects.

Navyguy
 
On this section of the forum there is a thread "A Book by Pat" within this thread you will see there is an extract that explains Normalisation. That's one of the best explanations you will get anywhere.

Whether an attribute of a relation is a tick box or anything else is irrelevant when it comes to normalisation.
or in English

Whether a field of a table is a tick box etc

Len B
 
Re: 1NF, 2NF, Etc

Navyguy said:
Example: A student report card must show whether they participated in Track and Field (and say a bunch of other activities). My tick boxes are TFP and TFDNP for Track and Field "Participated" and Track and Field "Did Not Participate" respectively.

In this instance one textbox would suffice. If it's value was (-1, Yes, True) then you could say that the student did participate; if it was (0, No, False) then the student obviously did not participate.

My opinion of checkboxes in tables is that they tell you nothing other than a True/False instance. I'd much rather have a Date field that stores the date track and field was attended. This way I'd know that if there was a date that the track and field was attended and if there was no date then it was not attended. Having a data would also allow me to query any data range and find out who attended when. It offers more scope for reporting.

But then, if this was a student table then I wouldn't have this field in the table at all as whether a student attended an event is not something that is dependant upon the the primary key of the students' table.

I'd have a table for events and a table for students - a third table between them as a junction.

The third table would use the primary key of both the event and student tables as foreign keys (and also as that table's primary key) and a date field stating the date of attendance.


Example: Student class participation might be recorded simple as tick boxes for "Full", "Some", "None".

In this instance I'd have a further table for "frequency" or whatever you'd term it. The table would just have an Autonumber as its primary key and a text field naming it

i.e.

Code:
FrequencyID          Frequency
       1                         Full
       2                         Some
       3                         None

In the relevant table I'd replace the need for multiple checkboxs by having one field where the user can select which of these options is appropriate rather than have unnecessary checkbox fields.



The one thing you have to remember is that databases should be built downwards and not outwards. So anything repeating groups should be given a table of their own and allowed to collect downwards. Having repeating fields that are rarely used creates redundant space.
 
Right then...With what I have been reading (just a short time), there are no exceptions to the rules. Whether it is user entered text data or yes/no tick boxes.

So what I am thinking is what if the report, call it "Report Card" is a format that cannot be modified drastically and the requirement is to indicate either "yes" or "no", or "fully", "some", "none". Would there not have to be a tick box in the report at all of those locations so that when the report is generated any one is positively indicated.

Or even simplier (maybe), what if you had a checklist that you wanted to print out and you needed to indicated in one of three columns what the person did or how they did.

I guess modifying the report would be the easiest, but what if that was not an option for what ever reason, like it was the only "approved format" from the school system or some ISO certicication...whatever.

Of course this could be just my limited understanding of how the reports work with the tables, etc.

Navyguy
 
I'm going to jump out on a limb here and say that the presence of the yes/no field isn't the problem with normalization. Mile-O-Phile gave you good comments about how he would approach your problem. However, I think he missed one factor that is in your problem description. I am enough of an analyst that I see a hidden implication in your description that IS a problem. I will pick it apart just a little for you.

If you have a yes/no field, it can only apply to a unique event. If the table containing this field also holds the student's name and the date of the participatory event, you might have a normalization issue unrelated to the yes/no field.

To handle this problem, I would have several tables:

A. Student (student ID number {pk}, name, other identifying info unique to that student)

B. Event (event ID number {pk}, date, description, other identifying info unique to that event)

C. Participation (student ID number {fk}, event ID number {fk}, participation info)

where participation info could be a grade, a yes/no box, and even a specific comment field (memo field, perhaps) if you wanted it. A student would only be graded if they were ELIGIBLE to participate in the event. How you populate the table with eligible students is a separate issue.

This approach would be relatively normalized. The reason it is so is because

(1) You separate information according to its category. Student data goes in the student table, not the event table. Event data goes into the event table, not the student table. A separate table tracks the points where these two data subsets intersect.

(2) This design allows you to record events in which NO ONE participates. It also allows you to pre-define events before you even have a class roll set up. Or, conversely, to define your students before you have any events defined.

(3) You can do joins in a way that allows you to report student participation by event (Join PARTICIPATION to EVENT over EVENT ID) or by student (Join PARTICIPATION to STUDENT over STUDENT ID). Join queries are WONDERFUL things for this kind of reporting.

(4) Because the PARTICIPATION can be sparse (hold entries only where participation is possible), you minimize the size of the resulting tables. Having two ID fields in the PARTICIPATION seems like a lot of overhead, but suppose that they are both LONG integers (recommended). That is 8 bytes for the two IDs. If you have exactly 8 bytes worth of other data in the table, then a sparsity rate of 50% breaks even on storage and having more than 8 bytes means you are ahead even at higher population levels. If you are tracking more than one class and more than one event that is unique to each class, the savings mount up VERY quickly.

This might seem like a lost point, but if you DON'T have that third table, where do you store the data for each event in a way that allows you to have a unique storage location? THIS UNIQUENESS ISSUE IS ONE OF THE MAIN POINTS OF NORMALIZATION! By creating the separate table, you assure the unique location exists. And it is normalization analysis that causes you to create this table.

There is another issue. Access is a modeling tool for business processes. Your real-world process has these issues: Students, events, and OPTIONAL participation in selected events. Therefore, each issue needs to be represented separately in the model. Note that for MANDATORY participation, you would choose the same structure simply because of normalization, but if there is a situation where participation is optional, then the model itself demands separation of participation records from the events.
 
Re your report. If the format is fixed then it is fixed. Tick boxes give 2 options only On/Off, True/False or Yes/No. That's it. So if you are stuck with the format then all the other posts are still relevant its just that the possible choices are limited to 2.

Whether or not you use tick boxes is one of preferrence and what is defined to be in the report

Posts coming in a bit quicker than I can type

Len B
 
So of course that makes total sense Pat. So what about the "Full", "Some", or "None"? Could the same practical format be applied to meet the normalization principles when 3, 4, 400 options are chooseable? (not sure if that is a word…)

Again this could be just my limited report knowledge.

Doc_Man

Now that I am beginning to understand this normalization thing a bit, I think that I would create separate tables for events, attendance, etc. I think at first that I would not, but now I am beginning to see the logical process that is required. The learning curve seems like a cliff...

Navyguy
 
So... I went away and researched option groups, and I found out that they are “Great”. To bring this back to normalization, I can see where the option group only enters one piece of data regardless of the options thus conforming to the principles of normalization.

I guess my next comment/observation would be; It seems difficult to fully use the principles of normalization if you don’t know what you can do with the database. Of course you don’t know what you can do until you try and that trying will probably destroy the principles of normalization! This reminds me of a dog chasing their tail!!!

So do most people just muddle their way along with DBs and then through reading or training find a better way to conform to the principles of normalization? Quite frankly I never heard of normalization until just a couple of days ago, and it was quite by accident. I can easily see where your first attempts at creating a DB would be quite different if you applied the principles of normalization from the beginning.

Too many questions in one post but…Who says that your DB is 1NF, 2NF, etc? Does the “developer” say that it is or does the developer submit it to the AFNDbE (Association For Normalization of Database Experts) AKA Access World Forums.


Navyguy
 
Navyguy said:
So do most people just muddle their way along with DBs and then through reading or training find a better way to conform to the principles of normalization?

I'd say the majority do. From UK experience - especially regarding companies - people seem to be put on courses and taught how t build tables, queries, reports, and (if lucky) macros only. The only problem is that the people training them are typically internal trainers who have been on a course and taught how to build tables, queries, reports, and (if lucky) macros only. :rolleyes:

The actual concepts of database design aren't taught. The concepts are usually taught in university degrees but, when assessments are due, the students sign up and detail all their database problems (for their assessments) and they typically have no knowledge about proper design either. :mad:

I'm personally self taught within the last year and a half and it annoys me that people are getting paid (as their job) to build databases when they obviously have no clue about the process.

But, in answer to your question, YES most people probably do muddle their way along until they get stuck (through lack of knowledge/experience) and post here or similar forums and are told about normalisation. Sometimes, unfortunately, some are too stubborn to accept they are wrong and forge on with badly designed databases. It's their job though and if they have to redesign through bad practice then so be it.

Who says that your DB is 1NF, 2NF, etc? Does the “developer” say that it is or does the developer submit it to the AFNDbE (Association For Normalization of Database Experts) AKA Access World Forums.
.

Simply put: practice makes perfect.
 
As Mile-O says most people muddle through but what goes around comes around so in general after muddling through the problems initially as soon as someone wants a modification all the problems return.

When you are aware of Normalisation then always attempt to achied 3NF. I say attempt because you may think your there when indeed you are not.

If you have tried then you will be in a much better position when modifications are requested.

Yes Normalisation is normally not taught on courses because you need to understand dependancies and relational algebra to do it by the book so to speak and a course does not have the time.

Dependancies and relational algebra take time and do not have the profile like tables, queries forms etc so they are not taught.

Actually buggers people up completely when the application starts to get a bit complex.

Len
 
Len Boorman said:
Yes Normalisation is normally not taught on courses because you need to understand dependancies and relational algebra to do it by the book so to speak and a course does not have the time.

Actually buggers people up completely when the application starts to get a bit complex.

From the courses I've seen listed they are usually one or two day courses. I don't think an extra day would hurt them. :(
 
Ok… I have to throw this out there…

Can you achieve 3NF by organizing your tables, paying strict attention on how they react to one another and literally having no primary or foreign keys… basically relying strictly on organizational skills?

And what about a derived field such as ‘total’?

From what I understand this would violate the 3NF, but someone told me that sometimes it is easier (faster) to store a value and then return that stored value than to run a query and calculate all the fields that are necessary to return that particular value, especially on large multiple listings?
 
Okay simply put

Without Primary Keys you cannot Normalise because the rules governing Normalisation refer to Primary Keys

Derived Data would not normally be stored as you say.
Faster Hmmm not sure because really you need to run an update query just in case somebody has updated a value that could affect the "total".


Correct in that a derived value would violate 3NF on the basis that the derived field has a transitive dependancy to the Primary Key.

Len
 
  • 1st Normal Form (1NF)A relation is said to be in First Normal Form if and only if each attribute of the relation is atomic. More simply, to be in 1NF, each column must contain only a single value and each row must contain the same columns
  • 2nd Normal Form (2NF)In order to be in Second Normal Form, a relation must first fulfill the requirements to be in First Normal Form. Additionally, each nonkey attribute in the relation must be functionally dependent upon the primary key.
  • 3rd Normal Form (3NF)In order to be in Third Normal Form, a relation must first fulfill the requirements to be in Second Normal Form. Additionally, all attributes that are not dependent upon the primary key must be eliminated.

So, WindSailor -

Can you achieve 3NF by organizing your tables, paying strict attention on how they react to one another and literally having no primary or foreign keys… basically relying strictly on organizational skills?

Nope. You'd fall at 2NF as it requires all fields in a table to be dependant on the primary key.

And what about a derived field such as ‘total’?

A violation of third normal form but you can't get to there because you haven't got to 2NF.

Here's the normalization basic from About.com
 
Maybe a bit picky but I was taught I think that 2NF purely requires dependancy and a total field would actually satisfy that providing that you had a primary key.

i'e the Primary key will identify one value of Total. i.e the value that exists in the tuple identified by the primary key.

But interesting conversation

Len
 
Len Boorman said:
I think that 2NF purely requires dependancy and a total field would actually satisfy that providing that you had a primary key.

Dependency, yes, as non key, or key is not specified.
 

Users who are viewing this thread

Back
Top Bottom