Help getting my data through 1nf 2nf in to 3nf

thedvdaddy · Feb 27, 2010

Hi all and thanks for taking the time to stop and take a look at this for me, I know this is my first post so it may seem a little cheeky that I am asking for help already!

I am a mature student and help out on other forums myself such as Dreamincode and toolboxIT.

I have started a database

module in my foundation degree and have decided that database's are the devils crosswords.

Please note any of the following data can be removed, added to or changed to help me SHOW the statges of normalization, Also I have attached the senario for your reading pleasure but I have included a brief one below,

I have been given a senario and have come up with what I think is suitable UNF data and think....STRESS THINK.. I have it to 1NF. The senario is a simple one. "the head of the sixth form wants to keep track of students university applications, he is also resposible for helping write their personal statement, students can have up to five choices and if they get no offers they are eligible for 'EXTRA' which allows them to make an extra choice. Could anybody take a look at it for me and either let me know if i am on the right lines or if I have left the rails completely, I have spent all day watching Youtube videos on this but I cant wrap my heads around it, Mainly because they always use examples with about five items that clearly sort in to neat tables!. ANY help would be a massive bonus for me. Thanks in advance all!

THE UNF DATA
Address_Line_1
Address_Line_2
Application_Complete
Application_Deadline
Application_ID
Application_Submitted
Course_ID
Course_Name
Date_Of_Birth
Eligible_For_Extra
Eligible_For_Student_Finance
Email_Address
Finance_Application_ID
Finance_Applied_For
finance_Approved
First_Name
Gender
Home_Phone_Number
Middle_Initial
Mobile_Number
Offers_Received
Personal_Statement_Complete
Post_Code
Region
Student_ID
Surname
Teacher
University_ID

1NF
NON REPEATING
Student_ID
Application_ID
First_Name
Middle_Initial
Surname
Date_Of_Birth
Address_Line_1
Address_Line_2
Post_Code
Home_Phone_Number
Mobile_Number
Email_Address
Finance_Application_ID

REPEATING
Student_ID
Course_ID
Application_Deadline
Application_Complete
Application_Submitted
Personal_Statement_Complete
Offers_Received
Region
Course_Name
University_ID
Eligible_For_Extra
Gender
Teacher
Eligible_For_Student_Finance
Finance_Applied_For
finance_Approved

I know that if my first table (Non repeating)
is correct then because it has just one key it should move over to 2NF as it is. BUT I really dont have a clue.

georgedwilkinson · Feb 28, 2010

You have a ways to go. Every type of "thing" should have its own table. I use a rule of thumb to see if a field belongs in a table by asking myself: "did this 'thing' have this when it was born or will it have it when it dies?" For instance, a person (a student or faculty member) will have a date of birth when they are born (and die) and they'll have a date of death on the day they die. So those "seem" to be appropriate fields for a person table. This is pretty static data...it only gets changed once, really.

Another question: "does this 'thing' depend on this field for its identity as a thing?"

Now, will a person have an address or phone when they are born or die? Yes, but it is dynamic...it is not about the person, it is about a place, another "thing". They might have several during their life (and you may actually want to capture several of them). So you might consider having an address table.

Is financing about a person? No. It is about an event (another type of "thing"), or multiple events, that has/have occurred or will occur in the future. At least one other table, then, right?

Is university ID with a person when they are born or die? Not really...most people go through life without a university ID and some with several. Isn't it really about a relationship between 2 parties? Another potential table for information about the relationship between people or parties (in my models I actually have a table called "PartyRelationships").

I could go on but I'll give you the pleasure of researching this and asking specific questions as appropriate.

I am currently in the process of writing this whole data modeling process up as an article and will try to post it on this site when it is ready.

Please feel free to ask specific follow up questions.

thedvdaddy · Feb 28, 2010

Thank you George, that is the sort of example I am looking for. A rule of thumb that is plain and simple to follow, So I guess that I should ask the most basic question first,
1- My teacher implied that when moving from UNF to 1NF there will only every be two tables. 1 table will be things that do not repeat through the data and the other will data that repeats. This makes no sense to me as, For example, in the sample data I have created no student has the same date of birth so it is in the table of unique data. But in a sixth form of say 300 students it is quite likely that this data would repeat.
Could you clarify that what he said about there only every being 2 tables when in 1NF is wrong or correct, He is a new teacher and following other peoples power points so he may have got it wrong or I may have not understood what he was saying.

Also, for this assignment we need to show the progression through each form, if I followed your rule of thumb would that move my data from UNF to 1NF or further along in the NF stages.

Looking forward to your reply!
P.s I put the brief up with this post for anybody who wants to have look.

georgedwilkinson · Feb 28, 2010

A lot of people have this all wrong (including the "authors" of the Wikipedia entries on normalization). If you read a text book on normalization, you will frequently see the words:
"A data model in X-1 normal form is usually already in X normal form" or something of that sort.

It says this for every normal form covered, usually. The assumption is then, that to NOT be in X NF, you must break a rule that is covered by that NF. In fact, you almost have to go out of your way to do that. Once you understand the basics of normalization, it is very difficult to break ANY of the normal forms.

The comment that many people make that they have their data model in 3NF but they don't want to normalize further is nonsense. Normalization is not to make it difficult by putting you through an academic exercise, it is to remove data anomalies. Data anomalies caused by breaking ANY NF are bad, period.

Generally, lots of fields in a table may indicate that the table is a candidate for further normalization exercises.

An interesting thing you see about normalization in some texts is that there can be no null fields in any normalized table. This would mean that nearly 100% of the tables I have seen in the real world are not normalized. It would also make understanding a data model and programming the system for it would be agonizing...but it is quite doable. I did have a data modeler working for me at one time who created just such a data model, however, we were unable to implement it with Oracle's help and 10s of millions of dollars invested. I don't want to be the programmer dealing with the no null columns rule.

I fear that your teacher and I might not see eye to eye and you'd be better off finding out his opinions (that's what will be on the test). Then when you've got your grade, forget everything he taught you and learn to do it right.

thedvdaddy · Feb 28, 2010

Yes that is my plan work to the marking sheet.how is this looking for 2NF, would I just break up tbl_University to form a Course table for 3NF?
tbl_studentStudent_IDFirst_NameMiddle_InitialSurnameDate_of_BirthGenderEligible_For_ExtraEligible_For_Student_Finance

tbl_ApplicationApplication_IDApplication_CompleteApplication_DeadlineApplication_SubmittedOffers_ReceivedStudent_ID

Tbl_Contact_DetailsAddress_line_1Address_Line_2TownRegionPostcodeHome_Phone_NumberMobile_Phone_NumberEmail_AddressStudent_ID

Tbl_Personal_StatementPersonal_StatementPersonal_Statement_CompleteDate_PS_CompleteApplication_ID

Tbl_Student_FinanceFinance_Application_IDFinance_Applied_Forfinance_ApprovedApplication_ID

Tbl_UniversityCourse_IDCourse_NameTeacherUniversity_IDApplication_IDuniversity_Name

gemma-the-husky · Feb 28, 2010

I find the easiest way is to think about the entities you need to model (without considering the detail at all) - and then ascribe each datum (item of data) to one of these entities. Bit of stepwise refinement here - as you go through this process, you may realise your initial table design was incorrect, and you have to reconsider. The key to this is that (among other things) repeating groups of data indicate that the tables are not fully analysed.

Its all a black art, and experience is invaluable

bear in mind that you CANNOT have a many to many relationship

many students-many courses MUST be broken into 2 one-to-many relationships

so you get
1 student - many student-course-applications AND
1 course - many student-course-applications

========================================

so given that the project is to keep track of university applications then you need

a) a university table (the basic uni info)
b) a courses table (the basic courses info)
c) a university-courses table (ie which unis offer which course)
d) a students table (the students details)
e) a students-courses table (linking students to the selected courses)

the fact that for a given student, you can only have 5 entries in the e) table is an implementation detail, and nothing to do with the design.

you may then start getting into issues, such as - does a student write one application statement for all course applications, or different ones, that may be selected for the different applications - so it is not completely clear exactly where the student statement belongs - but generally you sohuld ber able to find a home for each bit of data in the above - and if you cant, it may just point to a redesign need.

georgedwilkinson · Feb 28, 2010

I really have trouble answering this type of question because I believe the premise is all wrong. A table (and thus it's database design) is NOT in XNF when it can introduce data anomalies that are corrected by normalizing with that NF. Additionally, when you correct the anomaly, you have to work hard to insert anomalies for higher NFs. That means, that unless you have done something incredibly wrong to a table's design (almost as if on purpose), once you have normalized to XNF, you have also normalized to X+1 thru X+n NF. Stated another way, it is nearly impossible to break normal form rules for more than 1 normal form (except as an introduction from a lower normal form's anomaly). Stated yet another way, once you have fixed the lowest level anomaly, you have fixed ALL the higher level anomalies, too (you have to work hard to introduce higher level anomalies once lower levels are satisfied).

But Dave's advice is correct: define your big level objects/entities then assign data fields as appropriate for that entity.

For instance, a person has a date of birth, a date of death, possibly one or more names, etc. Those things help define that person as that person. It is something with the person when they are born and/or when they die and it hardly ever changes (except by legal decree in some cases). An address does not define a person as a person. It may help to define that we got a certain person right, but it isn't with that person both when they are born and when they die and it can be changed easily (unless you're the mover). An address however, does define a place in space, in a way. Thus, one of my entities in a design may be an address. While I'm initially thinking about it, I don't really care what an address is made up of (it's data fields), just that it is about a discrete thing in space that helps me do business in my system. Same with relationships between people. Yes, we have parents when we're born but they don't define us as a particular person, since we can get different parents, spouses, children, etc. throughout our lives in a seemingly chaotic way.

When you have defined all your atomic things that meet your system requirements, then you go to work defining the relationships between all those things (using the words one and many).

You'll find that if you think like this, normalization just happens. It is difficult to screw up a data model (i.e. introduce data anomalies) if you define everything nearly atomically.

One of the last things I worry about is the data fields in an entity (though I usually fill them in along the way). If I start adding a lot of data fields (more than 5, say), I start worrying about whether I identified all the things in the system I need to track and I'll go back iteratively as necessary to fix it.

Starting design with data fields is backwards and is a remnant of spread sheet thinking. You cannot build a good (i.e normalized) database using spread sheet thinking.

Help getting my data through 1nf 2nf in to 3nf

thedvdaddy

New member

Attachments

georgedwilkinson

AWF VIP

thedvdaddy

New member

Attachments

georgedwilkinson

AWF VIP

thedvdaddy

New member

gemma-the-husky

Super Moderator

georgedwilkinson

AWF VIP

Users who are viewing this thread