Steps/Process to Normalizaing an Existing Database

lmg0115

Registered User.
Local time
Today, 11:16
Joined
Jul 29, 2008
Messages
38
Can someone tell me what the steps are to normalizing an existing database. I have inherited a database that has 54 tables with 22 appearing in the relationship window. Most of which do not have established relationships. Firstly I created a copy to play around with. I have run the table analyzer and most of my tables are fine. I have defined the tables that are critical to the process. My next step, I believe is to define the relationships, which are not present in the current database. What would you say are my next steps? Thank you in advance for any assistance you can provide.
 
You're probably going to get lots of links on this one.;) It takes a lot of typing to really explain it, and there are already hundreds of web pages that already have the answer. Here's a decent start:
http://en.wikipedia.org/wiki/Database_normalization

Read up, give it a try, ask questions when you're stuck.
You may even want to upload a version of your DB (with no data, forms etc - just table structure) for us to offer up suggestions.
 
Last edited:
Try using Google. There are a lot of good sites about Normalization out there. Also there are some good threads on this forum so search that as well. Good luck. Remember to post any queries you have here and we will do our best help you.
 
I wouldn't trust the wizard to tell me the tables are "fine". It can do only the most basic of tests and it relies primarially on finding patterns of repeating data in rows. It won't identify repeating groups because it can't analyze column names and even the column names may not have a repeating pattern so only domain knowledge and an understanding of normalization would pick them out.

I would start by examining each table and identifying its primary key. Then I would go through each column and determine its relationship with the primary key.
 
In line with my question on normalization, can a single junction table be used for multiple many to many relationships? i.e. tblemps, tblline, tblshift, tblWO. Each of these tables has a many to many relationship to the other. Can I have a single junction table with tblempsID, tbllineID, tblshiftID, tblWOID with the junction table as the many and each of the others as the one?
 
In line with my question on normalization, can a single junction table be used for multiple many to many relationships? i.e. tblemps, tblline, tblshift, tblWO. Each of these tables has a many to many relationship to the other. Can I have a single junction table with tblempsID, tbllineID, tblshiftID, tblWOID with the junction table as the many and each of the others as the one?
Thats not how you use a junction table. The junction table is the 1 and the other sides are the many. For example iyou have a many to Many relationship between Students and Courses then you would have a Student table with a primary key of say StudentID and a Course table with a primary key of CourseID then your junction table called tblStudentCourse would have fields of StudentID and CourseID

your Junction table might look like this

Student1 Course1
Student1 Course 2
Student2 Course1
Student2 Course2
Student2 Course3 ets
 
Last edited:
Rabbie, I stand corrected. Thank you. However, is it possible to have a junction table as I described switching the relationship around? Or, would you suggest leaving the tables as standalones and developing the relationship in my queries?
 
You can do a lot of reading, but eventually it comes back down to this: The ONLY way to normalize a database is to identify the entities being represented. Then segregate non-similar entities to tables of their own.

Not all databases are this way, but many represent an abstracted BUSINESS MODEL showing data flow, transactions, and the relationships of things that are objects, actors, components, etc. Just looking at it roughly, I would say that your database is a model of something. If that is the case, there is no substitute for first assuring that you understand the real-world process of which the .MDB is a model.

I infer from what I read that you have a shop with some sort of work order structure, a product line (but that could be an assembly line instead; the "read" is ambiguous), a bunch of employees, work shifts, and the like. Therefore, you are looking at tracking work done on specific products or product lines, done by employees on a work order, and done during a particular shift. You are probably tracking hours worked or something similar. The model represents work flow as employees are given a work order to make some part of your product line. No?

In which case here are some entities that come to mind:

Employee
Work Order
Product Line
Specific Product
Shift

Not so clear from the thread so far: Departments, consumables (used in making the product), customers (whose purchase order causes you to ISSUE the work orders), etc.

These all belong in their own tables with simple junction tables between them. To decide what goes into the junctions, you must know what relationship exists.

For instance, if you have multiple departments, it is possible that work orders could be tied to different departments. If you have what is commonly called "matrix management" at your shop, the work order would be tied to both an employee AND a department (therefore THREE fields in the junction, particularly if the employee can float between departments in that matrix.)

You MUST understand the physical work flow before you can properly normalize the tables that are part of the model because it is ALWAYS AND FOREVER the case that the data drives the model. Don't let the model drive the data. I.e. YOU make the business decisions and after that, make the model conform to your decisions. Don't make a choice because you cannot make the model do something. Make the choices because the business says "that's the way it has to be." Then figure out how to adapt the model. And the first step is always to understand the process underlying the model.
 
Doc Man, thank you for the information. I have a complete understanding of the model. It is simply as you said. We make widgets on various lines for various customers based off of various workorders, using many employees who work various shifts. The purpose of the database is to track all activity related to the workorder and to report on this activity. The line is my main table as everything that we track and report on comes back to the line number. I have individual tables for the following: Line, Customers, workorders, shift, employees, Header(multiple), Printer(one) (both of these are parts on the line)Downtime, Correctiveaction, and Changeover. By my way of thinking I need multiple junction tables because of the several many to many relationships required to make this pull the information the way I need it to be structured. In the current database relationship window none of the tables have established relationships which makes it very difficult to pull a report for example that gathers all of the data for a specific line. Such as, all workorders, all employees on each shift, identifying each header and printer on a given date. Under the current structure I had to perform 4 queries including a union simulating a full join to build the final query that I could use to do the report. If the database were normalized I should have been able to point to the line table (which would have the other tables as foreign keys), along with the other tables in my query window to do a simple query to produce this report. Is my thinking correct here? If so, I need assistance with developing the relationships. Thanks for all of your current and future assistance with this.
 
Are you asking if a junction table can include lookup FK's? If so, the answer is yes. For example, you have classes, students, and advisors. At this school, a student has an advisor for each class rather than a single advisor so the junction table would be:

ClassID, StudentID, AdvisorID

You can also include other non-key fields such as final grade.
ClassID, StudentID, AdvisorID, FinalGrade

You can even add additional key fields to allow the primary relationship to exist more than once. For example, if a student fails a class and needs to repeat it, you would need to include YearSemester as a third field in the primary key.

ClassID, StudentID, YearSemester, AdvisorID, FinalGrade

So you have the junction between class and student, you have the qualifier of year/semester, you have a lookup to the advisor table and you have a simple data entry field to hold the final grade.
 
Pat, thank you. That's exactly what I needed to know.
 
I'm sorry to keep asking this, but using the fields I have listed below can someone help me set up the relationships between the tables?

Line, Customers, workorders, shift, employees, Header(multiple), Printer(one) (both of these are parts on the line)Downtime, Correctiveaction, and Changeover

There are multiple production lines, each line has a printer and at least one and sometimes two headers. There can be many workorders for many customers being worked on many lines. And many employees can work many shifts, with at least two employees per line who can work on multiple workorders if moved to a different line on a given day. Many employees can work multiple shifts. The relationship between tblDowntime, tblCorrectiveAction and tblChangeover is set at many to one to the Header and Printer tables. I apologize if this is confusing but let me know if you require further clarification. Thanks again for all of the help. I really need it.
 
Is there anyway you can strip out most of the data and post a sample of your database?
 
Just as a suggestion to help you visualize the end goal - sometimes concepting of what you need isn't enough.

Review what you think is going on, take some data and try to organize it in a spreadsheet - ONE worksheet.

Go across the columns and give each one a meaningful name all the way out to that very last end item of the data you think you want to collect. Now, enter about 5 rows of data and take a look.

The second part to this trick is figuring out what you want out of the database. Take those 5 rows of data and see if you can create the feedback (reports, statistics, etc.) If it works, great!, if not, add more columns or what-have-you to massage the spreadsheet to these goal(s).

Once you have both parts done, it will be very obvious the data fields that will be required and what data fields will be redundant. This will help organize your data into the seperate tables for normalization and developing the relationships.

Again, it could be a long and drawn out excercise, but in lieu of a formal originating (and derived) requirements document process this could be the shortcut to save you countless hours down the road wondering what went wrong.

In addition, this is something you could post up and give alot of information to the experts here so they may provide the best solution.

-dK
 
Unfortunately, I can not strip out enough information not to compromise my companys data and also to provide you with a picture of what I am trying to do. Would it help if I provided a screen shot of the current relationship window? If this doesn't help I appreciate your willingness to offer assistance. I'll just have to work it out. Thanks again.
 
You say you have 54 tables, with 22 present in relationship windows.

I just want to be 100% sure- does the other tables happen to be linked? (You can tell by looking at database window and seeing a tiny blue arrow next to the small icon on left of each table's name.)
 
Errr .. make up some data so nothing is compromised. The point is to test prior to spending all of your time building/tearing down/rebuilding until you can achieve goals. This allows for reverse engineering to ensure goals are mapped to requirements and you spend minimal amount of time in the design process.

If the structure or the type data you are capturing itself would compromise data, then recommend hiring a 3rd party consultant or company and bind them with intellectual agreements through the contract.

Alternatively, if data security is in an issue, request formal training from company so that in-house programs can be developed competently as a return on investment so 3rd party entities won't be an issue.

If the company doesn't believe in anything except bootstrapping their way through things, ask them why their data is in the state its in and if it's critical to business operations to either (a) pay for some formal training, or, (b) pay for a consultant.

-dK
 
dkinley, you make some very valid statements but neither of the above will happen as resources and funding are both stretched. But I completely appreciate the sentiment.

Banana, I know where you are going. Yes some of the missing tables are linked, but over half of the linked tables actually do appear in the relationship window.
 

Users who are viewing this thread

Back
Top Bottom