When to Normalise and when not to

InstructionWhich7142 · Jun 12, 2020

When you google thing kind of thing you end up on stackoverflow learning about 6NF, I've had to take over databases designed like that, they make sense, however starting from scratch with just the tables is fairly tricky when you have more tables with 3 or 4 columns of ID's than you have tables of human readable data

Anyway, I've been thinking about two semi related databases, one is a little task tracking system and I want recursive subtasks so I'm fairly happy this works as 1 table and a form can fairly easily navigate up and down the levels no issues, (it's mostly for me and maybe 2 other users)

My second system is for projects which are basically a sequence of steps that need performing to complete it, (this is probably for around 20 users and tens to hundreds thousands of records)

I'll have a number of template projects sections and when a requirement comes in i'll use them to build a customised project,

In my head, [Project Sections] & [Projects] are the two parent tables and then [Project Section Actions] & [Project Actions] are two corresponding child tables, a [project action] will often refer to a [Project Sections] record and there will be an option to open a view of the detailed [Project Section Actions] but usually the summary from the [Project Sections] header will suffice,

There are about 5 fields in common between the Parent tables and 5 unique to each type, However there are more like 8 or 9 common between the two Action tables and about 3 that are unique.

I'd rather only develop one lot of forms and outputs because most of the unique fields are internal references, the usage of both things from the user view is identical,

Or would it be better to have two tables and swap the recordsource over?

Question of ease of development vs normalisation

Thanks

The_Doc_Man · Jun 12, 2020

The larger the database, the more you will benefit from strict normalization, though often taking it beyond 3rd normal is overkill. Ease of development doesn't have to be incompatible with good normalization. What MIGHT happen, though, is that you have to think about what you are doing a little bit harder, like a chess match with one of those chess masters who looks 5, 10, or 20 moves ahead.

Basically, the more time you spend up front analyzing the design and normalizing what you can, the better off you are.

I know that doesn't answer your specific question, but then I really can't, because I don't know your specific environment.

plog · Jun 12, 2020

Question of ease of development vs normalisation

That's not really a good question (I am disregarding the literal fact that its not a quetion at all). I mean, the easiest system to develop would be a pen and paper system. It would take next to no time to develop but place so many responsibilities on the user--tracking the right data in the right format and require them to search through a file cabinet to find the right piece of paper at the right time. Super simple in terms of development, super pain for the users and anyone wanting data out of the system.

I think the question you need to ask is 'What's the most efficient development method?' . If the system needs to be used for more than 24 hours the answer involves normalization. I've used Excel as an unnormalized database to accomplish tasks that I thought I would never need the data for again.
Often it turned out that data and/or system would need to be used repeatedly and I ended up taking that spreadsheet and building a normalized database from it. An ounce of normalization is worth a pound of cure.

when you have more tables with 3 or 4 columns of ID's than you have tables of human readable data

I'm not saying you are making this mistake, but for every 10 poorly normalized databases I see posted on this site, I find one over-normalized database. That statement makes me think you might be over-normalizing things. It's ok to not put every piece of data in its own table; it's ok to have a bunch of strings as data in one table.

For example, I've seen addresses like this:

id_Cust, StreetNumber, id_Street, id_City, id_State, id_Zip
178, 1423, 7, 13, 34, 212

Where they had a table for streets, a table for cities, a table for states and a table for zips. That's over normalized and wrong. That address should be stored like this:

id_Cust, Street, City, State, Zip
178, 1423 Main Street, Springfield, IL, 62629

Pat Hartman · Jun 12, 2020

The only databases that are commonly not normalized are data warehouses. That is because these databases are NOT updated once they are created. They are simply replaced periodically and de-normalizing things like lookups make reporting easier for the user and don't cause any conflicts because no updating can break anything.

Beyond that, feel free to be rational about normalization. For example, my state table has a two character ID as the PK (and NO autonumber) which is the state abbreviation. This state table is properly normalized but doesn't obscure the actual data value for no purpose. You could do the same thing with cities and streets if you are dealing with a very small geographic area. The fact that a table has an autonumber PK doesn't make it normalized. What makes it normalized is that it doesn't violate any of the first three rules of normalization. You might need to go further than 3rd normal form but most databases don't unless they store huge amounts of data.

One thing I don't do is to use data from other systems as my PK. It may look nice because you have something to hang your hat on because EmployeeNum is a string you can "read" and it has meaning. However, YOU don't have control over the IDs generated by other applications so it is better for you to keep the ID as a data field and use an autonumber as the PK. You should still assign a unique index for the EmployeeNum because you don't want to allow duplicates.

Isaac · Jun 12, 2020

plog said:
For example, I've seen addresses like this:

id_Cust, StreetNumber, id_Street, id_City, id_State, id_Zip
178, 1423, 7, 13, 34, 212

Where they had a table for streets, a table for cities, a table for states and a table for zips. That's over normalized and wrong. That address should be stored like this:

Wow ... that does sound crazy. I would almost say that's not as much 'over normalized' vs. a misunderstanding of even what normalized means. True relational entities are what become tables ... Not just lists of potential attributes.

Pat Hartman · Jun 13, 2020

Readers' Digest was one of my clients and they broke addresses down to very small pieces. There is an amazing variety of street address formats found in just the US.

PreStreetDirection (NSEW)
PreLetter
HouseNumber
PostLetter
PostStreetDirection (NSEW)
StreetName
StreetSuffix
Apartment

There may have been more but that's all that come to mind. The point of all this detail was to facilitate de-duping and minimize failed deliveries.

jdraw · Jun 13, 2020

You may get some ideas from this US Postal Addressing Guide.
I recall an application that was dependent on Canadian Postal code for some key info. When Nunavut was established that screwed up the Postal Code and addresses. Similar situation when Newfoundland codes became Newfoundland and Labrador; and same when Quebec changed from PQ to QC. Don't depend on another system for your key info. Use codification that you have control of and/or confidence in.

I have often said that there are programmers out there that have built a career on
editing/converting/adapting address formats to accommodate "less than common" input/outputs.
Happened a lot when someone bought/buys mailing list(s) from a variety of well marketed sources.

Galaxiom · Jun 13, 2020

plog said:
For example, I've seen addresses like this:

id_Cust, StreetNumber, id_Street, id_City, id_State, id_Zip
178, 1423, 7, 13, 34, 212

Where they had a table for streets, a table for cities, a table for states and a table for zips. That's over normalized and wrong. That address should be stored like this:

id_Cust, Street, City, State, Zip
178, 1423 Main Street, Springfield, IL, 62629

The advantage of this scheme is that the street names would be selected from a list of names that actually exist in the city (or select suburbs that have the street name). Address databases like this also have limits on the numbers that are accepted for particular streets. It is not unusual to have the street name split with a list of acceptable street types (eg Street, Lane, Avenue etc) to chose from in a list.

When you have free text for addresses you always have spelling mistakes and other incorrect entries. I administer a database that uses free form address entry and have set up several queries to find errors. For example I compare the street names in the same suburb and test for small spelling differences and find which addresses don't include standard street types or abbreviations. You would be surprised how often even the word Street is entered incorrectly.

When to Normalise and when not to

InstructionWhich7142

Registered User.

The_Doc_Man

Immoderate Moderator

plog

Banishment Pending

Pat Hartman

Super Moderator

Isaac

Lifelong Learner

Pat Hartman

Super Moderator

jdraw

Super Moderator

Galaxiom

Super Moderator

Similar threads

Users who are viewing this thread