Is it time for a change in our thinking?

RainLover

VIP From a land downunder
Local time
Today, 21:00
Joined
Jan 5, 2009
Messages
5,041
A lazy programmer is one who is satisfied with something that works. They only seek advice when something goes wrong, and make improvements only if the user points out a flaw.

A good programmer seeks and takes advice from those who are somewhat experienced in their field. They weigh one opinion against another, then choose the one that best suits their situation. These methods have been tested and have withstood the rigours of time.

A truly good programmer asks questions. “Why” and “Why not” are two of their better forms of enquiry. This select group of programmers is not overly influenced by titles. They know the worth of their advisers and put aside those who promote themselves as gurus on the volume of their community involvement rather than its quality.

We read many things on the WWW: some true, some false and some ambiguous. A lot of it is limited in scope and simply the thoughts of others, rehashed to suit the writer's intent.

One such subject is Normalization, which was invented by Edgar Frank Codd and first published in 1970. The world of computers and computing has undergone a lot of improvement since then, so it would now be appropriate to say that Normalization has its advantages and disadvantages.

Advantages

Avoids data modification (Insert/Delete/Update) anomalies as each data item lives in one place
Normalization is conceptually cleaner and easier to maintain and change as your needs change
Fewer null values and less opportunity for inconsistency
Increased storage efficiency
The normalization process helps maximize the use of clustered indexes, the most powerful and useful type of index available. As normalization separates data into more tables, more clustered indexes become available to help speed up data access

Disadvantages

Requires much more CPU, memory, and I/O to process; thus normalized data gives reduced database performance
Requires more joins to get the desired result. A poorly-written query can bring the database down
Maintenance overhead. The higher the level of normalization, the greater the number of tables in the database.
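The first advantage above (each data item lives in one place) can be made concrete with a small sketch. This is a hypothetical example using Python's built-in sqlite3; the table and column names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Denormalized: the customer's city is repeated on every order row.
cur.execute("CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, city TEXT)")
cur.executemany("INSERT INTO orders_flat VALUES (?, ?, ?)",
                [(1, "Acme", "Sydney"), (2, "Acme", "Sydney"), (3, "Acme", "Sydney")])

# A partial update leaves the repeated copies inconsistent (the update anomaly).
cur.execute("UPDATE orders_flat SET city = 'Brisbane' WHERE order_id IN (1, 2)")
cities = {row[0] for row in cur.execute(
    "SELECT DISTINCT city FROM orders_flat WHERE customer = 'Acme'")}
# 'Acme' now appears to be in two cities at once.

# Normalized: the city lives in exactly one place, so one UPDATE fixes every order.
cur.execute("CREATE TABLE customers (customer TEXT PRIMARY KEY, city TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT REFERENCES customers)")
cur.execute("INSERT INTO customers VALUES ('Acme', 'Sydney')")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Acme"), (2, "Acme"), (3, "Acme")])
cur.execute("UPDATE customers SET city = 'Brisbane' WHERE customer = 'Acme'")
```

In the denormalized table the anomaly is possible; in the normalized pair it is not, because there is only one copy of the fact to update.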

So why should we blindly follow the Rules of Normalization without question? Is it time we challenged the sacred cow?

I welcome your comments.
 

ChrisO

Registered User.
Local time
Today, 21:00
Joined
Apr 30, 2003
Messages
3,202
Interesting post and I have to agree.

On the web, gurus rely on gurus who rely on gurus…
Not thinking and just relying on gurus is not the way to go.

I will try to put together an example re saving calculated values.

Chris.
 

RainLover

Thanks Chris.
 

Galaxiom

Super Moderator
Staff member
Local time
Today, 21:00
Joined
Jan 20, 2009
Messages
12,852
I have always relied on and applauded basing decisions on an understanding of the principles that govern any mechanism, whether a physical or a software construction. It has served me well in both engineering and programming, ever since my mother demonstrated it to me as a child when she solved a complex puzzle that had seemed to me impossible to grapple with.

However, I have never yet found a really good reason to abandon normalization, though I admit I have done it out of expediency in a slapped-together substitute for a really terrible system (the original was a shared Excel spreadsheet that fell over all the time). I wrote a direct Access substitute one afternoon and it has done the job perfectly ever since.

More complex denormalised databases that I have had to work with have always been a pain.

Saving calculated values is sometimes inevitable due to the sheer size of the calculation. However, as I have gained more experience (and moved to a server-based back end) I have increasingly realised that queries can be vastly more efficient than my early attempts.

But then I have never had to deal with a high-demand database. These certainly can require a rethinking of the dynamics. It comes down to matching the resources with the task.

By all means question the conventional wisdom; it needs to be questioned. I have seen some awful examples of blind faith in presumed conventions on this site. However we must be careful to avoid becoming iconoclasts, which I fear is territory that Rain could too easily venture into with this discussion.
 

Lightwave

Ad astra
Local time
Today, 12:00
Joined
Sep 27, 2004
Messages
1,521
Although isn't that, to a certain extent, why we're all members here?
It gives us views from people that we know, or have come to know, as knowledgeable; access that we would not otherwise have. Therefore hopefully we are more open than average to outside influence.

Within my area I haven't yet come across a good reason for abandoning normalization either.

I've been listening to a lot of technology and specifically programming podcasts, and there is occasionally an article about denormalized database structures.

These tend to be concerned entirely with extremely large systems that require very high concurrency over the internet. They seem to get the space on the podcasts mainly because they are unusual and the organisations running them are at the forefront of web businesses.

How many of us can really say that these criteria hold true for our applications?

I would imagine that a flat structure could be very good for relatively simple systems with high input eg forums / twitter feeds / the dreaded facebook / whatever other social infliction is being planned by someone.

Of course I could see that if you were designing a front end till system for the net then it could well be valid. But nobody has asked me to do that yet.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 07:00
Joined
Feb 19, 2002
Messages
43,293
One such subject is Normalization, which was invented by Edgar Frank Codd. First published in 1970. The world of computers and computing has undergone a lot of improvements since then so it would now be appropriate to say that Normalization has its Advantages and Disadvantages
Codd didn't invent the concept. His writings simply brought it to the people. He translated the higher mathematics into something most of us can understand. Relational theory draws from set theory, relational algebra, and relational calculus. It may seem old-fashioned to you but unless you know something that I don't, none of this has changed in the past 45 years. I can't pretend to follow the proofs but I see the results every single day, in every query I write. I have been designing and developing applications since 1968 (before I could vote, by the way). I can tell you that flat files are nothing to yearn for. I do occasionally miss IMS though. :)

There are several different classes of applications, the two most common being transactional and reporting (data warehouse). By far the most common applications are transaction based. Something happens (a transaction) and we track it: billing, order entry, weather reporting, inventory, shop floor control, etc. Transactional systems tend to work with small amounts of data at one time and are frequently interactive, with batch processing to produce outputs (those bills that come in the mail every day). These databases must be highly responsive and efficient and so are always at least third normal form.

The other large group of applications are often called data warehouses because they hold a company's historical data and are typically huge. These databases are summarized and flattened (lookups are usually embedded) to make reporting easier. The reason the summarized, de-normalized schema works is that no user update ever happens to a warehouse table, so you don't need to worry about RI (which is rarely enforced) or update anomalies. All updates are batched and applied daily, weekly or monthly, depending on the level of currency desired.
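The batch summarization step described above can be sketched in a few lines. This is a hypothetical example using Python's built-in sqlite3; the table names and figures are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Normalized transactional table: one row per event.
cur.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-01", "widget", 10.0),
    ("2024-01-01", "widget", 15.0),
    ("2024-01-01", "gadget", 7.5),
])

# Flattened, summarized warehouse table. Because no user ever updates it
# directly, the usual update-anomaly worries do not apply; the batch job
# simply rebuilds or appends to it on a schedule.
cur.execute("CREATE TABLE sales_summary (sale_date TEXT, product TEXT, total REAL)")
cur.execute("""INSERT INTO sales_summary
               SELECT sale_date, product, SUM(amount)
               FROM sales GROUP BY sale_date, product""")
```

Reports then read `sales_summary` without touching (or re-aggregating) the transactional rows.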
 

ChrisO

I think the point here is that normalization is viewed as some sacred cow which cannot even be questioned. Any question is viewed as an attack. Very often words like never and always are automatically inserted into the discussions without any real thought. The discussions then become locked and even thinking is suppressed.

One example of this is normalization and the saving/not saving of calculated values.

Over the years there has been some debate about saving calculated values.
The arguments range from the following points:-

1. Re-calculating everything from scratch through to re-calculating everything from some other base condition but never saving the result.

2. Saving the result of adding the latest transaction to the last saved result.

3. The argument has generally been remiss in not supplying a third alternative.

--------
Point 1 is generally the position taken because of three things…
A. It works.
B. Is simpler to enforce.
C. People can read the advice of gurus and do not have to think about it for themselves.

But point 1 is incomplete in that it only relies on correctness of result, simplicity, laziness and the word never.

There are other problems that might be associated with determining a base condition. Normally we use the term stocktake which is supposed to be some external (external to the computer program) verifiable reference. Typically we count the widgets on some date and enter the count as the latest verifiable reference. I have reservations about some stocktakes being externally verifiable.
--------

--------
Point 2 is generally not the position taken because of two things…
A. For it to work correctly, it requires a lot of work.
B. It adds to the size of the database to store a calculated value.

Point B was certainly a problem many years ago but these days it has faded into insignificance.
--------

--------
Point 3 as an alternative can both re-calculate and save.
It also has the advantage of removing that nasty word, never.

The premise is: if everything is re-calculated, from some base condition, as per the advice of gurus, then it is, by definition, correct. More simply put; if we follow the advice of gurus then it is correct.

I sometimes have my doubts about the advice of gurus but let us assume it is correct on this occasion.
If we have a correct result then why can we not save it?
Let me ask that question another way; if the calculated result is correct and the result is data then why can we not save correct data in a database?

The answer to that last question seems to be brought about by the word never in Point 1. The word never is both arrogant and ill-advised. It is arrogant to the point that the person who advises such things believes that they know all conditions, past, present and future, under which a database will be used. It is ill-advised because they don’t.

So the question becomes; under what conditions can Point 3 both re-calculate and save?

To me, the answer to that question relies on the words writable and read-only.

Let me give an example:
A Project has many Jobs and a Job has many Transactions and each Transaction may have a cost.

A Project has all the expected attributes, one of which is cost. Most times it is the primary attribute.
A Job within that Project has all the expected attributes, one of which is cost.
A Transaction within that Job has all the expected attributes, one of which is cost.

When any Transaction is changed (writable) then the Transaction is re-calculated. The other Transactions within that Job have not changed nor have any of the other Jobs in that Project changed (read-only).

So the re-calculated value of this, singular, Transaction is saved.
The sum of the saved Job Transactions is then saved at the Job level.
The sum of the Jobs level is then saved at the Project level.

By forcing the, single, writable Transaction to be read-only, by saving it, we can now sum all read-only values throughout the Project. The read-only totals now bubble-up to the appropriate level.

If we need to do a mouse move to display the total of a Transaction, Job or the entire Project we do not suffer the requirement to do any re-calculation at all. The mouse move is read-only.
A Form, say aging summary, which is read-only, need not re-calculate any Transaction.
A Report is also read-only, it need not re-calculate any Transaction.
A Chart or Graph is also read-only, it need not re-calculate any Transaction.

The above four examples are non-interfering read-only processes, they do not and must not change data. To take advantage of the increase in display speed available the Transaction totals must be saved.

Under this circumstance, the requirement to re-calculate all Transactions in the Project simply does not exist. Only the current Transaction is changed and only the current Transaction is converted from writable to read-only by saving.

The reason behind this approach is speed. However, speed is usually not the be-all and end-all of design. The data must still be accurate and we should not tolerate exceptions. But with some effort accuracy and speed are obtainable.
--------
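The Point 3 scheme above (re-calculate only the changed Transaction, save it, and let the saved totals bubble up) can be sketched roughly as follows. This is a hypothetical Python sketch; the dictionaries stand in for the Transaction, Job and Project tables, and all names are invented.

```python
# Hypothetical in-memory "tables": a Project has Jobs, a Job has Transactions.
# Saved totals live alongside the rows, as in Point 3 above.
transactions = {}    # txn_id -> {"job": job_id, "cost": saved cost}
job_totals = {}      # job_id -> {"project": project_id, "total": saved total}
project_totals = {}  # project_id -> saved total

def save_transaction(txn_id, job_id, project_id, new_cost):
    """Re-calculate only the changed Transaction, then bubble the
    difference up into the saved Job and Project totals."""
    old_cost = transactions.get(txn_id, {}).get("cost", 0)
    transactions[txn_id] = {"job": job_id, "cost": new_cost}
    job = job_totals.setdefault(job_id, {"project": project_id, "total": 0})
    job["total"] += new_cost - old_cost
    project_totals[project_id] = project_totals.get(project_id, 0) + new_cost - old_cost

save_transaction("t1", "jobA", "proj1", 100)
save_transaction("t2", "jobA", "proj1", 50)
save_transaction("t1", "jobA", "proj1", 120)  # edit t1: only the delta moves up

# The read-only side (a form, report or chart) just reads the saved totals;
# no re-calculation of the other Transactions is needed.
print(job_totals["jobA"]["total"])  # 170
print(project_totals["proj1"])      # 170
```

The write path touches one Transaction and two saved totals; everything downstream of it is read-only, which is the whole point of the approach.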


When people use Access they can learn by reading the words of a guru. If they are serious they can read the words of many gurus. If ever two gurus disagree they can become confused. When people use Access they do not have to give up thinking. If they attempt to do so they will be attempting to learn by giving up thinking.


Chris.
 

Pat Hartman

Chris,
Most of the folks asking questions on this and other Access forums are not experienced database designers. They are not asking how to optimize and possibly denormalize a database that is already in fifth normal form. They are people struggling to understand basic concepts. They need rules that will serve them well and keep them out of trouble while they grow into real developers. Always and never are not confusing; they are clear, and they will keep the poster out of danger until he is sufficiently experienced to make a competent decision to deviate from standard practice. At the point where the poster is actually able to make high-level design decisions, we don't hear from him anymore; he has developed sufficiently to be capable of implementing his design decisions.
 

RainLover

I appear to have narrowed this thread to a discussion on changing Normalisation rules.

That was not my intention. I broached the subject of Normalisation mainly because those who I intended as my audience would be familiar with the subject so I used it as an example.

For the sake of clarity it is the experienced developer that I am addressing here about a change in our thinking. Not the novice who has little choice but to blindly accept the advice of the more experienced.

Normalisation is just one subject. The concept of thinking more, or challenging the norm, encompasses more than just Normalisation.

 

Galaxiom

I appreciate where you are coming from Rain. It is important for a developer to understand why something is done in a particular way but equally so to question if those circumstances actually apply for the application where they are being considered.

People can have a good set of principles but still go wrong when they misjudge which principles should be applied to the job at hand.

Those who blindly apply the decree of designated authority have indeed lost their way and have fallen victim to what could be likened to religious belief.

Genuine knowledge is ultimately about a reduction to a set of mutually consistent first principles, and it is a very powerful framework to build upon whether the subject is physics, sociology or database design.

Building an accurate set of first principles of database design is one of the reasons I frequent this site. Some of the best learning happens when we dare to question the articles of faith.
 

dportas

Registered User.
Local time
Today, 12:00
Joined
Apr 18, 2009
Messages
76
So why should we blindly follow the Rules of Normalization without question? Is it time we challenged the sacred cow?

Hopefully no sensible and knowledgeable professional will "blindly" follow anything without question. A database practitioner who knows the fundamentals of data management will properly understand concepts such as normalization, dependency theory and the relational model, and will be able to use such tools to analyse and solve problems as the need arises. On the other hand, problems tend to occur when IT practitioners lack theoretical knowledge about the field they practice. Unlike most other professions, ignorance of foundation knowledge doesn't necessarily disqualify a person from working in IT. Apathy and even hostility towards scientific knowledge are sadly all too common in the industry.

You seem to be giving the impression (perhaps unintentionally) that database theory has been static for 40 years. That's not remotely the case. There are numerous conferences and peer-reviewed journals on data management topics publishing new material all the time. Database and data management theory is active and constantly evolving. Is there something in current theory you believe to be incorrect or in need of "challenging"? If so then you need only demonstrate a new theory that is more accurate and useful than what went before. That is scientific method.

normalized data gives reduced database performance

Normalization has nothing to do with performance because it defines only the logical features of the database. It doesn't define how the data is physically stored or processed or how much space it will use in RAM or on disk or what operations need to be executed in the database. It would be more accurate to say that database systems based on the relational and SQL models tend to be designed and optimized to work best with well normalized data (for the sake of argument say, 5NF). There are exceptions of course.
 

Pat Hartman

People can have a good set of principles but still go wrong when they misjudge which principles should be applied to the job at hand.
When you see a 2-year old with a lollipop in his mouth running, do you explain why that is a bad idea or do you just tell him not to do it? How about an 8 year-old? At some point, people gain sufficient experience and world knowledge to understand "why" and make an informed choice as to when to deviate from recommended norms.
 

spikepl

Eledittingent Beliped
Local time
Today, 13:00
Joined
Nov 3, 2010
Messages
6,142
Some sacrilegious remarks:

And then there is the aspect of falling costs of hardware and performance, and rising cost of meatware, meaning that fast and sloppy programming can be fine, whereas desloppyfying things can entail extra costs in meatware here and now. Pragmatically: what works right now works, and the consequences of bad choices made now are on the next year's budget or turn up mañana (i.e. between now and never).
 

RainLover

dportas

I enjoyed reading your post and ask if you could elaborate on this statement.

Normalization has nothing to do with performance because it defines only the logical features of the database.

I would have thought that Normalisation had a lot to do with performance. Especially when you consider Saving Calculated Values. ChrisO wrote about this and I believe a lot more could be written.

ChrisO also wrote on the habit many people have of saying that you should never store a calculated value. This is a good example of people not thinking and simply passing on the opinions of gurus, simply because they have some sort of title.
 

RainLover

When you see a 2-year old with a lollipop in his mouth running, do you explain why that is a bad idea or do you just tell him not to do it? How about an 8 year-old? At some point, people gain sufficient experience and world knowledge to understand "why" and make an informed choice as to when to deviate from recommended norms.

First of all, how do you tell the child's age? Then, if you get that right, what is stopping a child of a different age from listening in?
 

Simon_MT

Registered User.
Local time
Today, 12:00
Joined
Feb 26, 2007
Messages
2,177
Yes, there are norms but there needs to be flexibility. I think you have to look at the questions being asked of your solution. The main question I was faced with was: is the work of art available? A sales person can't sell something that is already sold! So I tally the number of available copies within an Edition every time an event affects a copy within that Edition. Yes, this is not normalised.

Another example is Sales with the variable factors - Sales Tax, Margin Tax (dependent on the variable profit) and the Currency rate. So storing values becomes imperative.

There is an analogy - images can be stored within a database; just because you can doesn't mean that you should. So I would suggest I work on the principle of normalisation, but stray from the straight and narrow if an issue arises.

Simon
 

PNGBill

Win10 Office Pro 2016
Local time
Today, 23:00
Joined
Jul 15, 2008
Messages
2,271
Another example is Sales with the variable factors - Sales Tax, Margin Tax (dependent on the variable profit) and the Currency rate. So storing values becomes imperative.
Simon
Saving The QtySold, SalePrice, CurrencyRate and TaxRate isn't saving calculated results.
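That distinction can be made concrete: the quantity, price and rates are facts recorded at the time of sale (they cannot be re-derived later), while the extended total is derived from them. A hypothetical Python sketch, with invented names:

```python
from dataclasses import dataclass

@dataclass
class SaleLine:
    # Facts at the time of sale: these are data, not calculations.
    # The tax rate and currency rate of that day cannot be re-derived later,
    # so storing them is not "saving a calculated value".
    qty_sold: int
    sale_price: float
    currency_rate: float
    tax_rate: float

    @property
    def total(self) -> float:
        # Derived value: recomputable from the stored facts at any time.
        return self.qty_sold * self.sale_price * self.currency_rate * (1 + self.tax_rate)

line = SaleLine(qty_sold=3, sale_price=10.0, currency_rate=0.75, tax_rate=0.10)
print(round(line.total, 2))  # 24.75
```

Whether `total` should also be persisted is exactly the debate in this thread; storing the four input facts is uncontroversial either way.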
 

PNGBill

There is an analogy - images can be stored within a database; just because you can doesn't mean that you should.
Simon
I guess MS developers may be like other marketers: they add bling to sell their product even if it may not be practical for any sizable application.
A family database would not have an issue with saved images and speed, but a hardware store saving images of 40,000 products :(
 

Simon_MT

I have about 17,000 images of works of art.

Simon
 
