Hi ACM,
Normalization is about accurate representation of dependencies (functional and join dependencies) and solving certain problems that result from having undesirable "non-key" dependencies in tables. In a nutshell, normalization is a formal set of principles for analysing and eliminating undesirable dependencies from database designs.
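To make the idea of a dependency a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the helper `fd_holds`, the sample rows, and the `StadiumCity` column are my own assumptions for illustration only. Bear in mind that a sample of data can only refute a functional dependency, never prove one; dependencies are statements about the rules the data must obey, not about whatever rows happen to be present.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        determinant = tuple(row[col] for col in lhs)
        dependent = tuple(row[col] for col in rhs)
        # setdefault stores the first dependent value seen for this
        # determinant and returns the stored value on later rows.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical match table whose key is MatchId.
matches = [
    {"MatchId": 1, "StadiumName": "Riverside", "StadiumCity": "Leeds"},
    {"MatchId": 2, "StadiumName": "Riverside", "StadiumCity": "Leeds"},
    {"MatchId": 3, "StadiumName": "Parkside", "StadiumCity": "Leeds"},
]

# Key dependency: MatchId determines every other attribute.
key_fd = fd_holds(matches, ["MatchId"], ["StadiumName", "StadiumCity"])
# "Non-key" dependency: StadiumName determines StadiumCity.
name_fd = fd_holds(matches, ["StadiumName"], ["StadiumCity"])
# Not a dependency: one city is associated with two stadiums.
city_fd = fd_holds(matches, ["StadiumCity"], ["StadiumName"])
```

If StadiumName -> StadiumCity really is a rule of the business, it is the kind of "non-key" dependency that normalization would have you analyse, because StadiumName is not a key of this table.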
When you talk about eliminating "repeating" data it's possible you may be conflating at least three very different issues, which is something that GanzPopp hasn't exactly addressed for you, so I'll try to do so.
Firstly there is the concept of repeating groups. Eliminating repeating groups was one of the motivations of 1st Normal Form (1NF) as originally described by E. F. Codd (inventor of the relational model). You can safely ignore the problem of repeating groups as it was something very specific to the legacy CODASYL type of database and is irrelevant in most modern DBMSs. A repeating group is where a table contains a variable-length array of values, which may or may not repeat. Most modern DBMSs (those based on the SQL model anyway) don't have any such feature - groups of values are never "repeated" in a table in the sense that 1NF is concerned with. In a SQL DBMS, columns can usually contain only single values. The concept of repeating groups is slightly muddied, however, because Codd later modified his definition of 1NF to say that tables should not contain nested relations (a relation being a real or virtual "table", i.e. the data structure that the relational model is based upon). This revised version of 1NF is controversial and not everyone agrees on whether it is always necessary or desirable to avoid nested relations. Fortunately the dilemma of nested relations can usually be discounted. Most DBMS vendors don't even support nested relations even though they are part of standard SQL, and most database developers will avoid using nested tables even when they are available in their software.
The second type of "repeating" data you may have in mind is the idea of multiple columns in a table serving the same or similar purpose. Such designs are often regarded as an anti-pattern and are sometimes quite wrongly labelled as a "repeating group" even though they are no such thing. A classic example of this kind of anti-pattern is where you have a collection of columns enumerated for the same kind of data item, e.g. a contacts table with columns for multiple email addresses called EmailAddress1, EmailAddress2, EmailAddress3, etc. This is usually a bad idea on the principle of DRY (Don't Repeat Yourself). If you store the same kind of attribute multiple times then any logic that depends on that piece of data may have to be duplicated or made more complex than it needs to be. Although such designs are often undesirable, formally speaking this type of repetition only violates Normal Form if nulls are allowed in any of the columns where no value is specified. Tables without nulls are a requirement of all of the "classic" normal forms (1st-6th NF, EKNF, BCNF, PJNF), although the "no nulls" requirement is frequently forgotten or ignored by database developers who use nulls.
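As a rough illustration of the DRY point, here is a small sketch using Python's built-in sqlite3 module. The table names ContactWide and ContactEmail and the sample addresses are hypothetical, chosen only for this example. It compares looking up a contact by email address in the enumerated-column design with the same lookup in a design that keeps one address per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Enumerated-column design (hypothetical table name).
    CREATE TABLE ContactWide (
        ContactId     INTEGER PRIMARY KEY,
        EmailAddress1 TEXT,
        EmailAddress2 TEXT,
        EmailAddress3 TEXT
    );
    INSERT INTO ContactWide VALUES (1, 'a@example.com', 'b@example.com', NULL);

    -- One address per row, no nulls (hypothetical table name).
    CREATE TABLE ContactEmail (
        ContactId    INTEGER NOT NULL,
        EmailAddress TEXT NOT NULL,
        PRIMARY KEY (ContactId, EmailAddress)
    );
    INSERT INTO ContactEmail VALUES (1, 'a@example.com'), (1, 'b@example.com');
""")

target = "b@example.com"

# The same predicate has to be repeated once per enumerated column.
wide = conn.execute(
    "SELECT ContactId FROM ContactWide "
    "WHERE EmailAddress1 = ? OR EmailAddress2 = ? OR EmailAddress3 = ?",
    (target, target, target),
).fetchall()

# A single predicate is enough, however many addresses a contact has.
narrow = conn.execute(
    "SELECT ContactId FROM ContactEmail WHERE EmailAddress = ?",
    (target,),
).fetchall()
```

Both queries find the same contact, but the first one has to change every time another EmailAddressN column is added, and the wide table needs a null wherever a contact has fewer than three addresses.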
So far I've been describing 1st Normal Form (1NF) and possible violations of it. 1NF is a complicated and often controversial topic but there's a decent introductory write-up here by Anith Sen:
www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/
A third type of "repeating" data has been discussed already in this thread and here I do have to disagree with some of what GanzPopp said, or at least clarify the meaning of it a little. Let's take the example of StadiumName. The stadium that a match is played at is certainly an attribute of a match and surely does belong in any match table. Of course if there is more than one match at any stadium then the values of the stadium attribute will be repeated on multiple rows. Note very carefully however that normalization (the principles of analysing dependencies and applying normal form) is entirely independent of and indifferent to the type of data used to identify a stadium. Normalization says nothing about what type of attribute you should use to identify a stadium and certainly does not require you to think differently about the stadium attribute just because its values repeat on multiple rows or because it may be a text value and not a number. Substituting a numeric StadiumId in place of StadiumName (i.e. replacing a string with a number) therefore has nothing to do with normalization. In any case, such a substitution obviously does not eliminate "repeated" data from the match table: StadiumId would be repeated just as often as the stadium name. This is an extremely important point if you want to understand what normalization is (after all, this is the database theory forum!). In fact normalization never involves inventing new attributes or substituting one attribute for another. Normalization certainly does not require you to identify or eliminate data simply on the basis that it repeats on multiple rows of a table - something which would anyway be impossible and/or totally unnecessary and undesirable.
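The point about StadiumId repeating just as often as StadiumName is easy to check for yourself. The following is only a sketch with made-up data: the stadium names, the surrogate values 10 and 20, and the variable names are all assumptions of mine.

```python
from collections import Counter

# Hypothetical match rows that identify the stadium by name.
matches_by_name = [
    {"MatchId": 1, "StadiumName": "Riverside"},
    {"MatchId": 2, "StadiumName": "Riverside"},
    {"MatchId": 3, "StadiumName": "Parkside"},
    {"MatchId": 4, "StadiumName": "Riverside"},
]

# Hypothetical surrogate identifiers substituted for the names.
stadium_ids = {"Riverside": 10, "Parkside": 20}
matches_by_id = [
    {"MatchId": m["MatchId"], "StadiumId": stadium_ids[m["StadiumName"]]}
    for m in matches_by_name
]

# Count how many rows each stadium value appears on in each design.
name_counts = Counter(m["StadiumName"] for m in matches_by_name)
id_counts = Counter(m["StadiumId"] for m in matches_by_id)

# The identifier repeats on exactly as many rows as the name did.
same_repetition = sorted(name_counts.values()) == sorted(id_counts.values())
```

The substitution changes which value is repeated, not how often it is repeated, and it leaves the dependencies among the attributes of the match table exactly as they were.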
The question you really need to consider in this case is what is the most prudent, concise and accurate way to identify a stadium. There are three useful criteria commonly applied to choosing identifiers - identifiers which typically become keys in the database. Those criteria are: Familiarity, Simplicity, Stability. A stadium name is probably familiar to the users of the match database but it isn't necessarily very simple or stable - there could be multiple spellings or formattings of a name and the name might well be subject to unpredictable changes, e.g. when the team's sponsor changes.
While there are plenty of reasons why you may or may not want to have a stadium name in the match table, you should also appreciate that those reasons have nothing to do with normalization per se and really have very little to do with eliminating repeating data.
My advice to you is that you put aside the misleading notion of "repeating" data - it is not really a helpful or important concept in database design. Study a good book on database design to understand some more of the things you should be thinking about. One very good, concise but insightful book that I often recommend is Fabian Pascal's "Practical Issues in Database Management". For a more in-depth treatment of how to design a database: "Information Modeling and Relational Databases" by Terry Halpin. Whatever you do, don't rely on what you find on Wikipedia or many other online sources (including this post of mine!). Study some reliable and well recommended sources of information or take a reputable course. Avoid books or courses based around specific software so that you can master general principles first before you get into the nuances of software products.
Hope this helps.