Second Normal Form a Massive Table

alsheron

Registered User.
Local time
Today, 13:52
Joined
Apr 4, 2007
Messages
10
Hi, i'm new here and I know there's a lot going on but i was hoping someone could point me in the right direction or give me a few tips...

I've imported a very large (nearly 15000 records) set of data into an Access database as one big huge table storing various bits of information about coins. As far as I can tell, the data is in at least first normal form, in that no multiple values are stored in any columns (not yet sure of the technical term - field?)

Anyway, I've managed to use a make table query to separate out some of the content of the table into separate tables with relating entities.

One question is, and please bear with me - How do I add an autonumbered primary key to these newly separated out tables and then add these autonumbered items to the original tables as IDs so that I can set up relationships? Is there a "proper" way to go about describing this?

The following consists of ALL the fields in the original table and how i've split them up by table:

STOCK
Group
ItemNo
ListNo
TypeCode
ActualDate
SortDate
CostPrice
SalePrice
Estimate
StockLevel
DateEntered
DateSold
Archived
Ref1
Ref2
Picture
Description
Cats
DenomCode
TypeCode
PeriodCode
ArchiveWhenSold
NotForSale
KeepSold
StockItem
Highlight
OnLists
OnWeb
Proof

DENOMINATIONS
DenomName
DenomCode
DenomHeading
DenomValue

PERIODS
PeriodName
PeriodCode
PeriodHeading
PeriodStartYear
PeriodEndYear

VARIETIES
Variety

The Variety field in the Variety table is especially problematic. There were 1259 varieties stored in the original table which are now in a separate Varieties table which is great. But how do i now add an autoincrement number and call it, say, VarietyID, but then link each Variety back to the main list of coins but replace the full Variety description with the newly created VarietyID?

If any of that made sense, i'd REALLY appreciate some wisdom here.

Thanks
 
If you create a new column in the variety table, set it to autonumber and then set it to be the primary Key, Access will autopopulate the existing entries on that table with a number.

With regards to updating your primary table with the new references in variety, I'd suggest adding a newcolumn "VarietyID" or somesuch and running a query to populate with references, then deleting your existing "Variety" column once you are happy that you've got the data right.

To update your table a query along the lines of
UPDATE Table1
INNER JOIN Variety ON Table1.[variety] = variety.[description]
SET Table1.varietyID = variety.[varietyID];

Should do the trick, once you're happy that everything has worked correctly, you can then remove the old variety text from your main table.
 
Thanks tehNellie, that sounds ideal. I'm not exactly sure how to run that bit of code there, which i know i'll need to edit. I'll give it a try though.

What does the "variety." refer to? Should I replace "Table1" in the SQL code with "Stock" because thats the name of the primary table? Also, is "description" an SQL term or is it meant to reflect a field name?

Sorry....to clarify: I now have a "Varieties" table with all of the varieties now in it stored under a Field Name of "Variety". I have also added a VarietyID field as you suggested which has an autonumbered Primary key set. I also have a Stock table with fields as per my original post but now i've added in a Variety Field so that this Inner Join will work. The original table i made both the Stock table and the Varieties table is called "allcoins_all". Hopefully that makes more sense.

I'm getting there!

Thanks!
 
Last edited:
Sort of. The inner join basically says show me stuff from Variety where the description matches an entry on Table one.

In the first instance you can run a SELECT query to check your data if you like. Something along the lines of:
SELECT Table1.[Variety], Table1.[description], Table1.[varietyID], variety.[VarietyID], variety.[Variety]
FROM Table1 INNER JOIN variety ON Table1.[Variety] = variety.[Variety];

shows you:
Code:
Table1.Variety	description	Table1.VarietyID	Variety.VarietyID	Variety.Variety
red	          more stuff		                      1	                    red
red	          stuff		                              1	                    red
blue	          stuff		                              2	                    blue

Table 1.[varietyID] is empty at the moment because we haven't populated it yet, but we can see what the ids are that we are going to get because the existing variety data in Table 1 matches the description in the Variety table.
 
Last edited:
What does the "variety." refer to? I'll replace Table1 with "Stock" because thats the name of the primary table. Also, is "description" a special term or is it meant to reflect a value i have chosen.

Variety is the name of your variety table. To prevent confusion like this (I'm easily confused) I like to prefix my tablenames with "tbl" so Stock becomes tblStock and variety becomes tblVariety. I also try to remember to bracket column names within queries. The format TableName.ColumnName isn't always required, you can just use the column name if it's unique within the query, but again, it can really help prevent confusion

[Description] is just my generic name for the column in my imaginary table1. What you'll need to do is replace that [description] column with the name that you currently have in the Stock table that holds the current information that you've now moved into the Variety table.
 
I'm not exactly sure how to run that bit of code there, which i know i'll need to edit. I'll give it a try though.

If you change the table/column names to fit your existing data you can copy and paste that information into the query Tool in Access, selecting "new", design view, then selecting SQL view. If you then paste the query, update as required then switch back to design view, you'll get a graphical representation of the query, which make be a little easier to get a feel for what the query is doing.

As good as the graphic tool is for building queries in Access, I do recommend at least reading over the SQL it creates. AS good as the tool is it is quite limited in what it will let you build, and getting familiar with SQL will make you a much more effective database 'monkey' ;)
 
Thank you VERY much for all this help and your effort replying. I'm going to try this out now..... I'll let you know how it goes! :-)

Oh, i've updated my second post.... I just want to make sure you saw the following:

"I now have a "Varieties" table with all of the varieties now in it stored under a Field Name of "Variety". I have also added a VarietyID field as you suggested which has an autonumbered Primary key set. I also have a Stock table with fields as per my original post but now i've added in a Variety Field so that this Inner Join will work. The original table i made both the Stock table and the Varieties table is called "allcoins_all". Hopefully that makes more sense."
 
So that I've got this straight.

tblAllcoins_all is your original data?
tblStock is currently a copy of Allcoins_all that you are modifying to Normalise?
tblVariety contains an extract of all the distinct variety types with a new ID field?
(i've just added the "tbl" prefix for clarity)

The join needs to be made on the existing "Variety" text data on your Stock table.
 
Ok, sorry for being a pain in the ass. but i'm not quite there yet. I've tried modifying your SQL code but i think i'm simply not proficient enough (yet) so I was hoping you could look at the following and re-write it so it will work without modification (so that i can then understand how it works - hopefully)

The capitalisation (or not) of certain values in the SQL code you gave confused me so I have renamed the tables as you suggested so that i now have the following tables:


tblAllOriginalData (this table was then split into the following)


tblStock
tblVarieties

tblVarieties has two fields: VarietyID and Variety. The VarietyID now has an autonumber field and is autonumbered correctly.

I've made an empty VarietyID field in the tblStock table and assigned it a Number type. There is also a Variety field which contains all of the original data that was used to create the tblVarieties table.

Could you please edit your original SQL statement to take these changes into account. I had a bit of trouble trying to edit the original because i'd obviously not mapped the names correctly.

Many Thanks. :)
 
OK, in the first instance, we're going to run that original Select query to see what we're going to get. The capitalisation in the Query just shows the SQL Statements themselves.

SELECT tblStock.[Variety], tblStock.[varietyID], tblVariety.[VarietyID], tblVariety.[variety]
FROM tblSTOCK
INNER JOIN tblVariety ON tblStock.[Variety] = tblvariety.[Variety];

So what you should now see are 4 columns,
1) Your original variety text from tblStock
2) a Blank Column of VarietyID from tblStock (Because we haven't populated it yet)
3) The ID from tblVariety, this is what will go into VarietyID in tblStock when we run the second query.
4) the Description from tblVariety.

Now if everything worked well, the two Variety columns will show the same information and you'll have the same number of rows in your query results as there are records in TblStock.
If there aren't then you are missing some variety types or maybe you have some typos or records that don't have a variety type set.
 
Now, if we assume that everything looks ok in the first query and you're happy that you aren't missing a bunch of records then you can run the query to populate your new tblStock.[varietyID] column.

UPDATE tblStock
INNER JOIN tblVariety ON tblStock.[variety] = tblVariety.[variety]
SET tblStock.varietyID = tblVariety.[varietyID];

This should update the same number of rows as were displayed in your select query. If you are happy let the update run.

To check that everything worked well, you can simply run the Original Select query again, what you should now see is that tblStock.[varietyID] contains the same information as tlbVariety.[varietyID]

When you are happy that your information is correct you can delete the tblStock.[variety] column.

You should also perhaps consider establishing a formal relationship between tblVariety.[varietyID] and tblStock.[VarietyID] this will mean that you can ensure that only a valid ID can be entered in this column.
 
Last edited:
Thanks tehNellie! I'm going to run through that when i get home ..... It looks very clear and simple, and i finally feel like i'm beginning to understand how this is working and tying together You've been a fantastic help and i really can't thank you enough!....

Hope you're having a fantastic day..... you should be!

I'll let you know how i get on.....

Many Thanks,
alsheron
 
It's clear and simply until you get results that you weren't expecting. :D

When you start I'd suggest that you note down how many rows you have in tblStock.

When you run both the select and the update query you want to see the same number of records as you have rows in tblStock.

If you have more records that you expect then you need to check tblVariety as you probably have the same variety type entered more than once.

If you have less records than you expect then you need to check tblStock as some entries will either have no variety information added or typoes or something else that means they aren't matching the descriptions in tblVariety, though as you've taken this information from tblStock to begin with, this shouldn't happen.

Good luck with it, let me know how it goes.
 
Good news..... :)

I've managed to use your instructions to separate out the Varieties successfully. I've also applied the same method to other separated out content such as Denominations and Types which i've called tblDenominations and tblTypes respectively. I've created a relationship between tblStock and these new tables using their respective "ID" fields that were created.

tblStock.TypeID is now related to tblTypes.TypeID ...... Which as far as i've been able to work out so far is the correct way to organise this data. Of course the TypeID values are linked correctly for each coin so that their type is stored as it was originally except instead of the text for the type it's now just the ID number that (hopefully) points to the new tblTypes table.



One table however(!), the tblPeriods (stores information about the period of the coin) is proving more difficult, again simply because of my inexperience.

tblPeriods has the following fields:

PeriodID (autonumber and primary key)
PeriodName
PeriodCode
PeriodHeading
PeriodStartYear
PeriodEndYear

I've attached a screenshot to show what kind of data is stored.

As you can see, there are no unique values to use for the inner join command (it seems) so i think i'd need to use two fields, and from looking at the table the two fields that could uniquely identify a PeriodID would be PeriodName and PeriodHeading. Is there a way to use both these fields to perform the same action we did with the other tables? In other words, i'm trying to put the correct PeriodID value in the tblStock table according to the PeriodID values in the tblPeriods table but while identifying the correct PeriodID using both the PeriodName and PeriodHeading...... Aaaaargh!!! lol....

One other problem.... When i created the tblStock table from the original source table (which i still have) i forgot to copy accross the PeriodName and PeriodHeading fields which means they aren't currently in the tblStock tbl. I'm guessing i'll need to do another inner join and what I do have common to both tblStock and the original source table is a unique ItemNo field. I'm also guessing i'll need to do this step before the above step to assign a correct PeriodID to tblStock?

I really hope that was understandable because i know exactly what to do in my own mind, but trying to acheive it is something quite different..... :)

What's the best way of tackling this? Would it be useful if i'd attach something else? (the database itself is too large for this attachment system)

Also, i'd love some advice on recommended sources for learning this. I've got a couple of books and i've read web tutorials and the like, and i'm learning a LOT from this so far, but what would a good place to start be considering i'm not a complete n00b, but only just? ;)

I'm going to experiment a bit now..... backups will be made!
 

Attachments

  • periods.jpg
    periods.jpg
    76.9 KB · Views: 181
Last edited:
As you can see, there are no unique values to use for the inner join command (it seems) so i think i'd need to use two fields, and from looking at the table the two fields that could uniquely identify a PeriodID would be PeriodName and PeriodHeading. Is there a way to use both these fields to perform the same action we did with the other tables? In other words, i'm trying to put the correct PeriodID value in the tblStock table according to the PeriodID values in the tblPeriods table but while identifying the correct PeriodID using both the PeriodName and PeriodHeading...... Aaaaargh!!! lol....

There's nothing to stop you performing the join on more than one Column so the select Statement now looks like

SELECT tblStock.[PeriodName], tblStock.[PeriodHeading], tblPeriod.[PeriodID], tblPeriod.[PeriodName], tblPeriod.[PeriodHeading]
FROM tblSTOCK
INNER JOIN tblPeriod
ON tblStock.[PeriodName] = tblPeriod.[PeriodName]
AND
tblStock.[periodHeading] = tblPeriod.
;

Your UPDATE statement would follow exactly the same principle as the SELECT statement. (i'm not writing it all out for you, it's for your own good, honest ;))

One other problem.... When i created the tblStock table from the original source table (which i still have) i forgot to copy accross the PeriodName and PeriodHeading fields which means they aren't currently in the tblStock tbl. I'm guessing i'll need to do another inner join and what I do have common to both tblStock and the original source table is a unique ItemNo field. I'm also guessing i'll need to do this step before the above step to assign a correct PeriodID to tblStock?
OK, so you have the item unique number in both tables. What we can do here is to create the Select query between your original table and the Period table, then use that Select query to update your tblStocktable.

1) Create your select query between tblOriginal and include the unique number and the period ID as these are the only values that you need to update tblStock (just amend the Select Query Above)

2) Save that query, for now I'll call it Query1

3) Create your update query performing the Join on the Query1:

UPDATE tblStock
INNER JOIN Query1 ON tblStock.[uniquenumber] = Query1.[uniquenumber]
SET tblStock.[periodID] = Query1.[periodID]

And we've removed the need to ever create those two columns in tblStock.​
 
Also having looked at your periods table screenshot, you do seem to have a lot of what appears to me to be redundant data. I'm not an expert on coins at all so that table make actually make perfect sense, but having the same code for two distinct entries has necessitated you having to add the ID column. ie GH2 can refer to two possible period entries which means that a potential "Meaningful Primary Key" has had to be replaced with a Meaningless Primary Key (the autonumber field).

Records 28 and 29, for example, to the untrained eye look functionally identical, do you really need two distinct entries for it? If they are actually distinct Period entries, might you not be better off giving one of them a new code and look to replace your ID column in the future?

You can argue the rights and wrongs of whether to use existing data as a primary key, I'm not normally a big fan of it, but in that case it does appear that it might be a good candidate if the reworking of the codes/de-duplication of data is practical.

Also, i'd love some advice on recommended sources for learning this. I've got a couple of books and i've read web tutorials and the like, and i'm learning a LOT from this so far, but what would a good place to start be considering i'm not a complete n00b, but only just?

If it's any consolation I'm self taught as well. I came from a VBA background into Access but I did find "access Database Design and programming" from O'reilly to be a very handy book. It's not great if you are looking for indepth help on creating forms, but for general database conventions, basic SQL tutorials and the like I found it very handy. There is bucket loads out on the web that is far cheaper than buying a book and I've learnt far more from sites like this while trying to get my databases working that from any books. Personally I think you're trying the right approach at the moment.

What I would suggest from personal experience is get to grips with the Access Query Builder which will help you visualise what your queries are doing and at the same time at least have a look at the SQL view and try to understand how the SQL is working.
 
Last edited:
I know exactly what you mean and i actually agree with you on the redundancy of some of the data. It might make sense to remove some of the redundancy because having a meaningful primary key would be very useful. In practice however, the PeriodHeading is useful on its own merits and also the PeriodStartYear and PeriodEndYear arent always the same (see screenshot PeriodID items 22 and 23..... As far as i can tell there's no way to remove the redundancy easily if at all unless of course you have an idea? ;)

Thanks again by the way for your most recent reply. I'm actually off home again at this point and will be having a nice weekend off work until tuesday, but i'm loving this stuff - it's fascinating - and so much so i may try out some more when i'm at home..... In any case, your efforts are much appreciated and i hope you have a nice weekend too.....:)
 
I'm off to Exmoor to teach the Mrs how to Map Read and blow the cobwebs out of the tent for the weekend :)

I did notice that in some cases the dates were different, the obvious suggestion is that you assign them a new code to differentiate where needed and otherwise remove the duplication. Again, update the ID information to your Stock table before you delete data.
 
More Progress

Great Success!

After a good long weekend (I ended up in the Peak District - which was nice), I came back to work on Tuesday and i sure felt the pain. I simply couldn't concentrate on anything and try as i might, i couldn't get my head around all that was explained, the SQL statements and the instructions. I even think the database i was working on wasn't saved and so i had an old version. Yesterday I wasn't very happy.

But today, great success! I actually ended up re-building the database using your instructions tehNellie and it all went very well - which leads me to believe i'm actually starting to understand (and enjoy) this. I think i've cracked it and put the database in order with little or no redundant data and relationships defined between tables. I'm not 100% clear on why or how the relationships are actually used later, but its definitely logical and i'm sure it will make sense later on down the line.

I couldn't have done it without your help tehNellie and for that I say again: Thanks! Youve done a great job as a teacher because you deliberately didn't give me ALL the answers but just enough so that I could work it out for myself. In the end I managed to create the Join statement i needed to update the tblStock with the PeriodID from the tblPeriods.

Next step for me now is to try and figure out how to properly use Forms to enter and modify the data. When i first started out with this little Access project i went straight to the forms but soon realised you need the data set up in the background first.... and so now i have.

Peace out and hope you had a great weekend in Exmoor. :)
 
I'm not 100% clear on why or how the relationships are actually used later, but its definitely logical and i'm sure it will make sense later on down the line.

At a basic level, relationships help protect your data. They formalise within the database the [type of] relationship between, for example, your TblStock.periodID column and the tblPeriod table. They can be set to make sure that you cannot add a record to tblStock that contains an invalid reference to tblPeriod. You might want a situation where if you delete an entry in tblPeriod that all corresponding entries in TblStock are also removed, a Relationship can handle this for you.

In principle they are relatively simple to work out, I'd recommend trying out your google-fu and reading a few articles, there are stacks out there and I found this one to be pretty clear about the concepts behind them: Linky


Next step for me now is to try and figure out how to properly use Forms to enter and modify the data.

When I was starting out using VBA with Access I found this Site very helpful for getting my head around forms.

You can use Macros to drive your forms and data entry, but I personally, having come from a scripting background anyway, find them very clumsy and not very clear to use and I really like the Power and control that spending a little time getting to know VBA gives you. Plus once you know some VBA for Access, it will translate easily over to Excel especially, but also Word.
 

Users who are viewing this thread

Back
Top Bottom