Design dilemma - normalization?

Les Isaacs

Hi All

I have a table [tblStaff] with a numerical key 'staffID' and a text field 'staffName', and a table [tblSickness] that joins to [tblStaff] on 'staffID'. A lot of reports and functions use the data in [tblStaff] and the related data in [tblSickness]. Would it give a performance improvement if I either:

1. created a field 'staffName' in [tblSickness], so the reports etc. would then only need to be based on the single table [tblSickness]; or

2. created a function to return the 'staffName' for a given 'staffID', and then used this function in the relevant reports (so again they would only need to be based on the single table [tblSickness])?

I know I could 'try it and see', but would be interested in the experience of others who have experimented along these lines.

Thanks for any help.
Les
 
1 - may be faster - but if your tables are properly indexed the difference will be negligible. It also creates a maintenance headache for keeping the field in sync, and contravenes normalisation rules.

2 - definitely slower
 
If you want to do this "right", the best way (in my not-often-humble viewpoint) is to establish a relationship between the staff and sickness tables, then build a SELECT query between the two with your wizards enabled. That will cause the wizard to build the correct join for the query. To make this work, the Staff table has to have StaffID as its primary key, because the one side of a relationship needs a unique index - and the primary key provides one.

Now build the reports off the query because reports and forms want a RECORDSET (not a table). To clarify that last statement, both tables and SELECT queries are equally capable of giving you a good recordset on which to base a report.

This is a side-effect of properly normalized tables. If you build relationships, the queries will come. (Hmmm... sounds like Field of Dreams....). But seriously, normalization lets you take advantage of the wizards. Normally, these wizards are collectively dumber than a box of rocks, but if you give them good table structures and relationships, they actually work pretty well.
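The join the wizard builds boils down to a plain SELECT between the two tables. As a rough sketch (using SQLite here rather than Access/Jet, with made-up column values), the query a report would sit on looks something like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Minimal versions of the two tables: staffID is the PK on the one side.
cur.execute("CREATE TABLE tblStaff (staffID INTEGER PRIMARY KEY, staffName TEXT)")
cur.execute("""CREATE TABLE tblSickness (
    sicknessID INTEGER PRIMARY KEY,
    staffID INTEGER REFERENCES tblStaff(staffID),
    sickDate TEXT)""")

cur.executemany("INSERT INTO tblStaff VALUES (?, ?)",
                [(1, "Alice Smith"), (2, "Bob Jones")])
cur.executemany("INSERT INTO tblSickness VALUES (?, ?, ?)",
                [(1, 1, "2024-01-10"), (2, 1, "2024-02-03"), (3, 2, "2024-01-15")])

# The SELECT query the report is based on: the name lives only in tblStaff,
# and the join pulls it in alongside each sickness record.
rows = cur.execute("""
    SELECT s.staffName, k.sickDate
    FROM tblSickness AS k
    INNER JOIN tblStaff AS s ON s.staffID = k.staffID
    ORDER BY s.staffName, k.sickDate
""").fetchall()
print(rows)
```

The report never needs a copied staffName field - the query's recordset already carries the name with every sickness row.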
 
Another alternative is to set up the Lookup fields on the tblSickness table for the Staff field so it displays the staffName instead of the ID.

When the form and report wizards run they will automatically use a combo to display the Staff. The staff field in the recordset will still be the ID but the combo will display the staffName.

Best remove the lookup from the table when you have made the forms and reports. If you need to add another form later just copy and paste the combo from an existing form.
 
Thanks for your replies.
I understand the principles of normalisation, but my query was really about whether it would be faster to base a report on a single table, rather than on a dataset that uses two tables. I suspect that in this case any speed difference will be negligible (as CJ London says), but that in other cases there could be performance benefits from not being normalised: I seem to recall reading this somewhere?!
 
potentially small performance gains for many maintenance losses
 
The real problem with adding the staff name as well as the staff ID to the absences table is what happens if the staff name changes - say a woman marries.

Now you have to change the name in the staff table, but ALSO remember to change it everywhere else you have stored the full name.
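That update anomaly is easy to demonstrate. A sketch in SQLite (table and data invented for illustration): with the name stored only in the staff table, a name change is a one-row UPDATE and every join picks it up automatically - there is nothing else to remember to change.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tblStaff (staffID INTEGER PRIMARY KEY, staffName TEXT)")
cur.execute("CREATE TABLE tblSickness (sicknessID INTEGER PRIMARY KEY, staffID INTEGER, sickDate TEXT)")
cur.execute("INSERT INTO tblStaff VALUES (1, 'Jane Doe')")
cur.executemany("INSERT INTO tblSickness VALUES (?, 1, ?)",
                [(1, "2024-03-01"), (2, "2024-04-12")])

# Normalised: the marriage is a single UPDATE on the staff table...
cur.execute("UPDATE tblStaff SET staffName = 'Jane Brown' WHERE staffID = 1")

# ...and every joined sickness row immediately shows the new name.
names = cur.execute("""
    SELECT DISTINCT s.staffName
    FROM tblSickness k JOIN tblStaff s ON s.staffID = k.staffID
""").fetchall()
print(names)  # [('Jane Brown',)]
```

Had staffName been copied into tblSickness, that single UPDATE would have left every sickness row showing the old name.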


The potential gain really comes in heavily computational processes. Say you want a report that shows monthly sales profit totals, and the profits are the result of complex queries that evaluate several different cost elements.

In this case it might well be worth storing the collapsed sales and profits in an additional table, especially as the old sales and profit values won't (or at least shouldn't) change.
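A sketch of that idea (SQLite again, with invented sales figures): the detail rows are collapsed once into a monthly summary table, so the heavy aggregation runs when the table is built instead of every time a report opens.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tblSales (saleID INTEGER PRIMARY KEY, saleMonth TEXT, revenue REAL, cost REAL)")
cur.executemany("INSERT INTO tblSales VALUES (?, ?, ?, ?)", [
    (1, "2024-01", 100.0, 60.0),
    (2, "2024-01", 250.0, 120.0),
    (3, "2024-02", 300.0, 180.0),
])

# Collapse the detail rows into a pre-computed monthly summary table.
cur.execute("""
    CREATE TABLE tblMonthlyProfit AS
    SELECT saleMonth,
           SUM(revenue)        AS totalRevenue,
           SUM(revenue - cost) AS totalProfit
    FROM tblSales
    GROUP BY saleMonth
""")

# Reports now read the small summary table instead of re-aggregating.
summary = cur.execute("SELECT * FROM tblMonthlyProfit ORDER BY saleMonth").fetchall()
print(summary)  # [('2024-01', 350.0, 170.0), ('2024-02', 300.0, 120.0)]
```

This is safe precisely because historical sales shouldn't change; if they could, the summary table would need rebuilding or updating alongside them.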
 
The case you describe (staff/sickness) really does call for being normalised. Normalisation suits transactional activity, i.e. frequent updating of records, for the reasons already raised.

The structure you describe is two tables with a simple join. Furthermore, what struck me about the case in question is that you are unlikely to have a large number of records: you'd have to have a lot of staff and sickness to reach even a modest record count. If you are experiencing a performance hit, there may be other reasons, e.g. network, lookups, poor indexing.

Nevertheless, there is often a case for creating a "datamart" away from your transactional database. Sales data is a great example: you employ normalisation on the transactional side to ensure the integrity of the data being entered, but separately you extract the data to a datamart (aka data warehouse) where you run intensive reports/summaries/analysis etc. In a datamart the data and table structure are typically not normalised but instead optimised for such analysis. The way you describe would be one such approach. The subject matter you mention here doesn't appear to warrant such an approach, but maybe you know different.
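One simple way to build such an extract (a sketch, assuming SQLite and invented names): flatten the normalised join into a single wide reporting table in a separate database, and point the analysis at that instead of the live tables.

```python
import sqlite3

# Transactional side: normalised tables, optimised for data entry.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE tblStaff (staffID INTEGER PRIMARY KEY, staffName TEXT, dept TEXT)")
live.execute("CREATE TABLE tblSickness (sicknessID INTEGER PRIMARY KEY, staffID INTEGER, sickDate TEXT)")
live.execute("INSERT INTO tblStaff VALUES (1, 'Alice Smith', 'Sales')")
live.executemany("INSERT INTO tblSickness VALUES (?, 1, ?)",
                 [(1, "2024-01-10"), (2, "2024-02-03")])

# Reporting side: a separate database holding one denormalised extract.
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE rptSickness (staffName TEXT, dept TEXT, sickDate TEXT)")

# The extract step: run the join once and copy the flattened rows across.
rows = live.execute("""
    SELECT s.staffName, s.dept, k.sickDate
    FROM tblSickness k JOIN tblStaff s ON s.staffID = k.staffID
""").fetchall()
mart.executemany("INSERT INTO rptSickness VALUES (?, ?, ?)", rows)

extract = mart.execute("SELECT * FROM rptSickness ORDER BY sickDate").fetchall()
print(extract)
```

The extract is read-only and rebuilt on a schedule, so the duplication that would be a maintenance problem in the live database is harmless here.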

hth
 
OK, got it! I like the datamart idea - but as you say (stopher) probably not for this case.
Thanks folks!!
 
