Non-uniqueness problem - Please comment on my ENERGY database design … (1 Viewer)

Alexsasha

New member
Local time
Today, 08:43
Joined
Nov 14, 2006
Messages
3
Dear All,

I am developing a world energy database. All elements in the structure seem logical but one. I had to introduce two artificial columns to distinguish between sources, from which data come (I try to keep all the data in the raw format). This is how my table looks like (total of 8 columns, * stands for key column):

[Year]* – [Country]* – [Product, eg oil]* – [Value (eg how much of oil)] – [Units (eg barrels)] – [Source (data can come from different sources)] – [Source number]*- [Yes/no (put yes to use data from a particular source)].

Take for example US oil consumption. If the US government reports that in 2003 consumption of oil in the US was 20033 thousands of barrels per day. In my table I make a record:

[2003] – [US] – [oil consumption] – [20033] – [000 barrels per day] – [US government] – [(for example) 1] – [Yes]

I introduced “source number” column to be able to distinguish between the same data points that come from different sources, e.g. US oil consumption could be reported by the US government and by the International Energy Agency differently. The “Yes/no” column is there to be able to easily recover only those data which I trust to be correct, so I put yes for oil consumption reported by the US government and no for IEA number.
The intuition tells me that these two columns could be a source of potential problems. Would be great to know if someone else had to overcome non-uniqueness problem in their database design and what are the best practices.

Many thanks
Alex
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 08:43
Joined
Sep 12, 2006
Messages
15,672
you don't really need the source number as such, as you already have the source itself. i suppose in strict terms there are a few normalization steps - egi - your table could be considered not properly normalised as you now have multiple values in your table for 2003 us oil. Perhaps you should ensure that only one of these can be allowed to take the yes value as correct.

to avoid the duplications within this table, you would need a separate table for all the readings for the base years stats, but you still end up with a similar problem in selecting the one you wish to use at any point.

you might also decide to store the information sources in a separate lookup table (similar to the way you were doing with your source number, but where eg US Govt, is always sourrce 1, whichever commodity.
 

Alexsasha

New member
Local time
Today, 08:43
Joined
Nov 14, 2006
Messages
3
Thanks for your reply,
I thought to use number in addition to the source column to avoid spelling mistakes in the key field.
I agree that it makes sense to have a unique number for every source (thanks for this!), however it is probably better to restrict it to a particular country, thus not overloading
the database with numerous sources in all countries.
The “Yes” would stand only for those records where the source is unique or in case there are more than one source, only for a single the most trusted one. Thus I hope it prevents replications and if I want to know the total oil consumption in North America eg, I just say – show me the records for oil consumption with “yes” flag.
What do you think?
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 08:43
Joined
Sep 12, 2006
Messages
15,672
yes, thats how it would work - i f it makes sense to you to have more than 1 "trusted source" then you can do so - i thought you might want to restrict your dbs to only one - so that if you were trying to get, say a world oil proudction total. you would only get 1 figure from each area/country.

don't worry about overloading the database - thats what databases do, and access will deal with thousands of records in an instant - far quicker than sorting a spreadsheet for instance
 

dsigner

Registered User.
Local time
Today, 08:43
Joined
Jun 9, 2006
Messages
68
I would look at what sort of data might appear in the standard report. Official figures are often subject to "convenience" so a good report will give some sort of range to a data point. E.G. max min, average, preferred single answer, number of sources. This may be much heavier than you were thinking of but if it is what you wanted then the table should be structured to make it straightforward.

Are you going to concentrate on data by country? This probably easiest but then you will have some reports by region. I think that you need a country table and probably a region table as well. The fun starts when regions overlap so that a country is either missed by two adjacent regions or is in both.

I think it worth while to think about these issues before you complete your design. Hope this helps
 

Alexsasha

New member
Local time
Today, 08:43
Joined
Nov 14, 2006
Messages
3
Guys,
Many thanks for you replies,
I have my data by country and region as well (I should have asked you for a help before I started doing the database ). Country and region are in the separate table (hopefully there should be no overlapping between the regions I selected, I used standard geographical convention). I also have a table for units of measurements and conversion rates and now will have a table for source with auto number, I think, thanks to gemma-the-husky.
Good point - that data might come in completely non-standard format, because of that I wanted to be able to have more than one source and will think about the ranged data, cheers.
Thanks
 

Users who are viewing this thread

Top Bottom