View Full Version : Newbie ? on Database Design and Import


PeggyT
03-18-2008, 11:20 AM
Hello everyone!

I am developing a database that's industry-specific and would like some thought on how best to design the database to also make importing a spreadsheet easy to perform.

The below example is not my exact database, which I'm afraid has to remain confidential; however, the example below - a database on movie production studios - mirrors my situation faithfully.


Table Structure - Field Name Captions (notes or row data)
tblStudio - Studio name, mailing address, website, year founded for each studio
tblMovie - Movie title, year release, production studio (look-up tblStudio), average score (taken from RottenTomatoes.com), genre (look-up tblGenre), format (look-up tblFormat)
tblFormat - Format (Beta, VHS, DVD, Blu-Ray, iTunes, 3D, etc.)
tblGenre - Genre (Action, thriller, horror, etc.)

Each table's primary key is an autonumber.

Originally, I had all the formats and genre as a multivalue field for each. And then I realized I could not import the information from an excel spreadsheet.

Genre will always be a single value, so I removed the multivalue field and copied over the genre data manually as a look-up column. I can now import genres as long as the genres match.

My problem is with importing format because one movie may be released on multiple formats. I know adding fields such as Format1, Format2, etc. goes against the rules of normalization, but it would make import easy. I could then use an expression to output the formats as "one" value in a report.

I have a one-to-many relationship from tblStudio to tblMovie. I use one form to input the studio information and that studio's movie information (a subform). My import's process has been a two step process: importing the studio first and then movie titles.

I'm looking for a solution which follows the rules of normalization, one that allows me to use my main form for input without any complications for adding format to a specific movie title, and one where I can import the movie titles and formats easily. Before I commit the normalization sin of Format1, Format2, what do you guys suggest? The solution's not obvious to me and as a newbie, I'm limited in my abilities to use scripts. Any help is sincerely appreciated!

-PeggyT

gemma-the-husky
03-19-2008, 02:38 AM
please dont denormalize your data

load the spreadsheet, and then either

a) run a series of queries to extract the data from format1, format2 etc

or

b) process it as a recordset to achieve the same result


having said that, this will only be an issue if you want to search by available formats (eg find me all blurays) if its just a bit of info, you could just concatenate the formats into a single string


-----
couple of caveats

please not that importing a spreadsheet is rarely straightforward. Often a spreadsheet is not normalized, and the data needs to be processed into multiple tables - sometimes the data will be new, and other times it will already exist, so you need to handle both these cases - or it may be an error in the spreadsheet

ie Warner Bros, Warner Brothers, Warners are not all new studios

Thats why relational databases are better than spreadsheets for managing data

------
also, i note you havent got actors, directors in there - is this by design, or because you are giving us a cutdown version - actors will give you the same problem as formats ie multiple columns. - and again, you also need to consider discrepancies in eg actor names

eg Robert De Niro, Robert Deniro, Rob Deniro, just De Niro etc

-----
finally with actors you may want to change Robert De Niro into De Niro, Robert for searching/sorting which wont be trivial, given some names have two given names, one surname, and others vice versa

georgedwilkinson
03-19-2008, 07:41 AM
The format does not belong in the table with the movie. You need a separate table with FK from movie/format that tracks potential instances of a movie.

tblMovieFormats: MovieID (FK), FormatID (FK), comments, ID, etc.

Also, the "book" way of naming tables is to pluralize the table name. When you are working at a conceptual/analysis level, you leave the entity (which is a potential table) names singular.

Also, I agree with Gemma that you need to work with the Excel data as a flat table and once it is in Access, write routines to normalize it into your tables. You do not need to remove your artificial keys.

HTH.

Thanks, George

PeggyT
03-19-2008, 07:46 PM
Gemma and George,

Thank you each for your help!

To Gemma:
Yes, based on my example, at some point my employer would like to be able to review data based on format as well as other criteria. For example, I would run reports on industry-wide trends: how many movies have been released on a particular format or are there differences in average critic ratings from one format to another (maybe critics don't like 3D!), how do the movie studios compare for number and quality of movies released, etc.

I have already worked on a process for importing the spreadsheet, and yes, I agree that it has to be broken up into separate tables as I'll explain below in my msg to George...

To George:
I actually do have my table names pluralized, I didn't realize the significance and therefore didn't type up my message that way. Yay, a coincidence that worked in my favor! There aren't many of those ;P

To make sure I understand you correctly and that I'm clear on the best process:

- In addition to tblStudios and tblGenres, I will have tblMovies, tblFormats, and then tblMovieFormats which will track movies and their respective formats with the MovieID (autonumber) and FormatID (autonumber) foreign keys.
- I am going to import one spreadsheet of studios into tblStudios
- I am going to import a second spreadsheet of movies into tblMovies that joins with tblStudios for the studio name and also pulls in genre from tblGenres
- Convert the formats in my spreadsheet to match their FormatID (no problem, there aren't that many)
- And lastly, use a query to join the FormatID and MovieID where the MovieName matches?

It's the last step in the process that I am stuck on! I am then left with a query that I can view in datasheet form, but it's not a table. So, if I delete the imported table that held the movie name and format, my query results are gone too, right?

And in keeping with your example, instead of simply having tblStudios which also has fields for address, year founded, website, for the particular studio, should I have broken these into two tables as in tblStudios and tblBasics (contains address, website, etc.)? Or is that not necessary since, unlike movies and formats, there's going to be one address, one year founded, etc. per studio?

Thank you again Gemma and George for your help!

Peggy

georgedwilkinson
03-20-2008, 08:22 AM
In addition to tblStudios and tblGenres, I will have tblMovies, tblFormats, and then tblMovieFormats which will track movies and their respective formats with the MovieID (autonumber) and FormatID (autonumber) foreign keys.


Not autonumber on the FKs, Long Integer (if you're using Access as your DB).



- I am going to import one spreadsheet of studios into tblStudios
- I am going to import a second spreadsheet of movies into tblMovies that joins with tblStudios for the studio name and also pulls in genre from tblGenres
- Convert the formats in my spreadsheet to match their FormatID (no problem, there aren't that many)


Not sure what you mean by this. In a pure world where this is a long-running (OLTP) real-time system, you should pull all the data out of your spreadsheets into the new tables you've developed and then delete the spreadsheet data. It really depends on your needs and you understand that better than I do.


It's the last step in the process that I am stuck on! I am then left with a query that I can view in datasheet form, but it's not a table. So, if I delete the imported table that held the movie name and format, my query results are gone too, right?


Yes. See my comment above. You want to extract the spreadsheet data into the tables you've designed to run this system.


And in keeping with your example, instead of simply having tblStudios which also has fields for address, year founded, website, for the particular studio, should I have broken these into two tables as in tblStudios and tblBasics (contains address, website, etc.)? Or is that not necessary since, unlike movies and formats, there's going to be one address, one year founded, etc. per studio?Peggy

Again, depends on the requirements of the system, but if this system will be running OLTP transactions and is intended to be around a long time, you should break things down to their atomic level. If it's studio information, that goes in a studio table, if it's address information, that goes in an address table. If it's people or party information, that goes into a people or party table. Then you link the tables together with foreign keys and, potentially, associative tables (for the many to many [aka m:m] relationships - like tblMovieFormats). Does that make sense?

PeggyT
03-27-2008, 06:04 PM
Hi Gemma and George,

I want to thank you for your help on the forum. I have been re-evaluating my database for the past few days as I reviewed your suggestions. I am happy to say I worked out a solution whereby I comply with the rules of normalization and did not need to make fields for Format1, Format2, etc, like I thought I would. I found out that my root problem was not understanding imports: I was trying to import a text field, like MovieName, when it was a look-up column to MovieID, an autonumbered primary key from another table. Learning to do a query and then an append query solved that issue and I am able to store a row of data for every MovieID and matching FormatID in a third table. Sorry if I am not being clear, but just know it was a long road to get here, and your posts help me solve my problem.

Peggy

georgedwilkinson
03-28-2008, 08:06 AM
It does take a long time to get this stuff right. I'm sure Gemma and I together have 40 years + experience doing this type of thing and sometimes it's hard to get everybody in the same paradigm so we understand each other.

Please feel free to ask any follow-up questions. I'm always happy to help.