Fornatian
Dim Person
- Local time
- Today, 04:05
- Joined
- Sep 1, 2000
- Messages
- 1,396
Right, I've just been to teach some Access to some eminent professors at a London Hospital who are assisting in the Genome project (whoa,whoa,whoa!!!!). Down to the nitty gritty anyway.
Text files are generated by an untappable outside source, creating results of experiments with three or four fields duplicated at each output. These files are big when imported and attach to another big file imported text file to make sense of the codings field.
At present these are imported as separate tables.
However, my normalisation knowledge suggests that they should be a imported into one table with an extra column denoting the experiment description code. Good so far?
Bar the size of the data, all is well and good.
The mental problem I am having is that the researchers want to compare the results sets from different experiments. OK, that seems easy enough I'll create a query using a self join and an expression under my new experiment field for each table to extract the desired result set for each experiment.
However, the text files do not always have matching field values because one experiment might observe one DNA tag whilst the others don't. This require a Left, Right then Union query to emulate a Full Outer join because the non-matches are as important as the matches if not more so.
Given the complexity of their analysis requirements I have reservations about my normalisation policy because further experiments will be added thus requiring multiple lefts,rights and outer joins to support their analysis. I will not always be on hand to support their SQL writing to ensure their record integrity is maintained.
In addition if I go for my proposal of integrating the data into one table what are others experiences of criteria led self-joining queries with 2,3,4 etc copies of the same table?
Seems very complicated but hey, so is our DNA though.
Any opinions offered?
Text files are generated by an untappable outside source, creating results of experiments with three or four fields duplicated at each output. These files are big when imported and attach to another big file imported text file to make sense of the codings field.
At present these are imported as separate tables.
However, my normalisation knowledge suggests that they should be a imported into one table with an extra column denoting the experiment description code. Good so far?
Bar the size of the data, all is well and good.
The mental problem I am having is that the researchers want to compare the results sets from different experiments. OK, that seems easy enough I'll create a query using a self join and an expression under my new experiment field for each table to extract the desired result set for each experiment.
However, the text files do not always have matching field values because one experiment might observe one DNA tag whilst the others don't. This require a Left, Right then Union query to emulate a Full Outer join because the non-matches are as important as the matches if not more so.
Given the complexity of their analysis requirements I have reservations about my normalisation policy because further experiments will be added thus requiring multiple lefts,rights and outer joins to support their analysis. I will not always be on hand to support their SQL writing to ensure their record integrity is maintained.
In addition if I go for my proposal of integrating the data into one table what are others experiences of criteria led self-joining queries with 2,3,4 etc copies of the same table?
Seems very complicated but hey, so is our DNA though.
Any opinions offered?