Help me plot the Genome!

Fornatian · May 16, 2003

Right, I've just been teaching some Access to some eminent professors at a London hospital who are assisting in the Genome project (whoa, whoa, whoa!!!!). Down to the nitty-gritty anyway.

Text files are generated by an untappable outside source, creating results of experiments with three or four fields duplicated at each output. These files are big when imported, and they join to another big imported text file to make sense of the codings field.

At present these are imported as separate tables.

However, my normalisation knowledge suggests that they should be imported into one table with an extra column denoting the experiment description code. Good so far?
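To make the single-table idea concrete, here is a rough sketch in Python, using the built-in sqlite3 module as a stand-in for Access. The table name `results`, the columns `experiment_code`, `tag` and `value`, the tab-delimited layout and the helper `load_export` are all assumptions made up for illustration; the real exports will look different.

```python
import csv
import io
import sqlite3

# Hypothetical exports: tab-delimited text with a DNA tag and a measured value.
EXPORT_A = "tag\tvalue\nT001\t0.91\nT002\t0.15\n"
EXPORT_B = "tag\tvalue\nT002\t0.20\nT003\t0.77\n"


def load_export(conn, experiment_code, text):
    """Append one export's rows to the shared table, stamped with its experiment code."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = [(experiment_code, r["tag"], float(r["value"])) for r in reader]
    conn.executemany(
        "INSERT INTO results (experiment_code, tag, value) VALUES (?, ?, ?)", rows
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
# One table for every experiment; the extra column says which experiment a row came from.
conn.execute(
    "CREATE TABLE results ("
    " experiment_code TEXT NOT NULL,"
    " tag TEXT NOT NULL,"
    " value REAL,"
    " PRIMARY KEY (experiment_code, tag))"
)
loaded = load_export(conn, "EXP_A", EXPORT_A) + load_export(conn, "EXP_B", EXPORT_B)
counts = conn.execute(
    "SELECT experiment_code, COUNT(*) FROM results "
    "GROUP BY experiment_code ORDER BY experiment_code"
).fetchall()
print(loaded, counts)
```

The composite key assumes each tag appears at most once per experiment, which may not hold for the real data.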

Bar the size of the data, all is well and good.

The mental problem I am having is that the researchers want to compare the result sets from different experiments. OK, that seems easy enough: I'll create a query using a self join, with a criterion on my new experiment field for each copy of the table, to extract the desired result set for each experiment.

However, the text files do not always have matching field values because one experiment might observe one DNA tag whilst the others don't. This requires a Left join, a Right join and then a Union query to emulate a Full Outer join, because the non-matches are as important as the matches, if not more so.
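For two experiments, the emulated Full Outer join looks roughly like the sketch below, again with Python's sqlite3 standing in for Access and with made-up names (`results`, `experiment_code`, `tag`, `value`, `EXP_A`, `EXP_B`). Each experiment is pulled out with its own subquery, and the second branch is written as a Left join with the two sides swapped, which is equivalent to the Right join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (experiment_code TEXT, tag TEXT, value REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("EXP_A", "T001", 0.91),  # tag seen only in experiment A
        ("EXP_A", "T002", 0.15),  # tag seen in both experiments
        ("EXP_B", "T002", 0.20),
        ("EXP_B", "T003", 0.77),  # tag seen only in experiment B
    ],
)

# Branch 1 keeps every tag from A; branch 2 keeps every tag from B.
# UNION (not UNION ALL) drops the matched rows that both branches return.
FULL_OUTER = """
SELECT a.tag AS tag, a.value AS value_a, b.value AS value_b
FROM (SELECT tag, value FROM results WHERE experiment_code = 'EXP_A') AS a
LEFT JOIN (SELECT tag, value FROM results WHERE experiment_code = 'EXP_B') AS b
  ON a.tag = b.tag
UNION
SELECT b.tag, a.value, b.value
FROM (SELECT tag, value FROM results WHERE experiment_code = 'EXP_B') AS b
LEFT JOIN (SELECT tag, value FROM results WHERE experiment_code = 'EXP_A') AS a
  ON a.tag = b.tag
ORDER BY tag
"""

rows = conn.execute(FULL_OUTER).fetchall()
for row in rows:
    print(row)
```

The experiment filter sits inside each subquery rather than in a WHERE clause on the joined result, because a WHERE on the outer side of a Left join would throw the non-matches away again.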

Given the complexity of their analysis requirements, I have reservations about my normalisation policy, because further experiments will be added, thus requiring multiple lefts, rights and outer joins to support their analysis. I will not always be on hand to support their SQL writing to ensure their record integrity is maintained.

In addition, if I go for my proposal of integrating the data into one table, what are others' experiences of criteria-led self-joining queries with 2, 3, 4 etc. copies of the same table?
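For comparison, one way to sidestep the multiple copies altogether is to group by tag and pivot the experiments into columns. This is a different technique from the self join, sketched here with Python's sqlite3 as a stand-in; the closest Access feature is probably a crosstab query. The names `results`, `experiment_code`, `tag`, `value` and the helper `compare_experiments` are hypothetical, and the sketch assumes at most one row per tag per experiment.

```python
import sqlite3


def compare_experiments(conn, codes):
    """Return one row per tag with one value column per experiment code.

    A column is None where that experiment did not observe the tag, so
    non-matches are kept without any outer joins.
    """
    if not codes:
        raise ValueError("at least one experiment code is required")
    # One MAX(CASE ...) column per experiment; placeholders keep the codes out of the SQL text.
    columns = ", ".join(
        "MAX(CASE WHEN experiment_code = ? THEN value END)" for _ in codes
    )
    in_list = ", ".join("?" for _ in codes)
    sql = (
        f"SELECT tag, {columns} FROM results "
        f"WHERE experiment_code IN ({in_list}) GROUP BY tag ORDER BY tag"
    )
    # Parameters are bound in order: first the CASE columns, then the IN list.
    return conn.execute(sql, list(codes) + list(codes)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (experiment_code TEXT, tag TEXT, value REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("EXP_A", "T001", 0.91), ("EXP_A", "T002", 0.15),
        ("EXP_B", "T002", 0.20), ("EXP_B", "T003", 0.77),
        ("EXP_C", "T001", 0.40), ("EXP_C", "T003", 0.65),
    ],
)

table = compare_experiments(conn, ["EXP_A", "EXP_B", "EXP_C"])
for row in table:
    print(row)
```

Adding a fourth or fifth experiment only lengthens the list of codes; the shape of the query stays the same, which may matter when I am not on hand to write their SQL.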

Seems very complicated but hey, so is our DNA.

Any opinions offered?

pono1 · May 21, 2003

Ian,

All I can do, really, is move this one to the top of the list because I haven't got a real solution or good advice, just a general observation.

There are scary phrases in your post:

. eminent professors (no doubt each a lion with his own pride)

. text files are generated by an untappable outside source (deep, hidden secrets in those labs)

. three or four fields duplicated at each output

. I will not always be on hand to support their SQL writing to ensure their record integrity.

It sounds like you are left to make sense, through brute force, of a large collection of chaos... This is a rhetorical question (assuming all of those phrases above are air-tight): Is there no way you can head them off at the pass? That is, can you develop some sort of standard app for them to use up front to enter their data?

Last: Scary or not, it does sound like a fun project -- assuming it's billable.

Regards,
Tim

Fornatian · May 21, 2003

Cheers Tim,

Unfortunately none of the data is entered by the users; it is imported as text files from an internet resource, so a front end would not be sufficient. You are spot on with your comments re making sense of chaos. I'm not even sure they know what they want to do with it. However, because new data streams must be added, I think I am going to get them to do some work re defining the purpose of the database and its specific inputs and outputs before delving further.

Maybe I'd have got more responses if I'd titled it:

"Help Me Pot The Gnome"

Vassago · May 27, 2003

Do they EVER know exactly what they want? I have yet to have someone tell me exactly what they want the database to do and look like before I build it for them. It's always after.

Fornatian · May 28, 2003

Interpretation is half the job.
