wiklendt
i recommend chocolate
- Local time
- Today, 11:36
- Joined
- Mar 10, 2008
- Messages
- 1,746
Hey everyone.
Intro
I'm going to tackle an import procedure i've been avoiding for a while - mainly because i have no idea of where to start with this.
Please open the attached PDF (the files are excel, but i've printed to PDF for convenience of showing you guys) so you know what i'm talking about....
ok, so. we have a machine which determines antibiotic resistance of any bacterial isolate you throw into it, and for each sample, we can manually export its results in a single excel file per sample. We currently have a hundred or so of these files, but acquire about 3-4 new ones a week.
Information required to pull out
I've highlighted the fields in the PDF that i'd need to import. Noted also is that the header information isn't always the same number of rows down, and there's other data which is an unknown rows also.
In addition to that, the antibiotic names in the first column can change depending on the test performed on the machine.
Also, the values it provides as a result (e.g., "<=1") i will want to separate the, what i call, 'qualifier' from the value and wherever there is no qualifier, i want to add an equals to it (i.e., "8" becomes "=8") and then split to two fields)
Current competencies
i have several normalised tables in my database, and once the data is imported into it, i can match field data to an ID and append the appropriate FK, but i don't know how to start importing this stuff.
The unknown
i know an import specification is normally used, but it requires column and field names to be identical.... well, that's fine except where the data is all over the shop like in our excel files. not helped by the fact i've never done any import whatsoever (aside from a FE/BE scenario, which is totally different anyhow!).
so how would i go about this one? would i need to parse the info first into some other format and then import, or import and try to filter out redundant data, or some other way?
anyone willing to point me in the right direction? how would you do it?
ta.
Intro
I'm going to tackle an import procedure i've been avoiding for a while - mainly because i have no idea of where to start with this.
Please open the attached PDF (the files are excel, but i've printed to PDF for convenience of showing you guys) so you know what i'm talking about....
ok, so. we have a machine which determines antibiotic resistance of any bacterial isolate you throw into it, and for each sample, we can manually export its results in a single excel file per sample. We currently have a hundred or so of these files, but acquire about 3-4 new ones a week.
Information required to pull out
I've highlighted the fields in the PDF that i'd need to import. Noted also is that the header information isn't always the same number of rows down, and there's other data which is an unknown rows also.
In addition to that, the antibiotic names in the first column can change depending on the test performed on the machine.
Also, the values it provides as a result (e.g., "<=1") i will want to separate the, what i call, 'qualifier' from the value and wherever there is no qualifier, i want to add an equals to it (i.e., "8" becomes "=8") and then split to two fields)
Current competencies
i have several normalised tables in my database, and once the data is imported into it, i can match field data to an ID and append the appropriate FK, but i don't know how to start importing this stuff.
The unknown
i know an import specification is normally used, but it requires column and field names to be identical.... well, that's fine except where the data is all over the shop like in our excel files. not helped by the fact i've never done any import whatsoever (aside from a FE/BE scenario, which is totally different anyhow!).
so how would i go about this one? would i need to parse the info first into some other format and then import, or import and try to filter out redundant data, or some other way?
anyone willing to point me in the right direction? how would you do it?
ta.