better way to load questionnaire data?

garethl

I have questionnaire data which is loaded from Excel files in the format of one column for each question and one respondent's responses in each row. In the database the questionnaire data is stored in a normalised format, i.e. TBL_RESPONSE has the following fields:

IDResponse , QuestionID, AnswerID, AnswerText, JobID, BatchID

BatchID is generated by the database, and each Excel file loaded gets a batch ID
JobID links to the table of jobs, as the questionnaires are QA checks on building jobs

TBL_MonitoringQuestions has..

IDQuestion , QuestionCode, QuestionText, InChecker

QuestionCode is the actual heading of the column in the Excel file
InChecker is a Boolean which flags whether the question is in the Excel file (there are a couple of questions which are ignored by the database)

TBL_MonitoringAnswers has..

IDAnswer , Answer, Code

Again, Code is the actual text which is in the cell in the Excel file when that answer has been given

TBL_MonitoringAllowedAnswers has..

IDQuestionAnswer , QuestionID, AnswerID
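
(For concreteness, the response table in Jet DDL would come out roughly as below; the exact types are assumed and AnswerText is sized arbitrarily.)

CREATE TABLE TBL_RESPONSE (
    IDResponse AUTOINCREMENT PRIMARY KEY,
    QuestionID LONG REFERENCES TBL_MonitoringQuestions (IDQuestion),
    AnswerID LONG REFERENCES TBL_MonitoringAnswers (IDAnswer),
    AnswerText TEXT(255),
    JobID LONG,
    BatchID LONG
);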

Hopefully that looks like a sensible format to store the data in. I decided on it after reading a lot of stuff about questionnaires on this site.

My question is how to get the data from the Excel-style one-question-per-column format into the database-style linked-table format.

Currently I am doing this: I have a query (well, actually two queries) which lists all the valid question/answer combinations. I open a recordset on that query and then loop through it, loading all the Yeses to question 1, then all the Nos to question 1, then all the Yeses to question 2, then all the Nos to question 2, etc...

This method is really really slow! I'm sure there must be a better way that is staring me in the face. Please help.

Here is the first query I loop through the results of:
SELECT TBL_MonitoringQuestions.QuestionCode, TBL_MonitoringQuestions.IDQuestion, TBL_MonitoringAnswers.IDAnswer, TBL_MonitoringAnswers.Code, TBL_MonitoringQuestions.InChecker
FROM TBL_MonitoringQuestions INNER JOIN (TBL_MonitoringAnswers INNER JOIN TBL_MonitoringAllowedAnswers ON TBL_MonitoringAnswers.IDAnswer = TBL_MonitoringAllowedAnswers.AnswerID) ON TBL_MonitoringQuestions.IDQuestion = TBL_MonitoringAllowedAnswers.QuestionID
WHERE (((TBL_MonitoringAnswers.Code) Is Not Null) AND ((TBL_MonitoringQuestions.InChecker)=True));

and here is the second, which deals with the free-text questions:

SELECT TBL_MonitoringQuestions.QuestionCode, TBL_MonitoringQuestions.IDQuestion, TBL_MonitoringAnswers.IDAnswer, TBL_MonitoringAnswers.Code, TBL_MonitoringQuestions.InChecker
FROM TBL_MonitoringQuestions INNER JOIN (TBL_MonitoringAnswers INNER JOIN TBL_MonitoringAllowedAnswers ON TBL_MonitoringAnswers.IDAnswer = TBL_MonitoringAllowedAnswers.AnswerID) ON TBL_MonitoringQuestions.IDQuestion = TBL_MonitoringAllowedAnswers.QuestionID
WHERE (((TBL_MonitoringAnswers.Code) Is Null) AND ((TBL_MonitoringQuestions.InChecker)=True));

The free-text ones are the ones for which no allowable answer codes are given.

Below is my code for making and looping through the recordset, including the INSERT statement, built in VBA, that keeps putting results into TBL_MonitoringResponse.

Public Sub append_results()

    ' Load one batch of responses into TBL_MonitoringResponse,
    ' running one append per valid question/answer combination
    Dim rst As ADODB.Recordset
    Dim strQues As String
    Dim strID As String
    Dim strAnswer As String
    Dim strRead As String
    Dim strSQL As String
    ' intBatchID is a public variable set elsewhere when the batch is created

    Set rst = New ADODB.Recordset
    rst.CursorLocation = adUseServer
    rst.CursorType = adOpenKeyset
    rst.LockType = adLockOptimistic
    rst.Open "QRY_MonitoringAppendsToRun", CurrentProject.Connection

    If rst.RecordCount > 0 Then
        Do While Not rst.EOF
            strQues = rst!QuestionCode
            strID = rst!IDQuestion
            strAnswer = rst!IDAnswer
            strRead = rst!Code

            ' Append every respondent whose cell in this question's column
            ' matches this answer's code
            strSQL = "INSERT INTO TBL_MonitoringResponse ( MeasureID, QuestionID, AnswerID, BatchID ) " & _
                "SELECT QRY_MonitorChecked.IDMeasure AS MeasureID, " & strID & " AS QuestionID, " & strAnswer & " AS AnswerID, " & intBatchID & " AS BatchID " & _
                "FROM QRY_MonitorChecked " & _
                "WHERE (((QRY_MonitorChecked." & strQues & ")='" & strRead & "'));"
            CurrentProject.Connection.Execute strSQL
            rst.MoveNext
        Loop
    Else
        MsgBox "Couldn't locate questions table", vbOKOnly, "Fatal Error in Load"
        rst.Close
        Exit Sub
    End If

    rst.Close

    ' Second pass: free-text questions, where the cell contents go into
    ' AnswerText instead of being matched against an answer code
    rst.Open "QRY_MonitoringAppendsToRun2", CurrentProject.Connection

    If rst.RecordCount > 0 Then
        Do While Not rst.EOF
            strQues = rst!QuestionCode
            strID = rst!IDQuestion
            strAnswer = rst!IDAnswer

            strSQL = "INSERT INTO TBL_MonitoringResponse ( MeasureID, QuestionID, AnswerID, BatchID, AnswerText ) " & _
                "SELECT QRY_MonitorChecked.IDMeasure AS MeasureID, " & strID & " AS QuestionID, " & strAnswer & " AS AnswerID, " & intBatchID & " AS BatchID, QRY_MonitorChecked." & strQues & " AS AnswerText " & _
                "FROM QRY_MonitorChecked;"
            CurrentProject.Connection.Execute strSQL
            rst.MoveNext
        Loop
    Else
        MsgBox "Couldn't locate free questions table", vbOKOnly, "Fatal Error in Load"
        rst.Close
        Exit Sub
    End If

    rst.Close
    MsgBox "Monitoring data has been loaded.", vbOKOnly, "Load Complete"
End Sub
 
You could create a select query for each question and then union all the selects together. I've often used this to normalise data. It's a bit of a pain to set up, especially if there are lots of questions, but it may be more efficient. Roughly like the sketch below, as one append query over a union; the Q1/Q2 column names and the hard-coded ID and BatchID values are placeholders, and you'd generate one branch per valid question/answer pair.
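
INSERT INTO TBL_MonitoringResponse ( MeasureID, QuestionID, AnswerID, BatchID )
SELECT IDMeasure, 1, 1, 0 FROM QRY_MonitorChecked WHERE Q1 = 'Y'
UNION ALL
SELECT IDMeasure, 1, 2, 0 FROM QRY_MonitorChecked WHERE Q1 = 'N'
UNION ALL
SELECT IDMeasure, 2, 1, 0 FROM QRY_MonitorChecked WHERE Q2 = 'Y'
UNION ALL
SELECT IDMeasure, 2, 2, 0 FROM QRY_MonitorChecked WHERE Q2 = 'N';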
 
Do you mean a select for every valid question/answer combination?

I guess you do, and yeah, that makes sense, thanks... so basically I was on the right track, and you do have to write individual selects/appends (even if you are linking them together in a union).

So basically it's just a pig to take the unnormalised data and normalise it, whichever way you look at it!

Thanks though, I'll try the union approach and hope it improves performance. I actually think there is a good chance it will, as it means I can lose the recordset. This is running on Citrix, and one of the things that I'm pretty sure makes the db grind to a halt over Citrix is using recordsets.

I read that on this forum as well: that Access is known to be painfully slow over a Citrix network, and that two of the things that are particularly bad are recordsets and aggregate functions.

Don't know if there are any Citrix gurus out there who can shed any light on this?
 
Another thing: I did have the batch ID stored in a variable, intBatchID, so in that loop I was putting its value into the query string I was building in VBA.

Using this union query approach, there are as far as I can see two ways I can deal with it:

1) What I have started doing, which is to pass the public variable intBatchID to each individual query using a public function... so each query contains "getIntBatchID() AS BatchID" (see the sketch after option 2).

2) Set BatchID to zero in each query, so each individual query contains "0 AS BatchID", and then afterwards do

strSQL = "UPDATE TBL_MonitoringResponse SET BatchID = " & intBatchID & " WHERE BatchID = 0;"

So far I've written 27 of the 103 individual queries I'll need to link together in this UNION!! Then I stopped being daft and wrote a Perl script to generate the text of the SQL in a loop, like I was doing in my VBA procedure.

Anyway, I'm pretty much ready to see if this improves the speed when I'm back in tomorrow morning, but now I'm wondering... am I needlessly slowing this down by looking up a variable through a function like that in every single select? Surely I'd be better off setting BatchID to zero and then running one single UPDATE query afterwards?
 
OK, I decided I've answered my own question on the function-passing-a-public-variable thing; it stands to reason that it should be slower.

Now I've tried nieleg's solution with the first 27 of those queries to see what the speed was like, and I get the "cannot open any more databases" error which is being discussed on this board at the moment in the thread called "how many open databases".
 
Oh dear :(

As to the variable, you really only need it once, as you can set a local variable to that value. But if you've run out of object IDs then it's a bit academic.
 
I know! Sod's law.

I read that other thread and what's happening makes perfect sense, though there aren't even many joins in the query I'm addressing.

I thought of another idea though:

How easy is it to have stuff process in the background in Access when the system is idle and no one is doing anything?

What about if I have the load routine dump the data quickly into a horrible unnormalised table? Then, in the background, when everyone's gone home or whatever, the system checks whether there are records in that table and sets about loading them into my nice normalised questionnaire structure while no one is looking, deleting them from the temp table as it goes.

All the display forms for monitoring data could work off both tables, as long as both tables include the jobs primary key.

Then if I do need to do any complex querying, it works off my nice normalised tables and we just have to live with it excluding those records that haven't been 'fully loaded'.

Best of both worlds? Or stupid idea? :)
 
I did wonder about that. You can use a series of append queries to add each question to your normalised table. That way you're only running one query at a time and the object ID issue won't arise.
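
Something like this, say, assuming the per-question appends are saved as QRY_Append_Q1 through QRY_Append_Q103 (names invented for illustration). Each Execute finishes before the next starts, so Jet only ever holds one query's worth of table handles at a time:

Public Sub run_appends_one_at_a_time()
    Dim i As Integer
    ' Run each saved append query in sequence instead of one big UNION,
    ' so the "cannot open any more databases" limit is never hit
    For i = 1 To 103
        CurrentProject.Connection.Execute "QRY_Append_Q" & i
    Next i
End Sub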
 
OK, if you don't think that's a step backwards I'll give it a go. The consensus on this board seems to be that using Task Scheduler and a macro is the simplest way to automate tasks, so I'll do that and have it normalise the data that's been loaded every night after everyone's gone home.
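
For the scheduled task, the usual recipe is to have Task Scheduler open the database with the /x switch, which runs a named macro on startup; the paths and macro name below are placeholders:

"C:\Program Files\Microsoft Office\OFFICE11\MSACCESS.EXE" "C:\Data\Monitoring.mdb" /x mcrNightlyNormalise

where mcrNightlyNormalise would use RunCode to call the normalisation routine and then quit Access.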

When I think of it like that, though, I wonder if there's any point in having a user press a button at all... I could surely just automate the database to scan the system for new files of monitoring results and load them automatically.

Anyway I'll just see how this pans out first.
 
OK, well, that solution actually works really well.

There is now a table in the back-end database with the most horrific design you have ever seen... well, actually it just isn't designed at all. It has exactly the same structure as the Excel file, with the addition of a primary key and a Boolean HasBeenLoaded field. No normalisation whatsoever, with the consequence that every single record has half its fields null (either one half or the other; there are two types of jobs in the jobs table).

The Excel loading routine dumps data straight into this table and works tolerably fast. This is what the user sees, and they are happy.

Then the routine which sorts this data into the structured tables is kicked off by Task Scheduler. Those tables are what I run my queries on, and I am happy.

The mad thing is that this routine, which is essentially the same code as the original load routine, works much faster. I can actually dump the data into my normalisation-nightmare table, then pull it out again into my structured tables, faster than I could put it straight into the structured tables in the first place!

So I'm adding an extra step, but it's still faster. And I don't actually have to wait for the night at all; I can even make it part of the load routine. This is bizarre.

When I say essentially the same code, it is the exact same code, except that the source of the SELECT is now a query selecting those records in the Excel-style table which do not have HasBeenLoaded set to True.

Once the data has been loaded, an UPDATE sets those Booleans to True so the records don't get appended twice.
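
In sketch form, assuming the staging table is called TBL_ExcelDump (the name here is invented): the appends all read from the first query, and the second statement runs once they have all finished.

SELECT * FROM TBL_ExcelDump WHERE HasBeenLoaded = False;

UPDATE TBL_ExcelDump SET HasBeenLoaded = True WHERE HasBeenLoaded = False;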

Why should it work so much faster from a linked table than from a temporary table which is part of the front end?
 
