Optimization Question: Arrays, collections, tempTables: which is faster?

PaulSchrum

All,

I have a large database in which I am currently writing to and DELETE *'ing a table, using it as a temporary table. The code is too complex to post here, but here is a summary of benchmarks:

# of Records   Records Processed Per Second
     1,000     219 rec/sec
    10,000     137 rec/sec
   100,000      22 rec/sec

Well, the real database has over two million records, so it is really crawling after 100,000 records.

I was thinking, though, that since I do not have to do any queries or sorting on the temp table, it might run faster if I switch from storing that temp data in a table to storing it in either a collection or a dynamic array. (Its nature is that it actually gets sorted the way I need it as I am building it from other data.)

Does anyone out there know which of the three types of temporary structured data stores is most efficient timewise? Any other tidbits of advice would also be appreciated.

- Paul Schrum
North Carolina, USA
 
What are you actually doing in the temporary table when processing it?
 
What are you actually doing in the temporary table when processing it?

Sorry, but I can be a little dense sometimes, so I am not sure I get the point of your question.

I parse data from an existing table, write it in a different form to my temp table, then loop through it backward from rst.MoveLast to rst.BOF while I simultaneously loop through a dynamic array.

Hopefully you can see from my response how I am not getting your question. Can you clarify it?

Thanks.
 
Just trying to establish what part of the process causes the per-record slowdown with the increasing record count. What are you writing to?
 
Just trying to establish what part of the process causes the per-record slowdown with the increasing record count. What are you writing to?

I am writing data back to the original 2 million record table (linked). In part, each record looks like this:

CustomerID Text(10) (PK)
Year LongInt (PK)
JanuaryCubicFeet LongInt // read this field
JanuaryDaysSinceLastReading // read this field
FebruaryCubicFeet LongInt // read this field
FebruaryDaysSinceLastReading // read this field
.
.
. // all the way to December
JanAveCubicFeet Double // write this field
FebAveCubicFeet Double // write this field
.
.
. // to December
 
That's not very good table design. My bet is that a temp table solution looks attractive because the data is structured so poorly in the main table.

'January' is data. It should not appear in the name of a field. It should appear as data in a field called month. Is 'DaysSinceLastReading' a reading of the 'CubicFeet'? If so, this table should look like this ...
tStuffVolumes
CustomerID
Date 'when the reading occurred.
Volume 'the amount read.
That's it. And typically if you need an average, calculate it as required.
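As aircode, the on-demand average might look like this (everything beyond tStuffVolumes and its three fields is an assumed name, so treat it as a sketch):

Code:
' Aircode: average volume per customer per month, computed when needed
' rather than stored. Assumes the tStuffVolumes design above; the alias
' names are made up.
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset( _
    "SELECT CustomerID, Format([Date], 'yyyy-mm') AS ReadingMonth, " & _
    "Avg(Volume) AS AveVolume " & _
    "FROM tStuffVolumes " & _
    "GROUP BY CustomerID, Format([Date], 'yyyy-mm')")
If Not rs.EOF Then Debug.Print rs!CustomerID, rs!ReadingMonth, rs!AveVolume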
Mark
 
What aspect of the task causes you to write the temporary table and process through recordsets? Queries are invariably faster.

You are also denormalizing the data by writing the average.

That table structure is not usual either. More typically each measurement would be a separate record recorded against a date.
 
That's not very good table design. My bet is that a temp table solution looks attractive because the data is structured so poorly in the main table.

Yea, that's what I thought when I found out I was going to marry it. But unless this structure is the cause of my speed drag-down and the only way I can fix that is by changing the table format, requesting a change from the client is a non-starter.

In other words, I did not come up with it, but I have to live with it.

But, thank you for the suggestion.

- Paul
 
can you confirm the process again?

you are extracting certain data from a table into another table?
then using the second table to delete the data from the first?

if so, why bother using the second table? just delete in the first pass

as others have said, presumably you are having to examine data from several columns to decide whether to process the record or not. so you have to step through your table in code, rather than by using a standard access query, and this is caused by faulty table design

the problem is that once you have got so far (too far) down the line, it may become a big job to rework all the functionality in your database - eg i expect this iterative process is repeated in other areas of your database?
 
Since you say you can't change the table design, we'd need to know more about what your code is doing in general steps.

You already say you are reading from the original table into temp then writing back to the original, but you didn't give an overview of what you did with temp - did you do any calculations or something? Does the temp table have one row matching the row in the original, or is the temp table actually a summary of calculations derived from more than one row in the original?

You also say you're using a dynamic array. Are you then doing something like:
Code:
For i = ... to ...
   ReDim Preserve myArray(i)
   myArray(i) = ...
Next

If so, then that's one reason why it takes so many resources. Arrays are not very good in bulk, and worst of all in terms of dynamic allocations. It may be better to use either another temp table (aka recordset - if you are not opposed to ADO, you can create one in memory without a temp table) or maybe collections (though I tend to look at collections as more appropriate for holding objects rather than a bunch of variables).
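If you do go the in-memory ADO route, here's a minimal sketch of a fabricated recordset - all field names are placeholders, and I'm using late binding so no reference is needed:

Code:
' Fabricated ADO recordset: lives entirely in memory, no temp table.
' 200 = adVarChar, 3 = adInteger, 5 = adDouble (hard-coded for late binding).
Dim rsTemp As Object
Set rsTemp = CreateObject("ADODB.Recordset")

With rsTemp
    .CursorLocation = 3            ' 3 = adUseClient
    .Fields.Append "CustomerID", 200, 10
    .Fields.Append "MonthNum", 3
    .Fields.Append "AveCubicFeet", 5
    .Open                          ' no connection - purely in memory

    .AddNew
    !CustomerID = "CUST000001"
    !MonthNum = 1
    !AveCubicFeet = 123.45
    .Update

    .Sort = "CustomerID, MonthNum" ' sorting and filtering come free
    .Filter = "AveCubicFeet > 0"
End With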
 
Arrays are not very good in bulk, and worst of all in terms of dynamic allocations. It may be better to use either another temp table (aka recordset - if you are not opposed to ADO, you can create one in memory without a temp table) or maybe collections (though I tend to look at collections as more appropriate for holding objects rather than a bunch of variables).

Given the similarity of the structure of a Collection and a single-dimensional array (both use an index and value), I would expect that the collection would be pretty slow too. Moreover, the collection actually stores a name for the index, so it is probably even slower.
 
can you confirm the process again?

you are extracting certain data from a table into another table?

Yes. But the other table is a temp table, which I am using only as a scratch pad for computations. Alternatively, I could use a dynamic array or a collection for the scratch pad.

Because it is a temp table, I write it, use it, then delete all of the rows, and repeat (400,000 times).

then using the second table to delete the data from the first?
?

No. I am using the second table to write more data back to the first.

as others have said, presumably you are having to examine data from several columns to decide whether to process the record or not. so you have to step through your table in code, rather than by using a standard access query, and this is caused by faulty table design

Yes, this characterizes the situation accurately. I would be okay with this if the time to process the 400,000 customers (5 records per customer, 1 record per year/customer) did not increase with the number of records processed. A 40-hour run on a PC is not tenable.

the problem is that once you have got so far (too far) down the line, it may become a big job to rework all the functionality in your database
I would say that this is not really the main problem right now. The main problem is that the table takes too long to process.

- eg i expect this iterative process is repeated in other areas of your database?
I do not have responsibility or access to other areas of the database. I am supposing that someone is having to deal with it. But for me, I only have to focus on this one issue.
 
You already say you are reading from the original table into temp then writing back to the original, but you didn't give an overview of what you did with temp - did you do any calculations or something? Does the temp table have one row matching the row in the original, or is the temp table actually a summary of calculations derived from more than one row in the original?

You also say you're using a dynamic array. Are you then doing something like:
Code:
For i = ... to ...
   ReDim Preserve myArray(i)
   myArray(i) = ...
Next
Yes, that is close to what I am doing. It is actually this:

Code:
do while not rst_customerHistory.EOF
   startMonth = getMonthNumSince1900fromString("JAN", rst_customerHistory("Year"))
   endMonth = (rst_customerHistory.RecordCount * 12) + startMonth - 1
   Dim ad_aveGallonsPerDay() As Double
   ReDim ad_aveGallonsPerDay(startMonth To endMonth)
   ' process
   rst_customerHistory.MoveNext
loop

If so, then that's one reason why it takes so many resources. Arrays are not very good in bulk, and worst of all in terms of dynamic allocations. It may be better to use either another temp table (aka recordset - if you are not opposed to ADO, you can create one in memory without a temp table) or maybe collections (though I tend to look at collections as more appropriate for holding objects rather than a bunch of variables).

THERE. That is most helpful. I understand how I would implement a dynarray in C++, and it would not bloat (or so I suppose ;)), but I don't know how VBA does it behind the scenes. So this comment is information I can use. I would use it by switching from a dynamic array to a Collection.

Actually I have been put on hold on this project for now. See my next post.

Thank you very much for your response.

- Paul
 
All,

I have been put on hold on this project for the moment. I am still interested in seeing people's responses. But when I get released from this hold I may have to take (may get to take) an entirely different approach.

- Paul
 
do while not rst_customerHistory.EOF
   startMonth = getMonthNumSince1900fromString("JAN", rst_customerHistory("Year"))
   endMonth = (rst_customerHistory.RecordCount * 12) + startMonth - 1
   Dim ad_aveGallonsPerDay() As Double
   ReDim ad_aveGallonsPerDay(startMonth To endMonth)
   ' process
   rst_customerHistory.MoveNext
loop

out of interest,

1) you don't want the dim statement INSIDE the loop. you only need to dim the variable once. i am not sure what happens when you repeatedly dim the same variable.

2) if you are not preserving data when you redim the array, is it not clearing every time?

the logic in this loop looks a bit off to me, to be honest.


overall, it looks like what you are trying to do is turn a non-normalised table into a normalised array. I would not do this. access really needs data in tables, for general functionality. I would also not use a collection. I think a collection is something that tries to implement a linked list, which is hard to do in vba, because there is no pointer data type. collections seemed very cumbersome and inflexible when I tried them

What I would do is run a series of 12 queries to put the monthly data into a normalised table

effectively

Code:
clear normalised datatable
for x = 1 to 12
   insert records for month x, from original table into normalised datatable
next

which you can do with a bulk query. I am not into writing SQL as aircode, but it's trivial and can be done with a single stored query, or 12 different queries, if you find it easier that way. just add the account number (or whatever the ID field is), the date/month, and the single month's data in each pass.

job done

now you have a normalised table that you can use in the rest of your dbs. you can join it back to the original table using the account number to pick up any other data in there.
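still, to give the flavour, here is rough (unchecked) aircode for the whole thing - tblCustomerHistory and tblMonthlyVolumes are made-up names, so treat it as a sketch:

Code:
' One bulk INSERT...SELECT per month. Jet optimises these far better
' than walking the table row by row in a recordset.
Dim db As DAO.Database
Dim monthNames As Variant
Dim x As Integer

Set db = CurrentDb
monthNames = Array("January", "February", "March", "April", "May", "June", _
                   "July", "August", "September", "October", "November", "December")

db.Execute "DELETE * FROM tblMonthlyVolumes", dbFailOnError   ' clear normalised datatable

For x = 0 To 11
    db.Execute "INSERT INTO tblMonthlyVolumes (CustomerID, ReadingYear, MonthNum, CubicFeet) " & _
               "SELECT CustomerID, [Year], " & (x + 1) & ", [" & monthNames(x) & "CubicFeet] " & _
               "FROM tblCustomerHistory", dbFailOnError
Next x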

-----
so which is faster? in general I would ALWAYS try and use a bulk query. Access/SQL manages very complex logic without problems. occasionally with very large tables, I have found some queries too slow, and had to write VBA to do the job manually.

I use arrays very sparingly though, and I have never used a collection. I think data belongs in tables.



hope this helps
 
Given the similarity of the structure of a Collection and a single-dimensional array (both use an index and value), I would expect that the collection would be pretty slow too. Moreover, the collection actually stores a name for the index, so it is probably even slower.

I actually doubt it's an array; it's more likely a linked list. Otherwise, it wouldn't be capable of dynamic allocation. I don't know how they've implemented the search by key, and whether the key points directly to the node or one has to walk the chain. As I said before, though, this really works better with objects because it's typically objects that we want to For..Each over and do something with their properties. Simple variables may possibly be better off in an oversized array (to avoid having to ReDim Preserve at each single addition, which is an expensive operation).
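For what it's worth, a minimal sketch of the oversized-array approach - the recordset argument and the SomeField name are placeholders:

Code:
' Sketch of the oversized-array idea: grow capacity in big jumps so
' ReDim Preserve (which copies the whole array) runs rarely, instead
' of once per element added. SomeField is a placeholder field name.
Sub LoadIntoArray(rst As DAO.Recordset)
    Dim buf() As Double
    Dim used As Long, capacity As Long

    capacity = 1024
    ReDim buf(0 To capacity - 1)

    Do While Not rst.EOF
        If used = capacity Then
            capacity = capacity * 2
            ReDim Preserve buf(0 To capacity - 1)   ' rare, amortised copy
        End If
        buf(used) = rst!SomeField
        used = used + 1
        rst.MoveNext
    Loop

    If used > 0 Then ReDim Preserve buf(0 To used - 1)   ' trim to final size
End Sub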

Yes, that is close to what I am doing. It is actually this:

I don't see any dynamic arrays? Only recordsets?

THERE. That is most helpful. I understand how I would implement a dynarray in C++, and it would not bloat (or so I suppose ;)), but I don't know how VBA does it behind the scenes. So this comment is information I can use. I would use it by switching from a dynamic array to a Collection.

But as I said earlier, collections are really good for objects. Not so much for a bunch of simple variables, generally speaking. Using a recordset is probably better because at least you can get an index created to optimize your access, and apply a filter and/or sort to the recordset. You can't do that with either arrays or collections (at least not without writing your own code, and I don't think it's usually worth it).

you don't want the dim statement INSIDE the loop. you only need to dim the variable once. i am not sure what happens when you repeatedly dim the same variable.

The placement of a Dim statement within a procedure is actually largely irrelevant, except in the case where you're referencing a variable prior to its Dim statement (which causes a compile error when Option Explicit is enabled). All variables within the procedure are initialized as soon as you enter the procedure.

I agree, though, that the Dim statement does not belong there, for the sake of readability & consistency.
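A quick demonstration you can paste into a module and run:

Code:
' Dim placement doesn't change when a variable is created: all locals
' exist (and are initialized) from the moment the procedure is entered.
Sub DimPlacementDemo()
    Dim i As Long
    For i = 1 To 3
        Dim total As Long      ' textually inside the loop, allocated once
        total = total + i
    Next i
    Debug.Print total          ' prints 6 - total persisted across iterations
End Sub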

in general I would ALWAYS try and use a bulk query. Access/SQL manages very complex logic without problems. occasionally with very large tables, I have found some queries too slow, and had to write VBA to do the job manually.

I use arrays very sparingly though, and I have never used a collection. I think data belongs in tables.

I agree in general. Table/Queries (and by extension, recordsets) are probably the best place for data manipulation.
 
out of interest,
1) you don't want the dim statement INSIDE the loop. you only need to dim the variable once. i am not sure what happens when you repeatedly dim the same variable.

I was under the impression that this was innocuous and was only executed once, before any part of the function was ever executed. This is easy to fix.

2) if you are not preserving data when you redim the array, is it not clearing every time?
the logic in this loop looks a bit off to me, to be honest.
That is how I understand it. The logic is off, but it is because I made a mistake when I was trimming my real code down for the code quote. I really do want to clear the data every time I redim, and that part is working okay in the real code.

overall, it looks like what you are trying to do is turn a non-normalised table into a normalised array. I would not do this. access really needs data in tables, for general functionality. I would also not use a collection. I think a collection is something that tries to implement a linked list, which is hard to do in vba, because there is no pointer data type. collections seemed very cumbersome and inflexible when I tried them
On point. Advice taken.
What I would do is run a series of 12 queries to put the monthly data into a normalised table

. . .

now you have a normalised table that you can use in the rest of your dbs. you can join it back to the original table using the account number to pick up any other data in there.
Excellent idea. This is how I should have been doing it all along. Thanks a lot.
so which is faster? in general I would ALWAYS try and use a bulk query.
. . .
hope this helps

Yes, it helps a lot. Thanks Dave. And sorry for calling you by your dog's name earlier.

- Paul
 
