Optimising memory use and disk access to improve speed

DanR

Hello

This is a rather open-ended question; I'm looking for your expertise on how to make the best use of memory and disk access.

I have an Access database running some very large processing jobs, for example importing about 12 million records from a flat database into a relational system using VBA script.

Processes take anywhere from hours to weeks to run on a desktop PC. To speed things up, these large processes run with the back end sitting on the computer's C: drive rather than on the server.

While the process is running, the PC's processor generally (but not always) spends over 90% of its capacity on Access, but very little memory is used (about 35 MB of the 512 MB available).

So two questions:
1. Is there a setting that can be changed to make Access load more records into memory at a time, and thus use more memory, but access the hard disk less frequently?

2. Could doing this significantly increase processing speed in Access?

Are there any other general steps to improve performance that I am missing?

Steps I have already taken include carefully setting all field data types to the smallest appropriate for the data stored.

Thanks for your thoughts

Dan
 

ChrisO

Hi Dan

This could turn into an interesting thread.

From my point of view, most of the time I have tried to optimise code performance it has produced somewhat disappointing results. Seldom would the overall improvement be as much as a 2:1 reduction in time. (But I guess that if a job can take weeks, then one week would certainly be better than two.)

The following is not in any particular order because it all needs testing.

As for loading more records into memory, I don't know. It may also depend on how much disk activity comes from accessing the records and how much, if any, comes from the use of virtual memory. In theory, disk access time for virtual memory can be reduced by having two hard drives and placing the swap file on the second one. That cuts down on the seek time for the swap file by not requiring the heads to jump all over the place.

It would certainly be worthwhile investigating the timing difference between ADO and DAO.
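
For example, a minimal timing sketch along these lines could walk the same table with both object models (the table name tblSource and the use of a snapshot are just assumptions for illustration; both the DAO and ADO library references need to be set):

Code:
    Sub CompareDAOandADO()
        Dim t As Single
        Dim rsD As DAO.Recordset
        Dim rsA As ADODB.Recordset
    
        ' DAO pass: walk every record and time it.
        t = Timer
        Set rsD = CurrentDb.OpenRecordset("tblSource", dbOpenSnapshot)
        Do Until rsD.EOF
            rsD.MoveNext
        Loop
        rsD.Close
        Debug.Print "DAO: "; Timer - t; " seconds"
    
        ' ADO pass over the same table.
        t = Timer
        Set rsA = New ADODB.Recordset
        rsA.Open "tblSource", CurrentProject.Connection, _
                 adOpenForwardOnly, adLockReadOnly
        Do Until rsA.EOF
            rsA.MoveNext
        Loop
        rsA.Close
        Debug.Print "ADO: "; Timer - t; " seconds"
    End Sub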

Correct choice of variable types could also affect timing, but not always in the way we think. Generally speaking, Variants should be slower than the specific data type they represent. But apparently not always, as Jet seems to want Variants and needs to typecast anything else to that data type. (I can't prove that, but something along those lines is in the link below.)
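
To test that claim, something like the following rough sketch would do; the loop counts and the summing are arbitrary, and the gap it shows (if any) will vary by machine:

Code:
    Sub TypedVsVariant()
        Dim t As Single
        Dim i As Long             ' specific data type
        Dim v As Variant          ' Variant doing the same job
        Dim total As Double
    
        t = Timer
        For i = 1 To 5000000
            total = total + i
        Next i
        Debug.Print "Long counter:    "; Timer - t
    
        t = Timer
        For v = 1 To 5000000
            total = total + v
        Next v
        Debug.Print "Variant counter: "; Timer - t
    End Sub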

When calling a procedure, passing arguments ByRef should never be slower than passing them ByVal.
(I used that 'never' word, so I guess someone will prove me wrong, and so they should.)
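
The reason is that ByVal has to copy the argument while ByRef only passes a reference to it, which matters most with large strings. A trivial sketch (the procedure names are made up):

Code:
    Sub CallBoth()
        Dim s As String
        s = String$(1000000, "x")     ' a large string makes the copy cost visible
        TouchByRef s                  ' passes a pointer, no copy
        TouchByVal s                  ' copies the whole string first
    End Sub
    
    Private Sub TouchByRef(ByRef s As String)
        Dim n As Long
        n = Len(s)
    End Sub
    
    Private Sub TouchByVal(ByVal s As String)
        Dim n As Long
        n = Len(s)
    End Sub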

Increase the size of the source code. If done correctly, more source code can prevent other code from running. In other words, try to unroll loops (it gets rid of the counter overhead) and don't run code that is not required. The code that is not run is the fastest of the lot.
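
As a rough sketch of unrolling, summing an array four elements per pass quarters the number of counter increments and loop tests (the function and array are invented for the example):

Code:
    Function SumUnrolled(arr() As Double) As Double
        Dim i As Long
        Dim total As Double
    
        ' Main unrolled loop: four elements per iteration.
        For i = LBound(arr) To UBound(arr) - 3 Step 4
            total = total + arr(i) + arr(i + 1) + arr(i + 2) + arr(i + 3)
        Next i
    
        ' Clean-up loop for any leftover elements.
        Do While i <= UBound(arr)
            total = total + arr(i)
            i = i + 1
        Loop
    
        SumUnrolled = total
    End Function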

If the result of a calculation is a foregone conclusion, consider stuffing the results into a table, copying them into an array at run time, and then accessing the array. See the previous point, because the array can get rather large.
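
A minimal sketch of that idea, assuming a pre-computed table tblResults with fields ID (running 1..n) and Result (both names invented here): load it once, then index the array inside the main loop instead of querying or recalculating.

Code:
    Private Results() As Double
    
    Sub LoadResults()
        Dim rs As DAO.Recordset
    
        Set rs = CurrentDb.OpenRecordset( _
            "SELECT ID, Result FROM tblResults ORDER BY ID", dbOpenSnapshot)
        rs.MoveLast                          ' force an accurate RecordCount
        ReDim Results(1 To rs.RecordCount)
        rs.MoveFirst
    
        Do Until rs.EOF
            Results(rs!ID) = rs!Result       ' assumes ID runs 1..RecordCount
            rs.MoveNext
        Loop
        rs.Close
    End Sub
    
    ' Later, inside the processing loop, Results(n) replaces the calculation.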

Attempt to break up multiple-condition If statements.

Code:
    If This And That Then
        ' Blah
    End If
    
    becomes
    
    If This Then
        If That Then
            ' Blah
        End If
    End If
because in the first case both conditions are evaluated even if This is False (VBA does not short-circuit).
The second looks almost naïve… but is faster.

Spend most of your time optimising from the inside out.
Start on the innermost loop and work your way out, because the inner loop is executed more often.
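
A small illustration of the inside-out idea: anything that stays constant during the inner loop should be computed once, outside it (the routine and the scaling factor are made up):

Code:
    Sub ScaleRows(arr() As Double)
        Dim i As Long, j As Long
        Dim dblFactor As Double
    
        For i = LBound(arr, 1) To UBound(arr, 1)
            ' Invariant for the inner loop, so work it out once per row, not per cell.
            dblFactor = Sqr(CDbl(i + 1))
            For j = LBound(arr, 2) To UBound(arr, 2)
                arr(i, j) = arr(i, j) * dblFactor
            Next j
        Next i
    End Sub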

Just some random thoughts, but it all needs testing. Here are some more.

http://www.microsoft.com/OfficeDev/Articles/movs101.htm

Oh! I forgot to add. If the budget permits, buy a faster computer.

Regards
Chris
 

jeremie_ingram

One thing to consider would be the code you are using in the process. You could try seeking refinement ideas from the great minds you have at your disposal here in the forum. I had an issue with pulling records from a flat file and making the proper adjustments to them as they were put into the database. I worked hard on the issue, but the best time I could get the 80,000+ records to process in was about 8 hours. When I checked in on the forum here, some great advice along with code was given to me, and the process now takes about 15-30 seconds.
Needless to say, I was ecstatic about the difference in time. It could very well be the code itself that is slowing the process down.
Look here http://www.access-programmers.co.uk/forums/showthread.php?s=&threadid=43336 for that post
 

Pat Hartman

You say you are using VBA Script. I don't believe that such an animal exists. There are VB, VBA, and VBScript. I'll assume that since you are posting here, you are using VBA.

12 million rows is a lot to add to a table all at once. Since you are using code rather than an append query, you should enclose the AddNew code within a transaction and then commit the transaction every 10,000 records or so. This will minimize the memory-management problem of dealing with 12 million rows.
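
In outline, the pattern looks like this (the table and field names are only placeholders for your own):

Code:
    Sub ImportInBatches()
        Dim ws As DAO.Workspace
        Dim db As DAO.Database
        Dim rsIn As DAO.Recordset
        Dim rsOut As DAO.Recordset
        Dim counter As Long
    
        Set ws = DBEngine.Workspaces(0)
        Set db = ws.Databases(0)             ' the current database, in this workspace
        Set rsIn = db.OpenRecordset("tblFlatInput", dbOpenSnapshot)
        Set rsOut = db.OpenRecordset("tblTarget", dbOpenDynaset)
    
        ws.BeginTrans
        Do Until rsIn.EOF
            rsOut.AddNew
            rsOut!SomeField = rsIn!SomeField     ' ...plus the rest of the field mapping
            rsOut.Update
    
            counter = counter + 1
            If counter Mod 10000 = 0 Then
                ws.CommitTrans                   ' flush this batch to disk
                ws.BeginTrans                    ' and start the next one
            End If
            rsIn.MoveNext
        Loop
        ws.CommitTrans                           ' commit the final partial batch
    
        rsOut.Close
        rsIn.Close
    End Sub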

If you are using DLookup()s or other domain functions, find another way.
 

DanR

Thanks everyone

Yes, Pat, I am using VBA. I used the word 'script' loosely (and wrongly, I guess) to refer to programming code in general.

And in relation to all the suggestions:

a) Thanks heaps, Pat, for the transaction/commit suggestion. I'll give that a try. In fact, the program already processes records in groups based on a couple of fields (latitude, longitude). Each group contains anything from 10 to 30,000 records, so it would be easy to commit the transaction at the end of each group. It should also reduce the risk of data corruption if the process is stopped midway.

b) A faster computer is on order. It will triple the processor speed (866 MHz to 2.8 GHz). It will be interesting to see how much this reduces the actual processing time.

And Jeremie, the article on various speed improving techniques is great!

cheers
Dan
 

Pat Hartman

If you want to provide for restart, you have a lot more coding to do. You need to write a record to a file for each checkpoint. The record needs to contain a checkpoint ID and the unique identifier of the last row committed. If you don't have a unique ID in your input file, you can use a record counter and write the record number. You need a start-up option that specifies whether you are starting from the beginning or from a particular checkpoint. If you are starting from a checkpoint, your code needs to read the checkpoint file and use the information to position the input file by reading sequentially until you get to the last checkpointed record.
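
In rough outline it could look like this (the table tblCheckpoint and its fields are just placeholders, and the DMax() runs once at start-up rather than per row, so it doesn't fall foul of the domain-function warning below):

Code:
    Sub SaveCheckpoint(lngLastRecord As Long)
        CurrentDb.Execute _
            "INSERT INTO tblCheckpoint (LastRecord, LoggedAt) " & _
            "VALUES (" & lngLastRecord & ", Now())", dbFailOnError
    End Sub
    
    Function LastCheckpoint() As Long
        ' Returns 0 when no checkpoint exists, i.e. start from the beginning.
        LastCheckpoint = Nz(DMax("LastRecord", "tblCheckpoint"), 0)
    End Function
    
    ' On restart, read LastCheckpoint() and skip forward that many input
    ' records before resuming the import.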

Seriously though, when you are dealing with this much data, you need to be really careful with your code. EVERYTHING counts. I had a file of 110,000 rows that needed some code conversion. With ONE DLookup(), the process took 40 minutes. When I removed the DLookup() and used a query with a join to the lookup table, the process took less than 3 minutes.
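
The swap looked roughly like this (with the table and field names changed to placeholders): the per-row DLookup() goes, and a single set-based update with a join does the whole file in one pass.

Code:
    Sub ReplaceDLookupWithJoin()
        ' Slow: one DLookup per row inside the processing loop, e.g.
        '   rsOut!Description = DLookup("Description", "tblCodes", "Code='" & rsIn!Code & "'")
    
        ' Faster: let Jet join the whole set in one statement.
        Dim strSQL As String
        strSQL = "UPDATE tblTarget INNER JOIN tblCodes " & _
                 "ON tblTarget.Code = tblCodes.Code " & _
                 "SET tblTarget.Description = tblCodes.Description;"
        CurrentDb.Execute strSQL, dbFailOnError
    End Sub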
 

DanR

Good.

Sounds like I'm on the right track (more or less).

The data I'm dealing with comes from numerous separate datasets which are being combined into a far more efficient relational storage format. I'm hoping that once all the data is in the format I've designed, the large-scale processing nightmare will be over, and over 6 GB of input data will 'boil down' to less than 500 MB.

And because the data is spatial, the program selects all records for a 1 degree latitude x 1 degree longitude 'block' to work with.
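
To make the discussion concrete, selecting a block looks roughly like this (simplified, with the real table and field names swapped for placeholders; the latitude and longitude fields are indexed):

Code:
    Function OpenBlock(lngLat As Long, lngLon As Long) As DAO.Recordset
        Dim strSQL As String
        strSQL = "SELECT * FROM tblInput " & _
                 "WHERE Latitude >= " & lngLat & " AND Latitude < " & (lngLat + 1) & _
                 " AND Longitude >= " & lngLon & " AND Longitude < " & (lngLon + 1)
        Set OpenBlock = CurrentDb.OpenRecordset(strSQL, dbOpenSnapshot)
    End Function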

So, I already have an index table which tracks each block for each dataset, when it was imported, and how many input and output records there were. This makes it fairly easy and reliable to know where the process was up to if/when it stops.

It also makes it possible to redo just a small part of the process to deal with new or updated source data.

And happy to say there are no DLookups...

thanks again
Dan
 

ChrisO

Dan

One other thing would certainly decrease processing time and could be very interesting.

Post sample data at the start and update the code as it develops. That way you could reduce your development time and avail yourself of all the people on this site who might be able to tweak specific parts of the process. Not me, but some others are very good at SQL, for instance. Someone may also spot a way to employ an API to bypass VBA altogether.

One never really knows and it could make a very interesting read.

The sample data wouldn't have to be real… just indicative.

Another thing I just thought of, while writing this rubbish, is that if you have a new computer on the way, how about restricting it to that sole purpose? No internet or virus checking…that sort of thing.

Good luck,
Chris
 
