Processing a very large recordset

ddubb
New member · Joined Feb 24, 2010 · Messages: 3
I have 4.2 million records in a table that I need to process one record at a time. I only need to move forward through the records once, performing calculations on and updating each record. I'm using Access 2003 and all tables are native Access.

Can I, or should I, process all the records and then update in a batch? What is the best method?
After reading other posts, I assume I should use DAO instead of ADO.
Any other suggestions for the quickest possible way to process this would be helpful.

A little more information... as this large table (Table A) is being processed, I also write records to another table (Table B) based on the results of Table A's records.
Approximately every 400 records I also have to pull some data from another table (Table C), based on parameters acquired from the large table (Table A).
 
Whether to use DAO or ADO depends on the situation: whether you've got an Access Project (.adp) or an Access Database (.mdb), pre-Access 2007. DAO is recommended for .mdb files and ADO for projects.

Since you want to perform some operations whilst going through each record then a batch update isn't what you want. You just perform the update as you would with a "normal" recordset. You can use SQL UPDATE/INSERT statements but again, since you want to extract some info as you go through each record, this won't work for you.

So to sum up, use three recordsets. One caveat: under Jet, a DAO forward-only recordset behaves like a snapshot and is read-only, so for Table A, which you need to update, open a dynaset; Table B can also be a dynaset for the inserts, and Table C a snapshot, since you only read from it.
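A rough sketch of that loop is below. All table and field names other than TableA/TableB/TableC (Result, Amount, Param, SourceID, ID) are hypothetical placeholders, and the batched transaction commits are an extra suggestion, not something from the posts above — committing every few thousand rows is a common DAO speed-up for big loops:

```vba
Sub ProcessRecords()
    Dim db As DAO.Database
    Dim ws As DAO.Workspace
    Dim rsA As DAO.Recordset, rsB As DAO.Recordset, rsC As DAO.Recordset
    Dim n As Long

    Set db = CurrentDb
    Set ws = DBEngine.Workspaces(0)

    ' Table A must be updatable, so open it as a dynaset
    Set rsA = db.OpenRecordset("TableA", dbOpenDynaset)
    ' Table B only receives new rows
    Set rsB = db.OpenRecordset("TableB", dbOpenDynaset)

    ws.BeginTrans
    Do While Not rsA.EOF
        ' ... calculations on the current Table A record ...
        rsA.Edit
        rsA!Result = rsA!Amount * 2      ' placeholder calculation
        rsA.Update

        n = n + 1

        ' roughly every 400 records, look up Table C and write to Table B
        If n Mod 400 = 0 Then
            Set rsC = db.OpenRecordset( _
                "SELECT * FROM TableC WHERE Param = " & rsA!Param, _
                dbOpenSnapshot)
            rsB.AddNew
            rsB!SourceID = rsA!ID        ' placeholder fields
            rsB.Update
            rsC.Close
        End If

        ' commit in batches so the transaction doesn't grow unbounded
        If n Mod 5000 = 0 Then
            ws.CommitTrans
            ws.BeginTrans
        End If
        rsA.MoveNext
    Loop
    ws.CommitTrans

    rsB.Close
    rsA.Close
End Sub
```

The transaction wrapping is the nearest DAO equivalent of ADO's UpdateBatch: updates inside BeginTrans/CommitTrans are flushed to disk in chunks rather than row by row.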
 
TBH, this sounds a bit strange - to have to do something like this for 4 million plus records.

Is this a one-time thing or a regular thing? If it's a one-off, it might be justified to do it the way you suggest. But if it's a regular thing, you want to be trying to improve/optimise it.

In any event, it's probably worth looking at the process that makes this necessary.

I think the need to write stuff to Table B from time to time could probably be handled by an update or append query. The need to get data from Table C indicates either that the data is not normalised, or that the table/query you are processing is not the correct one.


I would certainly take a small subset of the main data to practise on, while you are getting your algorithm sorted.
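One quick way to carve off a practice set (the TableA_Test name is just a suggestion; TOP without ORDER BY picks an arbitrary 1,000 rows, which is fine for testing):

```vba
' Copy 1,000 arbitrary rows into a scratch table for algorithm testing.
' Jet's SELECT ... INTO creates TableA_Test automatically.
CurrentDb.Execute _
    "SELECT TOP 1000 TableA.* INTO TableA_Test FROM TableA;", _
    dbFailOnError
```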
 
Thank you for the responses. I'll agree it is a bit strange... it is a .mdb file... the data is definitely normalized, hence the need to pull information from Table C for processing.
This is something that would be run periodically by a user in order to test the impact of changes made to Table C on Table A.
Right now I estimate the processing will take about 30 hours, but I would like to get that down to 4 or less if possible, so I'm wondering what the most efficient approach would be. I was thinking that calling .Update after each record is processed would take more time than running completely through the recordset and then updating via some batch method. I know ADO has an UpdateBatch method; I think the same can be accomplished with DAO, but I'm not sure how to do it, or whether it is even wise.
 
The point I was trying to make is that if you need to get info from Table C with a recordset, then it probably indicates the data is not normalised.

Again, you would normally be able to do this with update queries, which are almost certainly far quicker than iterating a recordset.
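For example - with purely hypothetical field names (ParamID as the linking column, Factor as a value held in Table C) - a single set-based Jet update joining Table A to Table C can replace the whole loop:

```vba
' Hypothetical schema: TableA.ParamID links to TableC.ParamID, and
' Result is recalculated from a factor stored in TableC.
' Jet applies the UPDATE as one set-based operation, no VBA loop needed.
Dim db As DAO.Database
Set db = CurrentDb
db.Execute _
    "UPDATE TableA INNER JOIN TableC " & _
    "ON TableA.ParamID = TableC.ParamID " & _
    "SET TableA.Result = [TableA].[Amount] * [TableC].[Factor];", _
    dbFailOnError
Debug.Print db.RecordsAffected & " rows updated"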

But you are talking in general terms only, and unless you clarify exactly what you are doing, we may not be able to help much further.

But, for example, if many of the 4 million records are old, they probably aren't so important - just process the recent ones and leave the old stuff till later.
 
Think of it as a fresh model every time it runs. I'm not trying to reduce the number of records processed at this point; I'm just looking for guidance on the most efficient way to process them.
 
It's impossible to know without understanding your data structure.

But generally I would try to find an alternative to iterating a recordset.

And it's hard to see why you would need to change records in Table A because of something in Table C - this, in itself, does seem like a normalisation issue.
 
