Import + rework large text file, what's fastest?

Zaz

Hi, probably a common topic but search has not been very successful... hope to get some tips here.

I use Access 2003 (soon 2010) to import a .txt file with 8 million lines; they are short, so it's around 500 MB.

It has no delimiters; instead, the first letter of each line tells me how it is formatted and which information to expect, which means I have to do about five Trim operations on each line to extract the "fields" (see example below).

I used to open the file with ADO, read each line, and write the appropriate parts of the string into the fields of a regular table, but this has somehow become slow, taking 30+ minutes.
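For reference, here is a minimal sketch of that kind of loop, using plain VBA file I/O and a DAO recordset rather than ADO; the path, table and field names are just placeholders for illustration:

Sub ImportLineByLine()
    ' Minimal sketch: read the text file one line at a time and append the
    ' parsed pieces to a local table. Path, table and field names are made up.
    Dim f As Integer
    Dim sLine As String
    Dim rs As DAO.Recordset

    Set rs = CurrentDb.OpenRecordset("tblImport", dbOpenDynaset, dbAppendOnly)

    f = FreeFile
    Open "C:\data\bigfile.txt" For Input As #f
    Do While Not EOF(f)
        Line Input #f, sLine
        If Len(sLine) > 0 Then
            rs.AddNew
            rs!RecordType = Left$(sLine, 1)      ' first character = line type
            rs!Payload = Trim$(Mid$(sLine, 2))   ' rest of the line
            rs.Update
        End If
    Loop
    Close #f
    rs.Close
End Sub

Keeping the recordset open for the whole run and opening it with dbAppendOnly (so the existing rows don't have to be fetched) usually helps the write side.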

I also tried reading the entire file into a single-field table in Access and then transferring it into the fields of a second table within Access, which turned out to be REALLY, REALLY slow.



Example:

A NewYork 12345
B abc
C 1 aa
C 2 bb
C 3 cc
C 4 dd
A Sydney 123456
B abc
C 1 aa
C 2 bb
C 3 cc
C 4 dd
...etc

The data is dependent, and the array operations for this might also be slow... e.g. there can be one "A" line followed by 20 "B" lines, with each "B" segment containing another 50 "C" lines. What I need in the end is a table like this:

New York; 12345; abc; 1; aa
New York; 12345; abc; 2; bb
New York; 12345; abc; 3; cc
New York; 12345; abc; 4; dd
....
etc
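If it helps, the usual way to handle that dependency in the loop is simply to keep the most recent "A" and "B" values in variables and write one output row per "C" line. A rough sketch; the Mid positions and all names below are placeholders, so use whatever fixed offsets the real file has:

Sub ParseDependentLines()
    ' Sketch of the A/B/C handling only: remember the latest A and B values
    ' and write one row to the result table for every C line.
    Dim f As Integer
    Dim sLine As String
    Dim sCity As String, sCode As String, sBValue As String
    Dim rs As DAO.Recordset

    Set rs = CurrentDb.OpenRecordset("tblResult", dbOpenDynaset, dbAppendOnly)

    f = FreeFile
    Open "C:\data\bigfile.txt" For Input As #f
    Do While Not EOF(f)
        Line Input #f, sLine
        Select Case Left$(sLine, 1)
            Case "A"                                ' e.g. "A NewYork 12345"
                sCity = Trim$(Mid$(sLine, 3, 8))    ' placeholder offsets
                sCode = Trim$(Mid$(sLine, 11))
            Case "B"                                ' e.g. "B abc"
                sBValue = Trim$(Mid$(sLine, 3))
            Case "C"                                ' e.g. "C 1 aa" -> one output row
                rs.AddNew
                rs!City = sCity
                rs!Code = sCode
                rs!BValue = sBValue
                rs!Seq = Trim$(Mid$(sLine, 3, 2))
                rs!CValue = Trim$(Mid$(sLine, 5))
                rs.Update
        End Select
    Loop
    Close #f
    rs.Close
End Sub

No array is really needed for this; a few string variables carry the context from line to line, so the cost per line is just the Trim/Mid calls plus the table write.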


Any tips on how to get the best speed for this kind of operation?


Thanks for the help,
Z
 
I do something similar on a much smaller file. It has ~30K lines, extracts 15 fields, and has to test for and dump the headers from each page. This process takes about a minute. It requires a fair bit of data massaging, but going by that, you are doing well with 8 million lines in 30 minutes (over 4K lines per second).

I currently parse and write the lines to an ADO ForwardOnly recordset based on the destination table. While I can't do much about the time taken for the data manipulation, I expect the performance of writing to the table could be improved.

I have just moved the table to SQL Server, and I am planning to try a parameterised query for the reasons explained on this page.
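For anyone curious, the parameterised insert would look roughly like this with an ADO Command; the connection string, table and column names are placeholders:

Dim cn As ADODB.Connection
Dim cmd As ADODB.Command

Set cn = New ADODB.Connection
cn.Open "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;"

Set cmd = New ADODB.Command
With cmd
    Set .ActiveConnection = cn
    .CommandText = "INSERT INTO dbo.MyTable (Field1, Field2) VALUES (?, ?)"
    .Prepared = True    ' the server compiles the plan once and reuses it
    .Parameters.Append .CreateParameter("p1", adVarChar, adParamInput, 50)
    .Parameters.Append .CreateParameter("p2", adVarChar, adParamInput, 50)
End With

' Inside the parse loop, only the parameter values change per row:
cmd.Parameters("p1").Value = "first value"
cmd.Parameters("p2").Value = "second value"
cmd.Execute , , adExecuteNoRecords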

However, this test is somewhat academic since I am only faced with a minute to do the job anyway. Moreover, I have converted the code to a compiled VB.NET program, which will be run on a schedule overnight.
 
>>I used to open the file with ADO, read each line, and write the appropriate parts of the string into the fields of a regular table, but this has somehow become slow, taking 30+ minutes.<<

Why has it become slow "somehow"? Was it ever any faster?

Chris.
 
I would import the file into a temporary table and then parse it out. Since there are 8 million rows, I'd argue for a move to SQL Server. This kind of recurring import cries out to be done in SQL Server Integration Services (SSIS), which was designed for importing large amounts of data in set-based ways rather than by looping through recordsets, even ADO ones.
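As a rough sketch of the set-based idea, scaled down to plain Access SQL rather than SSIS and with placeholder names and Mid() offsets, the staging table can be split with an append query instead of a recordset loop:

Sub ParseStagingTable()
    ' Split the fixed-width pieces out of a one-column staging table with a
    ' single append query. Table/field names and offsets are placeholders.
    Dim sSql As String
    sSql = "INSERT INTO tblParsed (RecType, Part1, Part2) " & _
           "SELECT Left(LineText, 1), Trim(Mid(LineText, 3, 8)), Trim(Mid(LineText, 11)) " & _
           "FROM tblStaging WHERE Left(LineText, 1) = 'A';"
    CurrentDb.Execute sSql, dbFailOnError
End Sub

The catch with the data in this thread is the cross-line dependency: to stitch each "C" row to its most recent "A" and "B" rows set-based, the staging table needs a line-number column (an AutoNumber added during the import would do) so the rows can be joined by sequence; otherwise the line-by-line loop stays simpler.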
 
Thanks, guys. SQL Server is not an option, though it would probably be best.

ChrisO, I have narrowed the problem down to network performance, probably caused by my user profile (how unlikely is that?).
It turns out my colleagues can still run this in under 10 minutes, so I'm sticking with reading the file line by line.

Cheers
Z
 
