Importing large text file

sumdumgai

Registered User.
I have a very large text file (644 MB) that I need to import into a table. At some point in the process, I get an 'Out of Memory' condition. I'm using the '.ReadLine' method, placing each line read into an array, which is then used as the import data. The process works for text files up to about 388 MB.

Any help or suggestions would be appreciated.

Thanks.
 

theDBguy

I’m here to help
Staff member
Are you able to import it manually? Just wondering if Access can handle it (using the Wizard).
 

Pat Hartman

Super Moderator
Staff member
I get an 'Out of Memory' condition.
Not sure what you would expect if you are loading the HUGE file into an array. Why are you not using a process that loads record by record into a table?
 

CJ_London

Super Moderator
Staff member
Agree with Pat - why not just use TransferText?

If you need to modify values, use TransferText to link to the text file, then use an append query.

It will be a lot faster as well.
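For example, a minimal sketch of that approach - the specification name, file path and table names below are placeholders, and it assumes a saved import/link specification exists for the tab-delimited file:

Code:
' Link the text file as a table (assumed names throughout)
DoCmd.TransferText TransferType:=acLinkDelim, _
    SpecificationName:="MyImportSpec", _
    TableName:="tblSourceLinked", _
    FileName:="C:\Data\bigfile.txt", _
    HasFieldNames:=True

' Then append (and transform) with a query instead of looping in VBA
CurrentDb.Execute "INSERT INTO tblDestination (fld1, fld2) " & _
    "SELECT fld1, fld2 FROM tblSourceLinked", dbFailOnError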
 

sumdumgai

Registered User.
Agree with Pat - why not just use TransferText?

If you need to modify values, use TransferText to link to the text file, then use an append query.

It will be a lot faster as well.
The data is appended to existing data, and importing the same data twice is not allowed. Some records and/or fields are ignored, others are modified, line by line as they are read in from the array using VBA.
 

CJ_London

Super Moderator
Staff member
The data is appended to existing data, and importing the same data twice is not allowed.
no problem - use a left join query to identify the new records e.g.

Code:
SELECT source.*
FROM source LEFT JOIN destination ON source.fld1 = destination.fld1 AND source.fld2 = destination.fld2
WHERE destination.fld1 IS NULL


Some records and/or fields are ignored,
no problem - ignore them in your append query e.g.

Code:
INSERT INTO destination (fld1, fld2, fld4)
SELECT source.fld1, source.fld2, source.fld4
FROM source LEFT JOIN destination ON source.fld1 = destination.fld1 AND source.fld2 = destination.fld2
WHERE destination.fld1 IS NULL


others are modified, line by line
shouldn't be a problem, depends on what the modifications are e.g.

Code:
INSERT INTO destination (fld1, fld2, fld4, fld6)
SELECT source.fld1, source.fld2, source.fld4 & " " & source.fld5, source.fld6 * 1.25
FROM source LEFT JOIN destination ON source.fld1 = destination.fld1 AND source.fld2 = destination.fld2
WHERE destination.fld1 IS NULL


Even if you don't go the SQL route, you can create a recordset based on the linked table created by TransferText and loop through that, rather than reading into an array and then reading from the array.
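For example, a minimal sketch of that recordset loop - the table names, field names and transformations here are placeholders:

Code:
Dim db As DAO.Database, rsIn As DAO.Recordset, rsOut As DAO.Recordset

Set db = CurrentDb
Set rsIn = db.OpenRecordset("tblSourceLinked", dbOpenForwardOnly)   ' linked text file
Set rsOut = db.OpenRecordset("tblDestination", dbOpenDynaset)

Do Until rsIn.EOF
    If rsIn!fld3 <> "SKIP" Then              ' ignore unwanted records
        rsOut.AddNew
        rsOut!fld1 = rsIn!fld1
        rsOut!fld2 = UCase(rsIn!fld2)        ' modify values as needed
        rsOut.Update
    End If
    rsIn.MoveNext
Loop

rsIn.Close: rsOut.Close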

Since your array method fails due to memory issues, you either need to invest in more memory, or you need to develop a different method.

If you need help with that, you will need to explain what the modifications are.
 

sumdumgai

Registered User.
Thanks to all for your suggestions. The code to process these text files was developed long ago and I'd rather not reinvent the wheel. Right now, I'm trying to read the text file into the array in chunks, using a maximum line count (e.g. 100,000) to limit memory use and a starting line number for each pass, so pass 1 gets lines 1 to 100,000, pass 2 gets lines 100,001 to 200,000, and so on until all lines are read.
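A minimal sketch of that chunked approach, assuming the file is read with the FileSystemObject the existing code already uses and that 100,000 lines per pass is the limit (the path and names are placeholders):

Code:
Const MAX_LINES As Long = 100000
Dim fso As Object, ts As Object
Dim arrLines() As String, lineIndex As Long

Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\Data\bigfile.txt", 1)   ' 1 = ForReading

Do Until ts.AtEndOfStream
    ' read one chunk of up to MAX_LINES lines
    ReDim arrLines(1 To MAX_LINES)
    lineIndex = 0
    Do Until ts.AtEndOfStream Or lineIndex = MAX_LINES
        lineIndex = lineIndex + 1
        arrLines(lineIndex) = ts.ReadLine
    Loop
    ' ...process arrLines(1) to arrLines(lineIndex) into the table here...
Loop

ts.Close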
 

The_Doc_Man

Immoderate Moderator
Staff member
The problem I see here is that, by using ReadLine to load an array, you are already reading one line at a time. Then you are processing one line at a time from that array, based on this comment: "Some records and/or fields are ignored, others are modified, line by line as they are read in from the array using VBA."

Why use the array at all? It seems as though it would buy you very little and cost you a lot, because you now need TWO huge chunks in memory at once: one for the array that contains a bunch of strings, and one for what gets loaded into the table (which is ALSO a bunch of strings), because the way Access stores things is to keep pointers to the records that are stored.

Because of virtual memory issues, even with split front-end and back-end, you cannot have more than 2 GB for the FE and 2 GB for the BE - but if you have a non-split DB you STILL won't get a full 4 GB of virtual memory. Even if your system has at least 8 GB of RAM, you cannot exceed 4 GB of virtual space because internally, Access (even the 64-bit version) doesn't take advantage of extended addressing for data structures. (It might do so for library calls, but we can't be sure how that affects virtual memory size.)

What I think is happening is that your string workspace is also getting eaten up pretty badly because of the way string processing occurs. When you read a line into your array, do you ReadLine directly into an array element, or do you ReadLine into a string and then store it in the array? THAT choice would also make a big difference if it is what I think it is, i.e. the "heap" used for string processing.
 

CJ_London

Super Moderator
Staff member
don't think it's about reinventing the wheel, just applying the same wheel to a different 'vehicle'
 

sumdumgai

Registered User.
When you read a line into your array, do you ReadLine directly into an array element, or do you ReadLine into a string and then store it in the array?
I read a line to a string and then store it in the array.
Code:
strTextLine = .ReadLine                   ' read the next line into a string
lineIndex = lineIndex + 1
ReDim Preserve arrLines(1 To lineIndex)   ' grow the array by one element per line
arrLines(lineIndex) = strTextLine
 

CJ_London

Super Moderator
Staff member
Can you create an example text file, the destination table or tables, and some documentation on how the data is transformed (modified)? The example data only needs a few rows to show all the transformation requirements.
 

sumdumgai

Registered User.
I thought of a simple solution: filter the text file to remove the lines that are not needed before it is imported. In the interest of saving time, can someone please provide some sample code to read a line of text (tab-delimited and ending with LF), check if a particular field contains a certain value, and either remove that line from the text file (I think this would be harder) or, if it does not contain the value, write the line to a new text file? With the filtered text file, I don't think I'll hit the memory issue, because most of the lines will have been removed and the array will be much smaller.

Thanks.
 

Gasman

Enthusiastic Amateur
Someone was doing pretty much that in this thread.
 

arnelgp

..forever waiting... waiting for jellybean!
ReDim Preserve arrLines(1 To lineIndex)
ReDim'ing inside your loop will slow down your code.
You need to create the largest array first, then ReDim it (to a smaller size) when you are done, for example:

Code:
Dim arrLines() As String

ReDim arrLines(1 To 10000)   ' pre-allocate; will hold 10k lines (1-based to match the final ReDim Preserve)

..
..
strTextLine = .ReadLine
lineIndex = lineIndex + 1
arrLines(lineIndex) = strTextLine
...
...
ReDim Preserve arrLines(1 To lineIndex)   ' shrink once, after the loop
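If the final line count isn't known up front, a compromise (again only a sketch) is to grow the array in large blocks inside the loop instead of one element per line:

Code:
If lineIndex > UBound(arrLines) Then
    ReDim Preserve arrLines(1 To UBound(arrLines) + 50000)   ' grow by 50k at a time
End If
arrLines(lineIndex) = strTextLine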
 

CJ_London

Super Moderator
Staff member
check if a particular field contains a certain value,
Investigate the InStr function.

Having determined that the value exists, pass the line on to your array; otherwise ignore it and move on to the next line.

Code:
If InStr(strTextLine, "value") > 0 Then
    'add to array
Else
    'move on to the next line
End If
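A fuller sketch of that filtering pass, following the original request to write out only the lines whose field does NOT contain the value - the file paths, the column position (here the 3rd tab-delimited field) and the test value are all placeholders, so adjust or flip the test to suit:

Code:
Dim fso As Object, tsIn As Object, tsOut As Object
Dim strLine As String, arrFields() As String

Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("C:\Data\bigfile.txt", 1)               ' 1 = ForReading
Set tsOut = fso.CreateTextFile("C:\Data\bigfile_filtered.txt", True)

Do Until tsIn.AtEndOfStream
    strLine = tsIn.ReadLine
    arrFields = Split(strLine, vbTab)
    ' keep the line only if the 3rd field does not contain the value
    If UBound(arrFields) >= 2 Then
        If InStr(arrFields(2), "value") = 0 Then tsOut.WriteLine strLine
    End If
Loop

tsIn.Close
tsOut.Close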
 

sumdumgai

Registered User.
Thanks to all. I think I have it solved by filtering out unwanted lines before adding them to the array.
 

Pat Hartman

Super Moderator
Staff member
Are you saying that the record format does not allow you to link to the text file using TransferText? I've had similar problems with files created by Unix, and I've had to open the file as a blob and then loop through the records and add them to a table. But you are not doing this efficiently: you are already looping through the data, so there is no need to add some records to an array and loop through that later. Just do what needs to be done in a single loop through the blob.

And if you don't have trouble linking to the text file, then use the method suggested earlier by CJ which uses queries. Action queries are always faster than VBA loops.
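For example, a minimal sketch of that single loop, using the same FileSystemObject .ReadLine the original code relies on but writing each kept, transformed line straight into the table instead of into an array (the names, column positions and test value are placeholders):

Code:
Dim fso As Object, ts As Object
Dim rs As DAO.Recordset
Dim strLine As String, arrFields() As String

Set rs = CurrentDb.OpenRecordset("tblDestination", dbOpenDynaset)
Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\Data\bigfile.txt", 1)   ' 1 = ForReading

Do Until ts.AtEndOfStream
    strLine = ts.ReadLine
    arrFields = Split(strLine, vbTab)
    If UBound(arrFields) >= 2 Then
        If InStr(arrFields(2), "value") = 0 Then      ' skip unwanted lines
            rs.AddNew
            rs!fld1 = arrFields(0)
            rs!fld2 = UCase(arrFields(1))             ' transform as needed
            rs.Update
        End If
    End If
Loop

ts.Close
rs.Close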
 

The_Doc_Man

Immoderate Moderator
Staff member
Definitely concur with arnelgp... the ReDim to grow the array is a major contributor to the problem. Growing an already-created array one element at a time means each ReDim Preserve allocates a new block and copies the whole array again, so the total copying work grows quadratically, and that kind of repeated dynamic allocation will chew up the memory area known as the HEAP - and not in a good way.
 
