Importing large text file

sumdumgai

Registered User.
I have a very large text file (644 MB) that I need to import into a table. At some point in the process, I get an 'Out of Memory' condition. I'm using the '.ReadLine' method, placing each line read into an array, which is then used as the import data. The process works for text files up to about 388 MB.

Any help or suggestions would be appreciated.

Thanks.
 

theDBguy

I’m here to help
Staff member
Are you able to import it manually? Just wondering if Access can handle it (using the Wizard).
 

Pat Hartman

Super Moderator
Staff member
I get an 'Out of Memory' condition.
Not sure what you would expect if you are loading the HUGE file into an array. Why are you not using a process that loads record by record into a table?
 

CJ_London

Super Moderator
Staff member
Agree with Pat - why not just use TransferText?

If you need to modify values, use TransferText to link to the text file, then use an append query.

It will be a lot faster as well.
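For example, a minimal sketch of that approach - the specification name, file path and table names below are placeholders, and it assumes a saved import/link specification exists for the tab-delimited file:

Code:
' Link the text file as a table (assumed names throughout)
DoCmd.TransferText TransferType:=acLinkDelim, _
    SpecificationName:="MyImportSpec", _
    TableName:="tblSourceLinked", _
    FileName:="C:\Data\bigfile.txt", _
    HasFieldNames:=True

' Then append (and transform) with a query instead of looping in VBA
CurrentDb.Execute "INSERT INTO tblDestination (fld1, fld2) " & _
    "SELECT fld1, fld2 FROM tblSourceLinked", dbFailOnError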
 

sumdumgai

Registered User.
Agree with Pat - why not just use TransferText?

If you need to modify values, use TransferText to link to the text file, then use an append query.

It will be a lot faster as well.
The data is appended to existing data, and importing the same data twice is not allowed. Some records and/or fields are ignored, others are modified, line by line as they are read in from the array using VBA.
 

CJ_London

Super Moderator
Staff member
The data is appended to existing data, and importing the same data twice is not allowed.
no problem - use a left join query to identify the new records e.g.

Code:
SELECT source.*
FROM source LEFT JOIN destination ON source.fld1 = destination.fld1 AND source.fld2 = destination.fld2
WHERE destination.fld1 IS NULL


Some records and/or fields are ignored,
no problem - ignore them in your append query e.g.

Code:
INSERT INTO destination (fld1, fld2, fld4)
SELECT source.fld1, source.fld2, source.fld4
FROM source LEFT JOIN destination ON source.fld1 = destination.fld1 AND source.fld2 = destination.fld2
WHERE destination.fld1 IS NULL


others are modified, line by line
shouldn't be a problem, depends on what the modifications are e.g.

Code:
INSERT INTO destination (fld1, fld2, fld4, fld6)
SELECT source.fld1, source.fld2, source.fld4 & " " & source.fld5, source.fld6 * 1.25
FROM source LEFT JOIN destination ON source.fld1 = destination.fld1 AND source.fld2 = destination.fld2
WHERE destination.fld1 IS NULL


Even if you don't go the SQL route, you can create a recordset based on the linked table created by TransferText and loop through that, rather than reading into an array and then reading from the array.
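For example, a minimal sketch of that recordset loop - the table names, field names and transformations here are placeholders:

Code:
Dim db As DAO.Database, rsIn As DAO.Recordset, rsOut As DAO.Recordset

Set db = CurrentDb
Set rsIn = db.OpenRecordset("tblSourceLinked", dbOpenForwardOnly)   ' linked text file
Set rsOut = db.OpenRecordset("tblDestination", dbOpenDynaset)

Do Until rsIn.EOF
    If rsIn!fld3 <> "SKIP" Then              ' ignore unwanted records
        rsOut.AddNew
        rsOut!fld1 = rsIn!fld1
        rsOut!fld2 = UCase(rsIn!fld2)        ' modify values as needed
        rsOut.Update
    End If
    rsIn.MoveNext
Loop

rsIn.Close: rsOut.Close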

Since your array method fails due to memory issues, you either need to invest in more memory, or you need to develop a different method.

If you need help with that, you will need to explain what the modifications are.
 

sumdumgai

Registered User.
Thanks to all for your suggestions. The code to process these text files was developed long ago and I'd rather not reinvent the wheel. Right now, I'm trying to read the text file into the array in chunks, using a maximum line count (e.g. 100,000) to limit memory use and a starting line number for each pass, so pass 1 gets lines 1 to 100,000, pass 2 gets lines 100,001 to 200,000, and so on until all lines are read.
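A minimal sketch of that chunked approach, assuming the file is read with the FileSystemObject the existing code already uses and that 100,000 lines per pass is the limit (the path and names are placeholders):

Code:
Const MAX_LINES As Long = 100000
Dim fso As Object, ts As Object
Dim arrLines() As String, lineIndex As Long

Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\Data\bigfile.txt", 1)   ' 1 = ForReading

Do Until ts.AtEndOfStream
    ' read one chunk of up to MAX_LINES lines
    ReDim arrLines(1 To MAX_LINES)
    lineIndex = 0
    Do Until ts.AtEndOfStream Or lineIndex = MAX_LINES
        lineIndex = lineIndex + 1
        arrLines(lineIndex) = ts.ReadLine
    Loop
    ' ...process arrLines(1) to arrLines(lineIndex) into the table here...
Loop

ts.Close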
 

The_Doc_Man

Immoderate Moderator
Staff member
The problem I see here is that, by using ReadLine to load an array, you are already reading one line at a time. Then you are processing one line at a time from that array, based on this comment: "Some records and/or fields are ignored, others are modified, line by line as they are read in from the array using VBA."

Why use the array at all? It seems as though it would buy you very little and cost you a lot, because you now need TWO huge chunks in memory at once: one for the array that contains a bunch of strings, and one for what gets loaded into the table (which is ALSO a bunch of strings), because the way Access stores things is to keep pointers to the records that are stored.

Because of virtual memory issues, even with split front-end and back-end, you cannot have more than 2 GB for the FE and 2 GB for the BE - but if you have a non-split DB you STILL won't get a full 4 GB of virtual memory. Even if your system has at least 8 GB of RAM, you cannot exceed 4 GB of virtual space because internally, Access (even the 64-bit version) doesn't take advantage of extended addressing for data structures. (It might do so for library calls, but we can't be sure how that affects virtual memory size.)

What I think is happening is that your string workspace is also getting eaten up pretty badly because of the way string processing occurs. When you read a line into your array, do you ReadLine directly into an array element, or do you ReadLine into a string and then store it in the array? THAT choice would also make a big difference if it is what I think it is, i.e. the "heap" used for string processing.
 

CJ_London

Super Moderator
Staff member
don't think it's about reinventing the wheel, just applying the same wheel to a different 'vehicle'
 

sumdumgai

Registered User.
When you read a line into your array, do you ReadLine directly into an array element, or do you ReadLine into a string and then store it in the array?
I read a line to a string and then store it in the array.
Code:
strTextLine = .ReadLine                   ' read the next line into a string
lineIndex = lineIndex + 1
ReDim Preserve arrLines(1 To lineIndex)   ' grow the array by one element per line
arrLines(lineIndex) = strTextLine
 

CJ_London

Super Moderator
Staff member
Can you create an example text file, the destination table or tables, and some documentation on how the data is transformed (modified)? The example data only needs a few rows to show all the transformation requirements.
 

sumdumgai

Registered User.
I thought of a simple solution: filter the text file to remove the lines that are not needed before it is imported. In the interest of saving time, can someone please provide some sample code to read a line of text (tab-delimited and ending with LF), check if a particular field contains a certain value, and either remove that line from the text file (I think this would be harder) or, if it does not contain the value, write the line to a new text file? With the filtered text file, I don't think I'll hit the memory issue, because most of the lines will have been removed and the array will be much smaller.

Thanks.
 

Gasman

Enthusiastic Amateur
Someone was doing pretty much that in this thread.
 

arnelgp

..forever waiting... waiting for jellybean!
ReDim Preserve arrLines(1 To lineIndex)
ReDim'ing inside your loop will slow down your code.
You need to create the largest array first, then ReDim it (to a smaller size) when you are done, for example:

Code:
Dim arrLines() As String

ReDim arrLines(1 To 10000)   ' pre-allocate; will hold 10k lines (1-based to match the final ReDim Preserve)

..
..
strTextLine = .ReadLine
lineIndex = lineIndex + 1
arrLines(lineIndex) = strTextLine
...
...
ReDim Preserve arrLines(1 To lineIndex)   ' shrink once, after the loop
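If the final line count isn't known up front, a compromise (again only a sketch) is to grow the array in large blocks inside the loop instead of one element per line:

Code:
If lineIndex > UBound(arrLines) Then
    ReDim Preserve arrLines(1 To UBound(arrLines) + 50000)   ' grow by 50k at a time
End If
arrLines(lineIndex) = strTextLine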
 

CJ_London

Super Moderator
Staff member
check if a particular field contains a certain value,
Investigate the InStr function.

Having determined that the value exists, pass the line on to your array; otherwise ignore it and move on to the next line.

Code:
If InStr(strTextLine, "value") > 0 Then
    'add to array
Else
    'move on to the next line
End If
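A fuller sketch of that filtering pass, following the original request to write out only the lines whose field does NOT contain the value - the file paths, the column position (here the 3rd tab-delimited field) and the test value are all placeholders, so adjust or flip the test to suit:

Code:
Dim fso As Object, tsIn As Object, tsOut As Object
Dim strLine As String, arrFields() As String

Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("C:\Data\bigfile.txt", 1)               ' 1 = ForReading
Set tsOut = fso.CreateTextFile("C:\Data\bigfile_filtered.txt", True)

Do Until tsIn.AtEndOfStream
    strLine = tsIn.ReadLine
    arrFields = Split(strLine, vbTab)
    ' keep the line only if the 3rd field does not contain the value
    If UBound(arrFields) >= 2 Then
        If InStr(arrFields(2), "value") = 0 Then tsOut.WriteLine strLine
    End If
Loop

tsIn.Close
tsOut.Close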
 

sumdumgai

Registered User.
Thanks to all. I think I have it solved by filtering out unwanted lines before adding them to the array.
 

Pat Hartman

Super Moderator
Staff member
Are you saying that the record format does not allow you to link to the text file using TransferText? I've had similar problems with files created by Unix, and I've had to open the file as a blob and then loop through the records and add them to a table. But you are not doing this efficiently: you are already looping through the data, so there is no need to add some records to an array and loop through that later. Just do what needs to be done in a single loop through the blob.

And if you don't have trouble linking to the text file, then use the method suggested earlier by CJ which uses queries. Action queries are always faster than VBA loops.
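For example, a minimal sketch of that single loop, using the same FileSystemObject .ReadLine the original code relies on but writing each kept, transformed line straight into the table instead of into an array (the names, column positions and test value are placeholders):

Code:
Dim fso As Object, ts As Object
Dim rs As DAO.Recordset
Dim strLine As String, arrFields() As String

Set rs = CurrentDb.OpenRecordset("tblDestination", dbOpenDynaset)
Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\Data\bigfile.txt", 1)   ' 1 = ForReading

Do Until ts.AtEndOfStream
    strLine = ts.ReadLine
    arrFields = Split(strLine, vbTab)
    If UBound(arrFields) >= 2 Then
        If InStr(arrFields(2), "value") = 0 Then      ' skip unwanted lines
            rs.AddNew
            rs!fld1 = arrFields(0)
            rs!fld2 = UCase(arrFields(1))             ' transform as needed
            rs.Update
        End If
    End If
Loop

ts.Close
rs.Close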
 

The_Doc_Man

Immoderate Moderator
Staff member
Definitely concur with arnelgp... the ReDim to grow the array is a major contributor to the problem. Growing an already-created array one element at a time means each ReDim Preserve allocates a new block and copies the whole array again, so the total copying work grows quadratically, and that kind of repeated dynamic allocation will chew up the memory area known as the HEAP - and not in a good way.
 
