Importing very large text file

sumdumgai

I am importing a very large (0.8 GB) pipe-delimited text file (1.6 million records) into a new Access database using Access 2013. The VBA routine I have been using calls a function to split the file into separate records and fields, which I then add to the table one record at a time. The Split function does not work for such a large file; I suspect it's because the function first reads the entire file into one large string and the string is too long.

Does anyone know how a very large text file can be imported? I can link the text file to a table, but that won't work because there are fields in the file that have to be indexed to allow future queries to run efficiently.

Thanks.
 
Hi,

To the best of my knowledge, an Access database, and even more so an Access table, is limited in the amount of data it can contain and manage.
Are you sure you're not beyond this limit? I'm not sure you're not.
 
Quite right, marlan.

You've already hit the db size limit which I believe is roughly 2GB for Access 2010. It's time to upscale to an enterprise server.
 
The data fits because I can import it using Access/External Data/Text File. In the process, I skip certain fields, so maybe that is why it fits. I can save the Import Spec and use it in my VBA routine, but that leaves me no option to further filter or change the data as it comes in (which I can do if I import a record at a time using VBA; it's a bit slower, but not by much).

I'm just looking for a VBA way of handling a large file.

Thanks.
 
The resultant data fits, ok! I'm guessing this is related to your other thread?

What exactly is wrong with the linked table method? You basically have two options, linked table or loop through a recordset/file system object.
 
I suspect it's because the function first sucks the entire file into one large string and the string is too long
Instead of pre-loading the whole file, which seems a bit crazy, do it this way:

Code:
Dim f As Integer, sLine As String
f = FreeFile
Open "C:\Data\bigfile.txt" For Input As #f    ' path is just an example
Do While Not EOF(f)
    Line Input #f, sLine     ' read one line at a time
    ProcessLine sLine        ' your routine: split the line on "|" and write the record
Loop
Close #f
if you know how many lines you have, you can add a progress monitor. (You can add a counter anyway, but unless you know how many lines there are in total, it won't help that much.)

if you are constructing lookup tables as you go, you will probably reduce the overall size.

it would probably help if you could divide the mega file into several smaller ones.

800 MB will take a long time to process.


[Addendum, based on #1: is all the data being "columnised" on your initial import, or do you end up with a single large text field for each item that you then have to parse? That's what you want to avoid.]
 
You're using Split(), but Split() isn't the only way of identifying the right part of your record string. OK, let me break it down for you:

Loop-and-FIND method (using a recordset, FileSystemObject, Open file, ADO text provider, etc.):
  1. You load the file into memory using one of the above methods
  2. You then also have a recordset of your Speciality table
  3. Begin looping through the file
  4. For each record in the file you check if a corresponding record exists in the Speciality table
  5. The check must use an appropriate search method, i.e.:
    • WHERE clause and Requery the recordset
    • Filter method
    • FindFirst/FindNext methods
    • Seek method etc...
  6. Insert/Edit and save record
    • .Add/.Edit and .Save methods
    • Execute an INSERT/UPDATE SQL...
Note the major bottlenecks: the per-record search in step 5 and the fact that you have to insert one record at a time in step 6. A rough sketch of this approach follows below.
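
A rough DAO sketch of what that loop might look like; the table and field names (ImportTable, tblSpeciality, SpecCode, SomeField) and the file path are only placeholders, not anything from your actual setup:

Code:
Dim db As DAO.Database, rsSpec As DAO.Recordset, rsDest As DAO.Recordset
Dim f As Integer, sLine As String, arr() As String

Set db = CurrentDb
Set rsSpec = db.OpenRecordset("tblSpeciality", dbOpenSnapshot)
Set rsDest = db.OpenRecordset("ImportTable", dbOpenDynaset)

f = FreeFile
Open "C:\Data\bigfile.txt" For Input As #f           ' placeholder path
Do While Not EOF(f)
    Line Input #f, sLine
    arr = Split(sLine, "|")                          ' one record at a time, so Split copes
    rsSpec.FindFirst "SpecCode = '" & arr(2) & "'"   ' assumes the code sits in the 3rd field
    If Not rsSpec.NoMatch Then
        rsDest.AddNew
        rsDest!SpecCode = arr(2)
        rsDest!SomeField = arr(0)                    ' map the remaining fields as needed
        rsDest.Update                                ' the per-record insert is the bottleneck
    End If
Loop
Close #f
rsDest.Close: rsSpec.Close: Set db = Nothing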

Linked table > INNER JOIN method:
  1. Create an appropriate INNER JOIN between the linked table and the Speciality table using Instr() and Mid(). The join will be between the Instr/Mid field and the respective Speciality field
  2. Fire an UPDATE/INSERT statement... now this is a Bulk update/insert which the Jet Engine handles nicely.
The only major bottleneck is the join itself. A sketch of this approach follows below.
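
To make that concrete, the bulk insert might look something along these lines from VBA; LinkedRaw, RawLine, ImportTable, tblSpeciality and the Mid() positions are all assumptions that would need adjusting to the real record layout:

Code:
Dim sSQL As String
sSQL = "INSERT INTO ImportTable (SpecCode, OtherField) " & _
       "SELECT S.SpecCode, Mid(L.RawLine, 20, 10) " & _
       "FROM LinkedRaw AS L INNER JOIN tblSpeciality AS S " & _
       "ON Mid(L.RawLine, 5, 4) = S.SpecCode;"
CurrentDb.Execute sSQL, dbFailOnError   ' one bulk statement instead of a record-by-record loop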

If you aren't adding new records then option 1 will be the favourable approach.
 
Thanks.
To gemma: Can't determine number of lines. That is what the split function did. For this large file, it returns 0 lines. And yes, I've come across a routine or two where, using Excel, the text file is split across multiple sheets which are then imported sequentially into Access. Don't like that because it introduces Excel into the mix.
To vbaInet: I'll have to study your suggestion above. But getting back to the linked version, I've now decided not to filter on specialty codes; i.e., I will be importing all records (although not all of the fields in each record). How do I get the linked data into a table which will have the same fields as the linked table, but with some of the fields indexed?
 
Worst comes to worst, you can read the file character by character if it is a Unix file...

I do the same for a 300 MB file, stupid formatted nonsense crap file that it is...
Processing isn't done in 2 seconds, but it doesn't take 24 hours either.
 
To answer your question about indexing: create a link to the file, create a make-table query from that linked table that includes only the relevant fields (when executed it will create a new table for you), and then add your indexes. Or use the Import/Export Wizard to import your file, create the indexes afterwards, and then delete the unwanted fields.
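
Something like this, assuming the text file is already linked as LinkedRaw; the table and field names are illustrative only:

Code:
Dim db As DAO.Database
Set db = CurrentDb
' Make-table query: pulls only the fields you need into a new local table
db.Execute "SELECT FieldA, FieldB, SpecCode INTO tblLocal FROM LinkedRaw;", dbFailOnError
' Then add an index (duplicates OK) on the important field(s)
db.Execute "CREATE INDEX idxSpecCode ON tblLocal (SpecCode);", dbFailOnError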

To add to the mix about inserting from a recordset, you can batch update every X records which should help a little in approach 1.

Is there any particular reason why you've chosen not to link and now wish to import the whole data? Of course a local table will help with performance, if that's what you're thinking, but whether you can fit all those records into the db is the main question; hence the linking at the initial stage.
 
vbaInet: It's just that requirements have changed and all of the specialty codes are needed.
Here's what I've done and it appears to work. Getting the steps into VBA, or some other method to automate the process, is the next challenge.
1) Set up 'linked table', linked to the external text file, choosing data types for each field and skipping fields that are not needed.
2) Copy and paste the 'linked table' to another 'local table', structure only
3) Modify design of the new 'local table': insert an 'ID' field with the AutoNumber data type, and set Indexed (Duplicates OK) on the important fields
4) Run an append query: INSERT INTO 'local table' SELECT * FROM 'linked table';

Runs fairly fast and the result is a local table that has all of the indices I need. Like I said, only problem now is automating the whole thing for future imports.
 
As long as the text file format is constant, all you need to do is place the file and run the INSERT INTO query??

Possibly preceded by a delete query? If so, some concerns about bloating arise.
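
That refresh would boil down to a couple of lines of VBA; tblLocal and LinkedRaw are placeholder names, and a compact afterwards would deal with the bloat:

Code:
' Clear out the old data, then pull everything back in from the linked file
CurrentDb.Execute "DELETE * FROM tblLocal;", dbFailOnError
CurrentDb.Execute "INSERT INTO tblLocal SELECT * FROM LinkedRaw;", dbFailOnError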
 
Thanks. Not meaning to go beyond purpose of this post, but is it possible to create linked tables using VBA?
 
Runs fairly fast and the result is a local table that has all of the indices I need.
Yes that's the beauty of linked tables.

2) Copy and paste the 'linked table' to another 'local table', structure only
3) Modify design of the new 'local table': insert an 'ID' field with the AutoNumber data type, and set Indexed (Duplicates OK) on the important fields
4) Run an append query: INSERT INTO 'local table' SELECT * FROM 'linked table';

Runs fairly fast and the result is a local table that has all of the indices I need. Like I said, only problem now is automating the whole thing for future imports.
Unfortunately you're still thinking of your Access database as an Excel spreadsheet. In a database one does not copy and paste records, nor does one continually create/delete local tables or create indexes at will.

Let me revise your steps the Access way:
2) Create a Make-Table query based on your linked table and execute the query. Now this will create a table structure as well as copy the data from the linked table to the local table
3) Modify the design of the new 'local table' - this is fine, but I don't see the need for an AutoNumber ID. Perhaps you want to include a Now() field to indicate when the data was imported, for future updates.
4) Not necessary because it's already done in step 2

All of the above is something that should be done manually in order to establish a stable table structure that doesn't change structurally.

For future updates to this table you would automate the following:
1. Link to the table - this link should already be established so maintain the same file name and cleanse the file regularly
2. Create a Select Query that finds records that exist in the linked table but not in the local table
3. Run an APPEND Query based on the results of 2.
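
Steps 2 and 3 can be rolled into a single "unmatched" append query; SpecKey below is just an assumed name for whatever uniquely identifies a record in both tables:

Code:
Dim sSQL As String
sSQL = "INSERT INTO tblLocal (SpecKey, FieldA, FieldB) " & _
       "SELECT L.SpecKey, L.FieldA, L.FieldB " & _
       "FROM LinkedRaw AS L LEFT JOIN tblLocal AS T ON L.SpecKey = T.SpecKey " & _
       "WHERE T.SpecKey IS NULL;"       ' only rows not already in the local table
CurrentDb.Execute sSQL, dbFailOnError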

I hope you follow?

My only concern is that you mentioned automating the process in your last post, and this would indicate that you've not given us the full picture as to whether you were planning on deleting the table every time or not.
 
Yes, whenever a new text file is available for import, the old tables will either be deleted or overwritten, whichever is cleaner. There's that matter of space. The database, after import and compact, is about 900 MB.
 
Considering the size of your raw data, importing the file into your main database shouldn't be an option because you will soon hit the 2GB limit. However, importing into a different database and linking to that will work just fine.

* Create a db containing one table that you'll use for your imports
* Prep the table, i.e. define field names, types, indexes etc
* From your main db setup a linked table to this other db
* You know what else to do from here... I've reiterated this several times.
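
For the link from the main db to the side db, something along these lines would do it in VBA; the path and table names are placeholders:

Code:
' Link tblImport in the side database into the current (main) database
DoCmd.TransferDatabase TransferType:=acLink, _
    DatabaseType:="Microsoft Access", _
    DatabaseName:="C:\Data\ImportSide.accdb", _
    ObjectType:=acTable, _
    Source:="tblImport", Destination:="tblImport"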
 
I found a VBA routine that imports from text to a linked table and it works pretty well, but like you said, I won't be needing it. Now I'll look for a routine that will create an Import Spec. That will save me a lot of work for other tasks.

Thanks again,
 
When you say import you mean link right?

Yes good thinking. An import spec would ensure that it comes in the right format.

By the way, what was the problem with linking to the raw file in the first place?
 
No, I'm trying to find a way to actually create an Import Specification that will import data from a file into a local table. The main problem I had in trying to import the very large text file was that my split function could not handle the data; it returned a 0 line count for the text file. I then had to manually import the file using Access/External Data/Text File. That worked, but my task is to automate the import so that other users can do it in the future. With your (and others') help, I now have another way of importing the file using linked tables.

The problem with just linking to the raw file is lack of indexed fields. All queries would have to sequentially go through all 1.5 million records and be very slow (as they have been until I created the local table (with indices) from the linked one).

If I can find a way to create, or modify, a Saved Import Spec, by using forms to allow the user to define the import file characteristics and desired indices, I'm on the way to making any future imports virtually automatic without any programming changes. Hope that makes sense.
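
For reference, once a spec has been saved through the wizard, running the import from VBA is a one-liner along these lines (the spec name, table name and path below are only placeholders):

Code:
DoCmd.TransferText TransferType:=acImportDelim, _
    SpecificationName:="PipeSpec", _
    TableName:="tblLocal", _
    FileName:="C:\Data\bigfile.txt", _
    HasFieldNames:=False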
 
