The fastest way to import over 2000 csv files

Guus2005

AWF VIP
What is the fastest way to import 2300 similar csv files into an Access table?

There are a number of ways to accomplish this. I need the fastest.

Thanks for your time!
 

Minty

AWF VIP
Hmm, interesting question.
If you have the source file names and target table name(s) in a table, you could loop through them with a recordset, link to each file, and then, within the loop, do a
SELECT * INTO TargetTableName if each one is going into a different table, or an
INSERT INTO TheSingleTable if they are all going into a single table.
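
Something like this minimal sketch, assuming a table tblFiles with FilePath and TargetTable columns (all of those names are invented here):

Code:
Dim db As DAO.Database
Dim rs As DAO.Recordset

Set db = CurrentDb
'tblFiles, FilePath and TargetTable are assumed names
Set rs = db.OpenRecordset("SELECT FilePath, TargetTable FROM tblFiles")

Do Until rs.EOF
    'link the csv under a temporary name, append it, then drop the link
    DoCmd.TransferText acLinkDelim, , "tmpCsv", rs!FilePath, True
    db.Execute "INSERT INTO [" & rs!TargetTable & "] SELECT * FROM tmpCsv", dbFailOnError
    DoCmd.DeleteObject acTable, "tmpCsv"
    rs.MoveNext
Loop
rs.Close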
 

The_Doc_Man

Immoderate Moderator
Staff member
First decide whether they will all fit. Look carefully into sizing, because 2300 of anything with any substance to it will be a lot of bytes. In April, you mentioned two CSV files that would add 500 MB total to an empty DB. Are these related? If the files average 10 KB, you've got 23 MB, not counting indexes or any other overhead. But if they are far more substantial than 10 KB on average, you may have an issue.

NOTE: If you had said "into an SQL table" (implying some form of SQL server backend) I would not have asked the question, but with Access you are limited in how big you can get. And you didn't say how big the files were, only how many there are.

Next, consider whether they are ready to be imported or whether they will need some filtration, adjustment, and cleanup. You might wish to do this in stages despite your desire to do this quickly.

In your other post in April, you expressed concern about bloat. There is such a thing as creating an "aside" table in a file you create as a "throwaway" file - where you import to the external temp table and diddle what you need to diddle, then export the finished product via a query, and finally drop the side table and delete the file. Lather, rinse, repeat - and the only file likely to get badly bloated gets deleted and recreated. No permanent bloat and you don't care if you are a little bit sloppy on the temp tables 'cause they won't last.

Finally, your concern for "fastest" also depends on whether the files are ready for prime time. If they require cleaning or adjustment, you have no choice but to consider a file-by-file operation with intermediate "scrubbing" steps.

You MIGHT consider that "external file" concept by doing ONE FILE AT A TIME to import one of the CSVs to a virgin DB file, then massage the content. Then when all 2300 files are massaged, and presuming that the 2300 files are structurally congruent, you could write a loop to step through importation from the external files using an IN sub-clause (in the FROM clause) to direct the imports piecemeal.
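
For instance (file and table names invented here), an append query can read straight from a table in another database file via the IN clause:

Code:
INSERT INTO MainTable
SELECT * FROM SideTable IN 'C:\Temp\Throwaway.accdb';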


Then, when you are satisfied that all is well and you can ditch the files, do so. NOTE: Bulk deletion can be a rather complex process with regard to the directory files but ALSO with regard to the system's file descriptors in the master file index.
 

Guus2005

AWF VIP
Thanks for your replies, both!

@Docman, you have a few valid points I hadn't thought about. Size matters!
No, this question and the one in April are not related. But perhaps I can combine the answers from then and now.

I had forgotten about the IN clause of MS Access SQL. That might come in handy.

I'll post my solution here when I've decided on one. Thanks again.
 

Pat Hartman

Super Moderator
Staff member
If all the files are in the same folder, you can use FSO (FileSystemObject) to loop through the folder and TransferText to append the data. Given the number of files, and out of an abundance of caution, I would also create a log table. The first step would be to add the full path to the log table and capture the autonumber of the new log record. After each import, I would run an update query that sets the LogID of all rows where it is still null to the LogID of the current log record.

You can also add logic that checks the log table to see whether the full path of the next file has already been imported, and bypasses it if so. This gives you an easy way to recover if the process stops and has to be restarted for some reason.
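
A minimal sketch of that loop, with assumed names throughout (tblImportLog with an autonumber LogID and a FullPath column, tblData with a LogID column, folder C:\Imports; a saved import spec can be slotted into the TransferText call if the delimiter isn't a comma):

Code:
Dim fso As Object, fil As Object
Dim db As DAO.Database
Dim lngLogID As Long

Set db = CurrentDb
Set fso = CreateObject("Scripting.FileSystemObject")

For Each fil In fso.GetFolder("C:\Imports").Files
    'skip files already in the log
    If DCount("*", "tblImportLog", "FullPath=""" & fil.Path & """") = 0 Then
        db.Execute "INSERT INTO tblImportLog (FullPath) VALUES (""" & fil.Path & """)", dbFailOnError
        lngLogID = db.OpenRecordset("SELECT @@IDENTITY")(0)
        DoCmd.TransferText acImportDelim, , "tblData", fil.Path, True
        'stamp the rows just appended with this log record's id
        db.Execute "UPDATE tblData SET LogID=" & lngLogID & " WHERE LogID IS NULL", dbFailOnError
    End If
Next fil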
 

CJ_London

Super Moderator
Staff member
I would add that if this is a straight import, i.e. all records across the 2000 files are unique so there is no need to check whether they already exist, then make sure the destination table has no indexes. Indexes take time to update, as well as the table itself. They can always be added once the import of all the files is complete.
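
For example, with Access DDL (index and field names invented here): drop the index before the loop and recreate it afterwards, each statement run separately, e.g. via CurrentDb.Execute.

Code:
DROP INDEX idxCustomer ON tblData;
CREATE INDEX idxCustomer ON tblData (CustomerID);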
 

The_Doc_Man

Immoderate Moderator
Staff member
Fully concur with CJ. Put on the index after the table is built. Turns out the same is true when loading a table in ORACLE. With the U.S. Navy Reserve personnel DB, I am forbidden to tell you how many records were involved, but I can say without hesitation that it was FAR faster to just load the tables linearly and then build the indexes after the fact. In essence, the difference is less shuffling as you add the names individually. Still takes a while to build the indexes, but I think Access and ORACLE have in common that if you are adding the stuff for the first time, they both do something to sort first and add indexes second. Or something conceptually similar. I.e. they both know they have a wholesale operation rather than a piecemeal operation.
 

Guus2005

AWF VIP
Thanks for the input. This is great stuff!
Thanks everyone!
 

Galaxiom

Super Moderator
Staff member
Fully concur with CJ. Put on the index after the table is built. Turns out the same is true when loading a table in ORACLE.
Disabling the indexes during big imports is pretty standard practice with databases. That way the indexes only need to be rebuilt once.

If I were going to regularly import large numbers of files I would use BULK INSERT in SQL Server. An SQL Server back end brings very many advantages to an Access application. The queries are also much easier to write, and more powerful, than in Access.

I would strongly encourage all Access developers to start using SQL Server back ends. It is a natural step up and you will never look back when you discover what it can do. Over time you will realise that what impressed you at first was less than ten percent of what SQL Server can do. I've been working with it for years and I still find new stuff all the time.

The Express version is free, and its limits are way beyond Access's meagre 2 GB. The Developer edition, with all the features of Enterprise, is almost free.
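
A minimal T-SQL sketch (table name and path assumed), matched to a semicolon-delimited file:

Code:
BULK INSERT dbo.DestTable
FROM 'C:\Imports\bestand1.csv'
WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n', FIRSTROW = 1);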
 

gemma-the-husky

Super Moderator
Staff member
I always import csv files to a staging table first.

Personally,
I would check for import errors,
I would double-check that the column names and order of the imported table are what I expect,
I would do some validation on the data,
I would add some mechanism to prevent a file being imported multiple times (see the sketch below),
I would still make sure the data was all new, which implies having a way to check a unique index.

I would probably also add a "source table" column to the imported data, in case it later becomes clear that there was an unforeseen issue with the data.
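
A minimal version of that import guard, assuming strFile holds the file name and a log table tblImported (name invented) with a FileName column:

Code:
If DCount("*", "tblImported", "FileName=""" & strFile & """") > 0 Then
    Debug.Print strFile & " already imported - skipped"
Else
    'import the file here, then log it
    CurrentDb.Execute "INSERT INTO tblImported (FileName) VALUES (""" & strFile & """)", dbFailOnError
End If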

I didn't know you could disable indexes when adding data. What happens if the added data then introduces issues that prevent the indexes from being rebuilt?
 

CJ_London

Super Moderator
Staff member
Don't disagree with data validation; it just depends on the requirement.

You can't disable indexes in Access; you need to remove them and then reinstate them once the import has completed. The assumption is that this is a new table.

Rather than linking to each csv file, use SQL along these lines

Code:
INSERT INTO destTable
SELECT *
FROM (SELECT * FROM [Text;DATABASE=C:\filefoldername;HDR=Yes].[sourcefilename#csv]) AS txt

This assumes all the files are in one folder (C:\filefoldername); you would change the sourcefilename on each iteration of the loop (getting it from the Dir function, perhaps). Note that the "." in the file name has to become "#".

Code:
Dim sf As String
Dim db As DAO.Database

Set db = CurrentDb
sf = Dir("C:\filefoldername\*.csv")

While sf <> ""
    db.Execute "INSERT INTO destTable SELECT * FROM (SELECT * FROM " & _
        "[Text;DATABASE=C:\filefoldername;HDR=Yes].[" & Replace(sf, ".", "#") & "]) AS txt", dbFailOnError
    sf = Dir
Wend
 

Galaxiom

Super Moderator
Staff member
You can't disable indexes, you need to remove them then reinstate once import completed.
Yet another advantage of using SQL Server where disabling and rebuilding indexes is a simple routine process.
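
For example (index and table names invented; this applies to nonclustered indexes - disabling the clustered index would make the table inaccessible):

Code:
ALTER INDEX IX_DestTable_CustomerID ON dbo.DestTable DISABLE;
-- bulk load the data here
ALTER INDEX IX_DestTable_CustomerID ON dbo.DestTable REBUILD;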
 

arnelgp

..forever waiting... waiting for jellybean!
Here, I tested this one:
Code:
'''''
' arnelgp
'
Private Sub t()
    Const conInsert As String = _
    "insert into table1 ([Name]) " & _
    "select [Name] from [Text;FMT=Delimited(,);HDR=Yes;IMEX=2;ACCDB=YES;DATABASE=the_path].[the_file]"
    
    Dim strInsert As String
    Dim strpath As String
    Dim strfile As String
    
    'place the path of the textfile here
    strpath = "D:\arnel\"
    strfile = Dir$(strpath & "*.txt")
    
    Do Until Len(strfile) < 1
        'replace "the_path" on the constant string
        'with our path
        '
        'replace "the_file" with the text filename
        '
        '
        'we need to replace "." in the filename with "#"
        '... I don't know why either
        strfile = Replace$(strfile, ".", "#")
        strInsert = Replace$(Replace$(conInsert, "the_path", strpath), "the_file", strfile)
    
        CurrentDb.Execute strInsert
        
        strfile = Dir$
    Loop

End Sub
 

Guus2005

AWF VIP
My csv input file looks more or less like this:

1;aap;noot;mies
2;wim;teun;vuur
3;gijs;kees;bok

When I use any of the proposed code solutions above, only the first field is filled, and it contains the complete record.

This is what i tried:
Code:
insert into Tabel1 ([Id], [Veld1],[Veld2],[Veld3]) select F1,F2,F3,F4 from [Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]
I also tried:
Code:
insert into Tabel1 ([Id], [Veld1],[Veld2],[Veld3]) select split(F1,';')(0) As T1,split(F1,';')(1) AS T2,split(F1,';')(2) AS T3,split(F1,';')(3) AS T4 from [Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]

I'm not sure about the arguments between the square brackets:
Code:
[Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]
Tried also:
Code:
[Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]
[Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=YES;DATABASE=N:\Mijn documenten\].[bestand1#csv]
[Text;FMT=Delimited(;);HDR=No;ACCDB=YES;DATABASE=N:\Mijn documenten\].[bestand1#csv]
I can't find any help page on the arguments between the square brackets.

Does anyone have a solution to these problems?

Thanks!
 

Pat Hartman

Super Moderator
Staff member
I would add that if this is a straight import ie all records across the 2000 files are unique so no need to check if already exists
You are assuming a one-time import. I was assuming an ongoing process where "new" files are imported every "month".

@Guus2005 If you can post the db and at least three files, we can try the import for you.
 

CJ_London

Super Moderator
Staff member
The problem is the semicolon rather than a comma. It looks like you need to make a change in the registry - see this link:
https://social.msdn.microsoft.com/F...8f-a342-33066ee9dd78/question-of-fmtdelimited. There is also perhaps this link to change regional settings.
I don't know if either will work.

Out of curiosity I tried to use the import wizard to create a new table, to append to an existing one, and to create a linked table - all failed.

I changed your example data to commas and it worked without a problem - and there was no need to specify the format (FMT).

I also found this link, but I'm not a member so cannot see the answers,

and I found out that the registry entry has been moved, per the first link provided - it can now be found in

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Jet\4.0\Engines\Jet 4.0

There are plenty of examples online about how to edit the registry.
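
Another route that might avoid the registry altogether is a schema.ini file in the same folder as the csv files - the Text driver picks it up automatically. Untested against your files, but along these lines (one section per file name):

Code:
[bestand1.csv]
Format=Delimited(;)
ColNameHeader=False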
 

Guus2005

AWF VIP
Here are a few sample input files.
I still can't find a help page about [Text;FMT=Delimited(;);HDR=No;ACCDB=YES;DATABASE=N:\Mijn documenten\].[bestand1#csv]

Any help is welcome!
Thanks!
 

Attachments

  • ReadFiles.zip (22.2 KB)

CJ_London

Super Moderator
Staff member
I have a solution, but how practical it will be for you, I don't know. It adds a layer of complexity.

The SQL is:

SELECT *
FROM (SELECT * FROM [Text;DSN=Bestand1LinkSpec;HDR=NO;IMEX=2;ACCDB=YES;DATABASE=D:\Dev\1_temp].bestand1.txt) AS txt;

You need to create a link specification, and you need to include HDR=NO because the import spec actually controls that (change it to yes, or remove it, and you will lose the first row of data).

So the bad news is you need to create a specification. The good news is you can do this in VBA.

If all your files have the same layout (one numeric column followed by three text columns) you can use the same import specification.

If they can vary, you can use VBA to open the txt file, inspect the first row of data, and build your own specification - the tables you need to look at are MSysIMEXSpecs:


DateDelim: /
DateFourDigitYear: -1
DateLeadingZeros: 0
DateOrder: 0
DecimalPoint: .
FieldSeparator: ;
FileType: 850
SpecID: 1
SpecName: Bestand1LinkSpec
SpecType: 1
StartRow: 1
TextDelim: (none)
TimeDelim: :
and
MSysIMEXColumns

Attributes  DataType  FieldName  IndexType  SkipColumn  SpecID  Start  Width
0           10        FIRST      0          0           1       5      10
0           4         LNR        0          0           1       1      4
0           10        SECND      0          0           1       15     12
0           10        THRD       0          0           1       27     11
I suggest you google these to find out more about the columns. I do know the width is based on the largest value in the column - increase the SECND width from 12 to 255, for example.
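
A rough, untested sketch of building the spec from VBA, mirroring the values in the tables above (the MSysIMEX tables only exist once you have saved at least one spec through the wizard):

Code:
Dim db As DAO.Database
Dim lngSpecID As Long

Set db = CurrentDb

db.Execute "INSERT INTO MSysIMEXSpecs (DateDelim, DateFourDigitYear, DateLeadingZeros, DateOrder, " & _
    "DecimalPoint, FieldSeparator, FileType, SpecName, SpecType, StartRow, TimeDelim) VALUES " & _
    "('/', True, False, 0, '.', ';', 850, 'Bestand1LinkSpec', 1, 1, ':')", dbFailOnError

'SpecID is the autonumber of the row just added
lngSpecID = db.OpenRecordset("SELECT @@IDENTITY")(0)

'one row per column - repeat for FIRST, SECND and THRD with their Start/Width values
db.Execute "INSERT INTO MSysIMEXColumns (Attributes, DataType, FieldName, IndexType, SkipColumn, " & _
    "SpecID, Start, Width) VALUES (0, 4, 'LNR', 0, False, " & lngSpecID & ", 1, 4)", dbFailOnError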
 

arnelgp

..forever waiting... waiting for jellybean!
Test my new class: run test1 in Module1.
 

Attachments

  • testFiles.zip (49 KB)

Pat Hartman

Super Moderator
Staff member
Import ONE of the files MANUALLY so you can create an import spec. As you go through the dialog, you will need to press the Advanced button so you can save the spec and give it a rational name, as well as specify the delimiter character.

Once the spec is saved, you just reference it by name.

Don't forget, there are two ways to create a spec. This is the OLD way, since there is no easy way to modify the new kind, and it assumes a specific file name, etc., which is NOT what you want. Do NOT check the final "save as" - that creates the "new" spec.
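
Once it is saved, using the spec looks like this (spec name invented; HasFieldNames is False because the sample files have no header row):

Code:
DoCmd.TransferText acImportDelim, "MyCsvSpec", "Tabel1", "N:\Mijn documenten\bestand1.csv", False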
 

Attachments

  • ImportSpecDialog.zip (186.3 KB)
