The fastest way to import over 2000 csv files

Guus2005

AWF VIP
What is the fastest way to import 2300 similar csv files into an Access table?

There are a number of ways to accomplish this. I need the fastest.

Thanks for your time!
 

Minty

AWF VIP
Hmm, interesting question.
If you have the source file names and target table name(s) in a table, you could loop through them with a recordset, link to each file, and then, within the loop, do a
SELECT * INTO TargetTableName if each one is going into a different table, or an
INSERT INTO TheSingleTable if they are all going into a single table.
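
Something like this minimal sketch, assuming a table tblFiles with FilePath and TargetTable columns (all of those names are invented here):

Code:
Dim db As DAO.Database
Dim rs As DAO.Recordset

Set db = CurrentDb
'tblFiles, FilePath and TargetTable are assumed names
Set rs = db.OpenRecordset("SELECT FilePath, TargetTable FROM tblFiles")

Do Until rs.EOF
    'link the csv under a temporary name, append it, then drop the link
    DoCmd.TransferText acLinkDelim, , "tmpCsv", rs!FilePath, True
    db.Execute "INSERT INTO [" & rs!TargetTable & "] SELECT * FROM tmpCsv", dbFailOnError
    DoCmd.DeleteObject acTable, "tmpCsv"
    rs.MoveNext
Loop
rs.Close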
 

The_Doc_Man

Immoderate Moderator
Staff member
First decide whether they will all fit. Look carefully into sizing, because 2300 of anything with any substance to it will be a lot of bytes. In April, you mentioned two CSV files that would add 500 MB total to an empty DB. Are these related? If the files average 10 KB, you've got 23 MB, not counting indexes or any other overhead. But if they are far more substantial than 10 KB on average, you may have an issue.

NOTE: If you had said "into an SQL table" (implying some form of SQL server backend) I would not have asked the question, but with Access you are limited in how big you can get. And you didn't say how big the files were, only how many there are.

Next, consider whether they are ready to be imported or whether they will need some filtration, adjustment, and cleanup. You might wish to do this in stages despite your desire to do this quickly.

In your other post in April, you expressed concern about bloat. There is such a thing as creating an "aside" table in a file you create as a "throwaway" file - where you import to the external temp table and diddle what you need to diddle, then export the finished product via a query, and finally drop the side table and delete the file. Lather, rinse, repeat - and the only file likely to get badly bloated gets deleted and recreated. No permanent bloat and you don't care if you are a little bit sloppy on the temp tables 'cause they won't last.

Finally, your concern for "fastest" also depends on whether the files are ready for prime time. If they require cleaning or adjustment, you have no choice but to consider a file-by-file operation with intermediate "scrubbing" steps.

You MIGHT consider that "external file" concept by doing ONE FILE AT A TIME to import one of the CSVs to a virgin DB file, then massage the content. Then when all 2300 files are massaged, and presuming that the 2300 files are structurally congruent, you could write a loop to step through importation from the external files using an IN sub-clause (in the FROM clause) to direct the imports piecemeal.
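
For instance (file and table names invented here), an append query can read straight from a table in another database file via the IN clause:

Code:
INSERT INTO MainTable
SELECT * FROM SideTable IN 'C:\Temp\Throwaway.accdb';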


Then, when you are satisfied that all is well and you can ditch the files, do so. NOTE: Bulk deletion can be a rather complex process with regard to the directory files but ALSO with regard to the system's file descriptors in the master file index.
 

Guus2005

AWF VIP
Thanks for your replies, both!

@Docman, you have a few valid points I hadn't thought about. Size matters!
No, this question and the one in April are not related. But perhaps I can combine the answers from then and now.

I had forgotten about the IN clause of MS Access SQL. That might come in handy.

I'll post my solution here when I've decided on one. Thanks again.
 

Pat Hartman

Super Moderator
Staff member
If all the files are in the same folder, you can use FSO (FileSystemObject) to loop through the folder and TransferText to append the data. Given the number of files, and out of an abundance of caution, I would also create a log table. The first step would be to add the full path to the log table and capture the autonumber of the new log record. After each import, I would run an update query that sets the LogID of all rows where it is still null to the LogID of the current log record.

You can also add logic that checks the log table to see whether the full path of the next file has already been imported, and bypasses it if so. This gives you an easy way to recover if the process stops and has to be restarted for some reason.
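
A minimal sketch of that loop, with assumed names throughout (tblImportLog with an autonumber LogID and a FullPath column, tblData with a LogID column, folder C:\Imports; a saved import spec can be slotted into the TransferText call if the delimiter isn't a comma):

Code:
Dim fso As Object, fil As Object
Dim db As DAO.Database
Dim lngLogID As Long

Set db = CurrentDb
Set fso = CreateObject("Scripting.FileSystemObject")

For Each fil In fso.GetFolder("C:\Imports").Files
    'skip files already in the log
    If DCount("*", "tblImportLog", "FullPath=""" & fil.Path & """") = 0 Then
        db.Execute "INSERT INTO tblImportLog (FullPath) VALUES (""" & fil.Path & """)", dbFailOnError
        lngLogID = db.OpenRecordset("SELECT @@IDENTITY")(0)
        DoCmd.TransferText acImportDelim, , "tblData", fil.Path, True
        'stamp the rows just appended with this log record's id
        db.Execute "UPDATE tblData SET LogID=" & lngLogID & " WHERE LogID IS NULL", dbFailOnError
    End If
Next fil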
 

CJ_London

Super Moderator
Staff member
I would add that if this is a straight import, i.e. all records across the 2000 files are unique so there is no need to check whether they already exist, then make sure the destination table has no indexes. Indexes take time to update, as well as the table itself. They can always be added once the import of all the files is complete.
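
For example, with Access DDL (index and field names invented here): drop the index before the loop and recreate it afterwards, each statement run separately, e.g. via CurrentDb.Execute.

Code:
DROP INDEX idxCustomer ON tblData;
CREATE INDEX idxCustomer ON tblData (CustomerID);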
 

The_Doc_Man

Immoderate Moderator
Staff member
Fully concur with CJ. Put on the index after the table is built. Turns out the same is true when loading a table in ORACLE. With the U.S. Navy Reserve personnel DB, I am forbidden to tell you how many records were involved, but I can say without hesitation that it was FAR faster to just load the tables linearly and then build the indexes after the fact. In essence, the difference is less shuffling as you add the names individually. Still takes a while to build the indexes, but I think Access and ORACLE have in common that if you are adding the stuff for the first time, they both do something to sort first and add indexes second. Or something conceptually similar. I.e. they both know they have a wholesale operation rather than a piecemeal operation.
 

Guus2005

AWF VIP
Thanks for the input. This is great stuff!
Thanks everyone!
 

Galaxiom

Super Moderator
Staff member
Fully concur with CJ. Put on the index after the table is built. Turns out the same is true when loading a table in ORACLE.
Disabling the indexes during big imports is pretty standard practice with databases. That way the indexes only need to be rebuilt once.

If I were going to regularly import large numbers of files I would use BULK INSERT in SQL Server. An SQL Server back end brings very many advantages to an Access application. The queries are also much easier to write, and more powerful, than in Access.

I would strongly encourage all Access developers to start using SQL Server back ends. It is a natural step up and you will never look back when you discover what it can do. Over time you will realise that what impressed you at first was less than ten percent of what SQL Server can do. I've been working with it for years and I still find new stuff all the time.

The Express version is free, and its limits are way beyond Access's meagre 2 GB. The Developer edition, with all the features of Enterprise, is almost free.
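
A minimal T-SQL sketch (table name and path assumed), matched to a semicolon-delimited file:

Code:
BULK INSERT dbo.DestTable
FROM 'C:\Imports\bestand1.csv'
WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n', FIRSTROW = 1);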
 

gemma-the-husky

Super Moderator
Staff member
I always import csv files to a staging table first.

Personally,
I would check for import errors,
I would double-check that the column names and order of the imported table are what I expect,
I would do some validation on the data,
I would add some mechanism to prevent a file being imported multiple times (see the sketch below),
I would still make sure the data was all new, which implies having a way to check a unique index.

I would probably also add a "source table" column to the imported data, in case it later becomes clear that there was an unforeseen issue with the data.
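
A minimal version of that import guard, assuming strFile holds the file name and a log table tblImported (name invented) with a FileName column:

Code:
If DCount("*", "tblImported", "FileName=""" & strFile & """") > 0 Then
    Debug.Print strFile & " already imported - skipped"
Else
    'import the file here, then log it
    CurrentDb.Execute "INSERT INTO tblImported (FileName) VALUES (""" & strFile & """)", dbFailOnError
End If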

I didn't know you could disable indexes when adding data. What happens if the added data then introduces issues that prevent the indexes from being rebuilt?
 

CJ_London

Super Moderator
Staff member
Don't disagree with data validation; it just depends on the requirement.

You can't disable indexes in Access; you need to remove them and then reinstate them once the import has completed. The assumption is that this is a new table.

Rather than linking to each csv file, use SQL along these lines

Code:
INSERT INTO destTable
SELECT *
FROM (SELECT * FROM [Text;DATABASE=C:\filefoldername;HDR=Yes].[sourcefilename#csv]) AS txt

This assumes all the files are in one folder (C:\filefoldername); you would change the sourcefilename on each iteration of the loop (getting it from the Dir function, perhaps). Note that the "." in the file name has to become "#".

Code:
Dim sf As String
Dim db As DAO.Database

Set db = CurrentDb
sf = Dir("C:\filefoldername\*.csv")

While sf <> ""
    db.Execute "INSERT INTO destTable SELECT * FROM (SELECT * FROM " & _
        "[Text;DATABASE=C:\filefoldername;HDR=Yes].[" & Replace(sf, ".", "#") & "]) AS txt", dbFailOnError
    sf = Dir
Wend
 

Galaxiom

Super Moderator
Staff member
You can't disable indexes, you need to remove them then reinstate once import completed.
Yet another advantage of using SQL Server where disabling and rebuilding indexes is a simple routine process.
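
For example (index and table names invented; this applies to nonclustered indexes - disabling the clustered index would make the table inaccessible):

Code:
ALTER INDEX IX_DestTable_CustomerID ON dbo.DestTable DISABLE;
-- bulk load the data here
ALTER INDEX IX_DestTable_CustomerID ON dbo.DestTable REBUILD;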
 

arnelgp

..forever waiting... waiting for jellybean!
Here, I tested this one:
Code:
'''''
' arnelgp
'
Private Sub t()
    Const conInsert As String = _
    "insert into table1 ([Name]) " & _
    "select [Name] from [Text;FMT=Delimited(,);HDR=Yes;IMEX=2;ACCDB=YES;DATABASE=the_path].[the_file]"
    
    Dim strInsert As String
    Dim strpath As String
    Dim strfile As String
    
    'place the path of the textfile here
    strpath = "D:\arnel\"
    strfile = Dir$(strpath & "*.txt")
    
    Do Until Len(strfile) < 1
        'replace "the_path" on the constant string
        'with our path
        '
        'replace "the_file" with the text filename
        '
        '
        'we need to replace "." in the filename with "#"
        '... I don't know why either
        strfile = Replace$(strfile, ".", "#")
        strInsert = Replace$(Replace$(conInsert, "the_path", strpath), "the_file", strfile)
    
        CurrentDb.Execute strInsert
        
        strfile = Dir$
    Loop

End Sub
 

Guus2005

AWF VIP
My csv input file looks more or less like this:

1;aap;noot;mies
2;wim;teun;vuur
3;gijs;kees;bok

When I use any of the proposed code solutions above, only the first field is filled, and it contains the complete record.

This is what i tried:
Code:
insert into Tabel1 ([Id], [Veld1],[Veld2],[Veld3]) select F1,F2,F3,F4 from [Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]
I also tried:
Code:
insert into Tabel1 ([Id], [Veld1],[Veld2],[Veld3]) select split(F1,';')(0) As T1,split(F1,';')(1) AS T2,split(F1,';')(2) AS T3,split(F1,';')(3) AS T4 from [Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]

I'm not sure about the arguments between the square brackets:
Code:
[Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]
Tried also:
Code:
[Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=NO;DATABASE=N:\Mijn documenten\].[bestand1#csv]
[Text;FMT=Delimited(;);HDR=No;IMEX=2;ACCDB=YES;DATABASE=N:\Mijn documenten\].[bestand1#csv]
[Text;FMT=Delimited(;);HDR=No;ACCDB=YES;DATABASE=N:\Mijn documenten\].[bestand1#csv]
I can't find any help page on the arguments between the square brackets.

Does anyone have a solution to these problems?

Thanks!
 

Pat Hartman

Super Moderator
Staff member
I would add that if this is a straight import ie all records across the 2000 files are unique so no need to check if already exists
You are assuming a one-time import. I was assuming an ongoing process where "new" files are imported every "month".

@Guus2005 If you can post the db and at least three files, we can try the import for you.
 

CJ_London

Super Moderator
Staff member
The problem is the semicolon rather than a comma. It looks like you need to make a change in the registry - see this link:
https://social.msdn.microsoft.com/F...8f-a342-33066ee9dd78/question-of-fmtdelimited. There is also perhaps this link to change regional settings.
I don't know if either will work.

Out of curiosity I tried to use the import wizard to create a new table, to append to an existing one, and to create a linked table - all failed.

I changed your example data to commas and it worked without a problem - and there was no need to specify the format (FMT).

I also found this link, but I'm not a member so cannot see the answers,

and I found out that the registry entry has been moved, per the first link provided - it can now be found in

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Jet\4.0\Engines\Jet 4.0

There are plenty of examples online about how to edit the registry.
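
Another route that might avoid the registry altogether is a schema.ini file in the same folder as the csv files - the Text driver picks it up automatically. Untested against your files, but along these lines (one section per file name):

Code:
[bestand1.csv]
Format=Delimited(;)
ColNameHeader=False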
 

Guus2005

AWF VIP
Here are a few sample input files.
I still can't find a help page about [Text;FMT=Delimited(;);HDR=No;ACCDB=YES;DATABASE=N:\Mijn documenten\].[bestand1#csv]

Any help is welcome!
Thanks!
 

Attachments

  • ReadFiles.zip (22.2 KB)

CJ_London

Super Moderator
Staff member
I have a solution, but how practical it will be for you, I don't know. It adds a layer of complexity.

The SQL is:

SELECT *
FROM (SELECT * FROM [Text;DSN=Bestand1LinkSpec;HDR=NO;IMEX=2;ACCDB=YES;DATABASE=D:\Dev\1_temp].bestand1.txt) AS txt;

You need to create a link specification, and you need to include HDR=NO because the import spec actually controls that (change it to yes, or remove it, and you will lose the first row of data).

So the bad news is you need to create a specification. The good news is you can do this in VBA.

If all your files have the same layout (one numeric column followed by three text columns) you can use the same import specification.

If they can vary, you can use VBA to open the txt file, inspect the first row of data, and build your own specification - the tables you need to look at are MSysIMEXSpecs:


DateDelim: /
DateFourDigitYear: -1
DateLeadingZeros: 0
DateOrder: 0
DecimalPoint: .
FieldSeparator: ;
FileType: 850
SpecID: 1
SpecName: Bestand1LinkSpec
SpecType: 1
StartRow: 1
TextDelim: (none)
TimeDelim: :
and
MSysIMEXColumns

Attributes  DataType  FieldName  IndexType  SkipColumn  SpecID  Start  Width
0           10        FIRST      0          0           1       5      10
0           4         LNR        0          0           1       1      4
0           10        SECND      0          0           1       15     12
0           10        THRD       0          0           1       27     11
I suggest you google these to find out more about the columns. I do know the width is based on the largest value in the column - increase the SECND width from 12 to 255, for example.
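
A rough, untested sketch of building the spec from VBA, mirroring the values in the tables above (the MSysIMEX tables only exist once you have saved at least one spec through the wizard):

Code:
Dim db As DAO.Database
Dim lngSpecID As Long

Set db = CurrentDb

db.Execute "INSERT INTO MSysIMEXSpecs (DateDelim, DateFourDigitYear, DateLeadingZeros, DateOrder, " & _
    "DecimalPoint, FieldSeparator, FileType, SpecName, SpecType, StartRow, TimeDelim) VALUES " & _
    "('/', True, False, 0, '.', ';', 850, 'Bestand1LinkSpec', 1, 1, ':')", dbFailOnError

'SpecID is the autonumber of the row just added
lngSpecID = db.OpenRecordset("SELECT @@IDENTITY")(0)

'one row per column - repeat for FIRST, SECND and THRD with their Start/Width values
db.Execute "INSERT INTO MSysIMEXColumns (Attributes, DataType, FieldName, IndexType, SkipColumn, " & _
    "SpecID, Start, Width) VALUES (0, 4, 'LNR', 0, False, " & lngSpecID & ", 1, 4)", dbFailOnError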
 

arnelgp

..forever waiting... waiting for jellybean!
Test my new class: run test1 in Module1.
 

Attachments

  • testFiles.zip (49 KB)

Pat Hartman

Super Moderator
Staff member
Import ONE of the files MANUALLY so you can create an import spec. As you go through the dialog, you will need to press the Advanced button so you can save the spec and give it a rational name, as well as specify the delimiter character.

Once the spec is saved, you just reference it by name.

Don't forget, there are two ways to create a spec. This is the OLD way, since there is no easy way to modify the new kind, and it assumes a specific file name, etc., which is NOT what you want. Do NOT check the final "save as" - that creates the "new" spec.
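
Once it is saved, using the spec looks like this (spec name invented; HasFieldNames is False because the sample files have no header row):

Code:
DoCmd.TransferText acImportDelim, "MyCsvSpec", "Tabel1", "N:\Mijn documenten\bestand1.csv", False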
 

Attachments

  • ImportSpecDialog.zip (186.3 KB)
