How to automate importing 31 .csv files into SQL Server?

That's pretty good time. Make sure you don't have table indexes in place when importing data, otherwise the loading will crawl. If you're appending extra data, drop the indexes and recreate them after loading.
I could tell it was slower with the Natural Key index in place, so I decided to drop the index prior to the import and recreate it after the import from C#.
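In T-SQL terms the sequence amounts to something like this (just a sketch; the table, index, and file names below are placeholders, and the BULK INSERT simply stands in for whatever bulk-load call the C# code makes):

Code:
-- Placeholder names for illustration: dbo.Details, IX_Details_NaturalKey
DROP INDEX IX_Details_NaturalKey ON dbo.Details;

-- Load one of the 31 .csv files; repeat (or loop from C#) for each file.
-- FIRSTROW = 2 assumes the file has a header row.
BULK INSERT dbo.Details
FROM 'C:\data\manifests_01.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- Recreate the natural-key index once all files are loaded
CREATE UNIQUE NONCLUSTERED INDEX IX_Details_NaturalKey
    ON dbo.Details (ManifestNum, LineNum);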

How about table relationships? Do you recommend I strip those out too?

Thank you
 
Glad you got something you like that worked :)
Tuck the idea of learning SSIS in your back pocket; it will expand your horizons exponentially.
 
I am skipping the records that are duplicates while keeping the Natural Key index in place.
Umm, I don't think that's a good idea. I don't even think it's possible to preserve a snapshot of a table-related index if you delete all the records and re-import the data. If you drop the table, it drops all related indexes. If you delete all records but keep the table, the indexes are updated to essentially empty, and the deleted records are not physically removed; rather, they're flagged as logically deleted. Your best bet is to drop the table(s), recreate them, and load all the data with the dup records removed, or include the dups, create non-unique indexes, then remove the dups with a delete query.
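If you go the non-unique index route, the dedup delete can be as simple as this sketch (dbo.Details, the key columns, and the ID tiebreaker are assumed names, not necessarily your schema):

Code:
-- Keep one row per (ManifestNum, LineNum) and delete the rest.
WITH Ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY ManifestNum, LineNum
               ORDER BY ID) AS rn
    FROM dbo.Details
)
DELETE FROM Ranked
WHERE rn > 1;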
How about table relationships? Do you recommend I strip those out too?
If you drop the tables and recreate them, which I strongly recommend, the joins are automatically removed and you have to re-define them. For best performance, especially with large tables, it's best to start out with brand-spanking-new tables, load all the data, then create all the indexes, joins, constraints, perms, etc.

You should also create a clustered index on the primary key so that the table data is physically reordered in the same order as the clustered index. You will also want to update the table statistics so the cost-based query optimizer can build the best query execution plans.
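For example (a sketch only, assuming an ID surrogate key and placeholder names):

Code:
-- Placeholder names for illustration (dbo.Details, PK_Details, ID column)
ALTER TABLE dbo.Details
    ADD CONSTRAINT PK_Details PRIMARY KEY CLUSTERED (ID);

-- Refresh statistics after the bulk load so the optimizer sees current data
UPDATE STATISTICS dbo.Details WITH FULLSCAN;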
 
You are right, skipping the duplicated records in C# was a poor choice. I now run a removeDuplicate query after all the data is imported. Thanks.

My current approach is to truncate the table, remove all indexes, import the data, remove dups, and recreate the indexes. On Monday a fresh data dump is released. I want to see how much things change from week to week before fully optimizing. Perhaps I only need to update monthly, and my current approach may be sufficient. I'll know more on Monday and strategize accordingly.

Thank you for all the advice. Really appreciate it.
 
Sounds like a good plan. Virtually everything takes longer when dealing with big tables, so optimization is crucial. This data dump you're importing seems like government data for an analytical decision support system. We used to import a 4.5GB PostgreSQL dataset of all U.S. Federal contracts for trend analysis. In a table with a million records, for every alpha character you trim, you save roughly 1MB. We summarized data by creating aggregate tables, created lookup tables for certain text fields, and then replaced the text values with foreign key integer values joined to the lookup tables.
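A rough sketch of the lookup-table trick (the table and column names here are made up for illustration, not the actual contract schema):

Code:
-- Replace a repeated text column (AgencyName) with an integer key
-- into a small lookup table. All names are hypothetical.
CREATE TABLE dbo.Agency (
    AgencyID   int IDENTITY(1,1) PRIMARY KEY,
    AgencyName varchar(200) NOT NULL UNIQUE
);

INSERT INTO dbo.Agency (AgencyName)
SELECT DISTINCT AgencyName FROM dbo.Contracts;

ALTER TABLE dbo.Contracts ADD AgencyID int NULL;
GO  -- batch break so the new column is visible to the UPDATE below

UPDATE c
SET    c.AgencyID = a.AgencyID
FROM   dbo.Contracts AS c
JOIN   dbo.Agency    AS a ON a.AgencyName = c.AgencyName;

-- Once verified, drop the wide text column and keep the integer foreign key
ALTER TABLE dbo.Contracts DROP COLUMN AgencyName;
ALTER TABLE dbo.Contracts ADD CONSTRAINT FK_Contracts_Agency
    FOREIGN KEY (AgencyID) REFERENCES dbo.Agency (AgencyID);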

Keep us informed about your progress, and best wishes for success.
 
Yes, this is US EPA data. I will be analyzing it to perform Hazardous Waste market analysis for customers.

I find it a little disappointing that the EPA doesn't enforce simple data integrity such as a Natural Key on (ManifestNum, LineNum). I found the same lack of data integrity in Canada. I assume it's the same in most departments?
 
I find that strange. What do the specs say about that data?
There are not too many duplicates. I think there were around 10 cases like the below.

[screenshot of the duplicate rows omitted]
 

After you remove the dups you should be able to successfully create the following composite index:

Code:
CREATE UNIQUE CLUSTERED INDEX idxname ON dbo.tablename (ManifestNum, LineNum);
 

Yes, I'm doing that, except I always use ID for the clustered index. I did find some records that have ManifestNum == null. I have to remove them. I am surprised SQL Server successfully indexed this.
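The cleanup I have in mind is just something along these lines (dbo.Details stands in for the actual detail table):

Code:
-- Remove the rows with no manifest number before creating the
-- unique (ManifestNum, LineNum) index. Table name is a placeholder.
DELETE FROM dbo.Details
WHERE ManifestNum IS NULL;

-- Alternative: a filtered unique index that simply excludes those rows
-- CREATE UNIQUE INDEX IX_Details_NaturalKey
--     ON dbo.Details (ManifestNum, LineNum)
--     WHERE ManifestNum IS NOT NULL;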

[screenshots of the rows with null ManifestNum omitted]
 

Actually, the clustered index should be created on the field(s) that are most frequently queried on. Primary Key fields automatically have an index created on them. They need it so foreign keys in other related tables can join to the primary key in the parent.
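For example (hypothetical parent/child table names, not your actual schema):

Code:
-- The child table's foreign key joins to the parent's primary key.
ALTER TABLE dbo.ManifestLine
    ADD CONSTRAINT FK_ManifestLine_Manifest
    FOREIGN KEY (ManifestID) REFERENCES dbo.Manifest (ID);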

I did find some records that have ManifestNum == null. I have to remove them. I am surprised SQL Server successfully indexed this.

SQL Server and other db servers allow null values to be indexed. However, Access doesn't.

I thought both ManifestNum and LineNum columns determined the uniqueness for each row?
 

Yes, ManifestNum and LineNum are the Natural Key for the Details table. My understanding is that the Primary Key is always the Clustered Index? I always use ID as the primary key, and I will be using the ID in all the table joins and table relationships.
 
Primary Key IDs will always have a unique index for joins; however, you can also create an additional clustered index on the column(s) your application is going to query on most often, to improve performance. The clustered index doesn't necessarily have to include the primary key.
e.g. If your application is mostly going to query on Location and HazmatName, then create a composite clustered index on both of those fields so the db engine will physically sort the records in the table by Location, HazmatName, and the clustered index will be in that same order. If there are dup records with the same two field values, the clustered index doesn't have to be unique. You're only allowed one clustered index per table, so choose the column(s) wisely.
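A sketch of that arrangement (placeholder names; note the ID primary key is declared nonclustered so the single clustered slot is free for the query columns - if the PK was originally created clustered, it has to be dropped and re-added first):

Code:
-- Placeholder names for illustration. The PK keeps a unique nonclustered
-- index for joins; the one clustered index goes on the query columns.
ALTER TABLE dbo.Details
    ADD CONSTRAINT PK_Details PRIMARY KEY NONCLUSTERED (ID);

CREATE CLUSTERED INDEX IX_Details_Location_Hazmat
    ON dbo.Details (Location, HazmatName);  -- non-unique is fine here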

If you're going to periodically import fresh sets of data, create a script that drops/recreates the tables, imports the new snapshot, and creates the indexes, joins, constraints, perms, etc.

https://stackoverflow.com/questions...use-clustered-vs-non-clustered-index#18306484

https://stackoverflow.com/questions/5070529/difference-between-clustered-and-nonclustered-index
 
You could create one huge file in DOS with something along the lines of the following, I believe?
Code:
copy file1.csv + file2.csv + file3.csv fullfile.csv
I know this thread has been solved elsewhere, but the suggestion above would take multiple lines of code, as the limit on the command line (as far as I can establish - please point me in the right direction if I'm wrong) is still 255 characters.
 
The Windows command-line limit is 8,191 characters; however, the Access text field limit is 255.
 
