I need assistance pulling a big db into SQL Server

jwcolby54
I downloaded the National Address Database last night. Unzipped it is 33 GB and change, which is a tad too big for MS Access <grin> or even SQL Server Express. They provide a schema.ini with the field names and widths, which seems like it could be used directly in the import process. But it has been too long since I did a bulk copy import, and I thought I'd ask for assistance before I began the struggle.

I own SQL Server, but my big databases are on a desktop machine which I don't use any more. I am working on an 8-core AMD laptop with 64 GB of RAM and plenty of room on an SSD. This is my dev machine now, and I would like to get this address database loaded into SQL Server on this laptop. It is more than I actually want, but it is free and it has a TON of data for testing purposes.

BTW, the last time I did this the fields were fixed width with space padding to the right. I have no idea whether that will be the case here, but if so I need to get rid of the padding as well. AFAICT this is a single big table.
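If the fields do come in space-padded, one option is a one-time cleanup pass after the load. A minimal T-SQL sketch, with a staging table and column named purely for illustration:

UPDATE dbo.NAD_Staging                  -- hypothetical table name
SET StreetName = RTRIM(StreetName);     -- strip trailing spaces left by fixed-width padding

Repeat per character column, or generate the statements from sys.columns.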

Can anyone help me with this? Thanks.
 
Well, maybe no help needed. The import wizard is working. Thank goodness for the .ini file. I had to manually modify all the lengths to match the .ini. Then I had to find and fix "does not allow null" errors for a handful of fields. The import is now running. It will take a while, I suspect.
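One way to avoid hand-editing lengths in the wizard is to create the destination table up front from the schema.ini definitions and point the wizard at it. A minimal sketch, with made-up column names and widths (take the real ones from the .ini):

CREATE TABLE dbo.NAD_Staging (         -- hypothetical table name
    AddressId  INT           NULL,
    StreetName NVARCHAR(255) NULL,     -- declared NULL up front to dodge "does not allow null" errors
    City       NVARCHAR(100) NULL,
    State      NCHAR(2)      NULL,
    Zip        NVARCHAR(10)  NULL
);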
 
Possibly a daft question, but what format is the data in that you have downloaded?
 
It claims to be a CSV, although the file extension is .txt. It has a header row, and after that the fields are comma delimited, so likely a CSV. It was running but now got a timeout error. This is a big file, which means a big import process. How do I modify the timeout period? I am working with ChatGPT, but a real human is always (usually) better.
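One way to sidestep the wizard's timeout entirely is to run the load as T-SQL in SSMS, where the execution timeout defaults to unlimited and BATCHSIZE commits as it goes. A minimal sketch, with a hypothetical path and table name:

BULK INSERT dbo.NAD_Staging
FROM 'C:\Data\NAD.txt'        -- hypothetical path
WITH (
    FIRSTROW = 2,             -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 100000        -- commit every 100K rows so a failure doesn't roll back everything
);

Note this does plain comma splitting; quoted fields with embedded commas need the FORMAT = 'CSV' option shown later in the thread.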

It took a while to modify all the field widths. Is there a way to save this setup from the wizard if I have to leave the wizard to do something?

BTW, I just matched the .ini file for the field widths. That seems to be working. It is finding fields that require "allow nulls" which the wizard did not pick up by itself. Probably a half dozen or more.
 
It seems that I was working with SQL Server Express and ran into the max database size limit (10 GB in Express). I am downloading and installing the Developer edition. I'll see if I can get it done there.
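For anyone following along, a quick way to confirm which edition is actually running:

SELECT SERVERPROPERTY('Edition')        AS Edition,        -- Express caps each database at 10 GB
       SERVERPROPERTY('ProductVersion') AS ProductVersion;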
 
I'm here to say that ChatGPT is an awesome tool. It worked with me to get the disk files moved out to a larger disk.

On another front... I have a registration problem with Word trying to hook the preview in Explorer and failing with an obnoxious popup that cannot be gotten rid of except with Task Manager. ChatGPT walked me through to the point that I now understand that much. I can preview many things, but .odt files... Word kicks in as the preview handler (and fails), and what a PITA that is.

We have also looked at a bunch of stuff, including deep into the registry, where sure enough Word is registered to do the preview for .odt files. It hooked me up with Autoruns and how to search through stuff in there (nothing found there). I have to say I am amazed at the depth of knowledge.
 
Well, this has been a project. I am looking for a reasonably large test data table, something I could use to create a demo for dependent-object filtering: using a combo to filter some table, then using that combo to filter some other table, etc. At least two but preferably several levels deep. So says I, a large address database could allow me to treat a single table as if it were several. As an example, I could set up a list control to filter for one or several states. Once I select those states I could set up the next control to filter for last names. How many Colbys live in Texas? In California? In Arizona and New Mexico and Utah? OK, now what first names do those Colbys have?

Something like that. I really did not want to spend a bunch of time building a bunch of tables (clients, orders, line items, or some such).
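The drill-down idea in query form, with purely hypothetical column names (the NAD is an address file, so the name columns here are illustrative only):

-- Level 1: Colbys per selected state
SELECT State, COUNT(*) AS Colbys
FROM dbo.NAD_Staging
WHERE LastName = 'Colby'
  AND State IN ('TX', 'CA', 'AZ', 'NM', 'UT')
GROUP BY State;

-- Level 2: first names of those Colbys
SELECT DISTINCT FirstName
FROM dbo.NAD_Staging
WHERE LastName = 'Colby'
  AND State IN ('TX', 'CA', 'AZ', 'NM', 'UT');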

Soooo... I found this thing called the National Address Database. Free, maintained by the government, literally something like 60 million addresses. Cool, says I, let's grab that. Well yea! It requires SQL Server of course. And it don't fit in SQL Server Express. So I had to download the dev version, which I did.

So one would think a bulk copy import would be the ticket. One would be wrong!!! First off, it is a CSV, comma delimited. Whoever created this thing created just a ton of fields where the field contained data like "Something, Something Else, Something Other". Bulk copy can't deal with quoted strings with commas inside. The comma is the field delimiter, and so it tries to break that string out into data for three fields.
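For what it's worth, on SQL Server 2017 and later the T-SQL BULK INSERT statement (unlike the bcp utility) can parse quoted CSV via FORMAT = 'CSV'. A sketch, with a hypothetical path and table name:

BULK INSERT dbo.NAD_Staging
FROM 'C:\Data\NAD.txt'    -- hypothetical path
WITH (
    FORMAT = 'CSV',       -- RFC 4180-style parsing
    FIELDQUOTE = '"',     -- keeps "Something, Something Else" in one field
    FIRSTROW = 2
);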

I did manage to get 4 million records in just using BCP and ignoring errors, but it died well before the end with a million errors. Even then I had to commit in 10,000-record batches all along the way. And holy cow, it was dog slow. I am doing this on my 8-core, 16-thread, 64 GB RAM, huge-SSD laptop. Plenty of horsepower.

So being the perfectionist that I am...

So I am trying to use SSIS (I think it is called), which can deal with the quotes and embedded commas to correctly place that data into a single field.

You can see where I am going with this. What should be simple ain't. I got SSIS to show me the file, accept the first line as field info, etc., but now it wants to use a standard 50-character string per field, and some fields have 255 characters. There is an .ini with field defs, but it can't just use an .ini.

So... I have spent a bunch of time trying to import a large table into SQL Server. SQL Server is a powerful beast, but my forte is Access and event programming in VBA.
 
I'm here to say that ChatGPT is an awesome tool.
I prefer Claude AI over ChatGPT, primarily because it has a personality more compatible with mine. I know that sounds odd, but it's true. The way the responses are phrased is more to my liking, even though they may contain the same material.

In another thread here I've used the phrase "Assisted Intelligence" as opposed to "Artificial Intelligence" when talking about how I have come to think of LLMs' role in assisting developers be more productive more quickly.
 
No need to register with ChatGPT unless you want to? That Claude AI appears to insist on that?
 
Claude AI also has a small number of prompts allowed, 10-30 according to their FAQ. ChatGPT has a limit as well, but it simply switches to a different version when you hit that limit. You can continue to ask stuff.
 
TBH, I only go there to supply code for people too lazy to do it themselves. Previously I would use Google and supply a link for the same reason.
I cannot write much code off the top of my head, I have to look up the syntax etc., so ChatGPT comes in handy. :)
Presumably I have not hit my limit yet. :)
 
Claude AI also has a small number of prompts allowed, 10-30 according to their FAQ. ChatGPT has a limit as well, but it simply switches to a different version when you hit that limit. You can continue to ask stuff.
True, it has a limit. Rarely do I need that many, but a time or two I've exceeded it and had to wait a few hours to resume. Again, it's more of a comfort fit than a practicality thing. I just like the way Claude AI responds.
 
No need to register with ChatGPT unless you want to? That Claude AI appears to insist on that?
Yes, that's a negative in one sense. On the plus side, I can pick up a thread days later, just like talking to an old Army buddy after 20 years 😉. I imagine ChatGPT does that as well, but I more or less shifted focus a while ago so I haven't experienced it in ChatGPT. Like everything else, it's good to have more than one option available. Can you imagine having to get all your AI needs from MyGovAI.gov?
 
TBH, I only go there to supply code for people too lazy to do it themselves. Previously I would use Google and supply a link for the same reason.
I cannot write much code off the top of my head, I have to look up the syntax etc., so ChatGPT comes in handy. :)
Presumably I have not hit my limit yet. :)
And I am doing the same thing, though in my case to push into the book. I have a TON of old code which I start with, but stuff like that export / edit / import was pure ChatGPT sourced, though I did edit it to get it into a format I liked. And I learned a ton in the process.
 
Yes, that's a negative in one sense. On the plus side, I can pick up a thread days later, just like talking to an old Army buddy after 20 years 😉. I imagine ChatGPT does that as well, but I more or less shifted focus a while ago so I haven't experienced it in ChatGPT. Like everything else, it's good to have more than one option available. Can you imagine having to get all your AI needs from MyGovAI.gov?
Do you pay the monthly? I assume not from your response.

And yes, ChatGPT can have a memory of old stuff. TBH AI isn't all that. ChatGPT knows that BCP can't handle the commas embedded in quotes, and yet it keeps suggesting that even though we have gone down that road before. So I have to be the adult in the room. Hard to do in my case. <Grin>
 
No, I use the free versions of most software when I can. I briefly considered it this morning around 3:00 AM when I woke up having a bad dream about a free-version AI agent giving me the run-around. But then I had coffee and calmed down. It turned out that the problem I had been trying to resolve was a corruption in the cached metadata in the PowerApps app, nothing to do with the stored procedure itself.

Now that I think about it, I came up with that myself after Claude struck out.
 
I did manage to get 4 million records in just using BCP and ignoring errors, but it died well before the end with a million errors. Even then I had to commit in 10,000-record batches all along the way. And holy cow, it was dog slow. I am doing this on my 8-core, 16-thread, 64 GB RAM, huge-SSD laptop. Plenty of horsepower.
My guess is that you did something wrong with the BCP settings. You can set the number of records per batch higher or lower. (Higher will speed it up, but it'll require more memory.) Did you read up on all the switches?
 
I know for a fact that the data heavily used "..." quoting around long strings with embedded commas. From my reading, BCP does not have a switch to tell it to handle that.

I managed to pull in about 3.6 million records by telling BCP to use a batch of 10K (write every 10K) and ignore a million errors. It was rejecting all of the rows whose fields had comma-delimited interiors. A million errors, and it managed to bring in 3.6 million records. Then it quit.

And no, I did not read up on all the switches. I got results, but BCP was not the tool for the type of data I was dealing with.
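For the record, the batching and error knobs exist in T-SQL BULK INSERT as well, where they can be combined with the CSV quote handling that bcp lacks. A sketch, with hypothetical paths and names:

BULK INSERT dbo.NAD_Staging
FROM 'C:\Data\NAD.txt'
WITH (
    FORMAT = 'CSV', FIELDQUOTE = '"',     -- handles the quoted embedded commas
    FIRSTROW = 2,
    BATCHSIZE = 10000,                    -- commit every 10K rows, like bcp -b
    MAXERRORS = 100,                      -- tolerate some bad rows, like bcp -m
    ERRORFILE = 'C:\Data\NAD_errors.log'  -- rejected rows land here for inspection
);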
 
I downloaded the National Address Database last night. Unzipped it is 33 GB and change, which is a tad too big for MS Access <grin> or even SQL Server Express.
Do you have the link for the file so we can try to import it?
 
