Data too large for Access. What next?

kyoch

All,

I have 75GB worth of data that I'd like to add to an Access DB.

Obviously, Access DBs can only be 2GB in size, so just having a back end with 75GB worth of data is not an option.

I don't know anything about SQL server or anything related to that.

Does anyone have any examples or suggestions on how to use a database with multiple back ends? Is that the right approach, or should I be thinking of another way? Please don't recommend SQL Server (see above).
 
How on earth can you have 75GB of data?

My entire data disk is only 150GB, and it holds multiple databases, most of which have several backups of both code and data.

What are you trying to manage?
 
Texas oilfield production data for every well in Texas since 1950 :D
 
I downloaded the data from the Texas Railroad Commission and converted it to 750 Excel 97 files.
 
Why are you working with Access? If you are to do "analysis or whatever" with 66 years of detailed info (75 GB), what was your plan for that data when you decided to download/gather it?
Just curious.
 
Even MS-SQL Express has a limit of 10GB. I don't know the limits for MySQL, and I'm not sure whether it still has a free version.
 
I'm using Access because it's what I have, I'm familiar with the program, and it's easy to build meaningful reports and forms.

If there's something else out there that works, then please let me know. I just don't know what the next step after Access is, short of becoming a full-blown DBA.
 
Maybe you could put this data in a MySQL database and link to it through ODBC. I just googled MySQL size limits, and the limit seems to be 1TB using NTFS.
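If you go that route, linking the remote table into Access is one line of VBA. A minimal sketch, assuming you have installed the MySQL ODBC driver and created a DSN; the DSN name and table names here are made up:

' Link a MySQL table into the current Access database via ODBC.
' "MyDSN" and the table names are hypothetical -- use your own.
DoCmd.TransferDatabase acLink, "ODBC Database", _
    "ODBC;DSN=MyDSN;", acTable, "production", "tblProduction"

Once linked, tblProduction behaves like a local table in queries, forms, and reports.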
 
I reckon you have got lots of text duplicates in there. These could be moved to lookup tables; in other words, each repeated string gets reduced to an integer key instead of a row of text.

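As a minimal sketch of that idea in Access SQL run from VBA (all table and field names here are invented):

Dim db As DAO.Database
Set db = CurrentDb

' Build a one-row-per-name lookup table from the distinct well names.
' (Afterwards, add an AutoNumber WellID to tblWells in design view
' and a Long WellID column to tblProduction.)
db.Execute "SELECT DISTINCT WellName INTO tblWells " & _
           "FROM tblProduction", dbFailOnError

' Store the integer key in the big table instead of the repeated text;
' the wide WellName text column can then be dropped.
db.Execute "UPDATE tblProduction AS p INNER JOIN tblWells AS w " & _
           "ON p.WellName = w.WellName SET p.WellID = w.WellID", _
           dbFailOnError

A Long integer is 4 bytes, versus potentially dozens of bytes for each repeated name, so this alone can shrink a table dramatically.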
 
First, Uncle G is absolutely correct. You probably have tremendous amounts of redundancy in that data set. However, whether you attack this with Access or whether you have to do an Access FE and some other flavor of BE, there are some up-front decisions to be made.

The question is whether you actually need to remember every frimpin' little sample of the raw data or whether you are doing a lot of statistical aggregates. If you don't need the raw data to actually reside in the database because you will be doing averages, standard deviations, minima, maxima, and other things that mask the individual items, then leave the files in Excel. (Even if we are talking about yearly or monthly aggregates, this might be viable.)

Be sure that you rename the Excel files (if not already this way) to a specific name plus a 4-digit year number. Be sure that their format is known, predictable, and invariant from file to file. Whether you do this as a macro (unlikely) or as VBA code (recommended), what I would suggest is that you import one spreadsheet to a temporary analysis table. Just ONE. Analyze the heck out of it. Take your aggregates, look for minima, maxima, other statistical aggregates. Copy the parts you absolutely HAVE to have. At the end of that process, erase the contents of your analysis table. Lather, rinse, repeat!
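A rough sketch of that loop in VBA; the folder, the file-naming scheme, and the RunAnalysisQueries sub are all assumptions you would adapt to your own setup:

' Process one spreadsheet at a time: import, analyze, erase, repeat.
Dim yr As Integer, strFile As String

For yr = 1950 To 2016
    strFile = "C:\RRCData\Production" & yr & ".xls"   ' hypothetical path
    If Dir(strFile) <> "" Then
        ' Pull this one year into the temporary analysis table.
        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel8, _
            "tblAnalysis", strFile, True

        ' Run the Part I and Part II queries described below.
        RunAnalysisQueries yr   ' hypothetical sub you write yourself

        ' Part III: erase the raw detail before the next pass.
        CurrentDb.Execute "DELETE FROM tblAnalysis", dbFailOnError
    End If
Next yr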

This is a "divide and conquer" method that, if followed carefully - VERY methodically - would allow you to do lots of analysis. It could easily be managed via code. You could also easily store the aggregates in much smaller space.

Part I of your analysis is to use the analysis table to select distinct names for each well, each company, and each county or other location designator, so that you have tables you can use for establishing relations with the analysis table. If you have attributes such as "quality of crude" ("sweet" or "sour", based on sulfur content, if I recall correctly), such gradations could also be linked through your tables. Anything you can do that would change a long descriptive string to an index into a separate table will help you in storing the aggregates as well as the raw data.
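Per import pass, the classic unmatched-append pattern adds only the names you haven't seen in earlier files. A sketch, with hypothetical names again:

' Append only well names not already in the lookup table.
' The same pattern works for companies, counties, crude grades, etc.
CurrentDb.Execute _
    "INSERT INTO tblWells (WellName) " & _
    "SELECT DISTINCT a.WellName FROM tblAnalysis AS a " & _
    "LEFT JOIN tblWells AS w ON a.WellName = w.WellName " & _
    "WHERE w.WellName IS NULL", dbFailOnError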

Part II of your analysis: once you have updated the well, company, county, and other tables of identifying information, run the queries that do your group-by aggregates. You can of course group by well, company, county, date, or any other dimension needed. You might need a table for each different type of aggregation, but that should be possible.
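As a sketch, one such Part II query might accumulate per-well yearly totals (the aggregate table and field names are invented):

' Accumulate per-well, per-year aggregates into a permanent table.
CurrentDb.Execute _
    "INSERT INTO tblWellYearAgg (WellID, ProdYear, TotalBbl, AvgBbl, MinBbl, MaxBbl) " & _
    "SELECT WellID, Year(ProdDate), Sum(Barrels), Avg(Barrels), Min(Barrels), Max(Barrels) " & _
    "FROM tblAnalysis GROUP BY WellID, Year(ProdDate)", dbFailOnError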

Part III of the analysis is to dispose of the detailed data, which (if you use a temporary analysis table) might be as simple as erasing the table contents.

Another option is to consider that you can easily have up to 2GB in an Access BE and can map more than one BE at the same time. (You just can't have duplicate table names without doing some things to qualify which BE you are using.) If you transfer your Excel data to a fixed-format BE file, making copies of the EMPTY BE file first, then you can open each BE file dynamically using VBA, store your data in that one file, and close the file when done.

Then in the future, you could do further analysis by stepping through one BE file at a time, while still keeping open your aggregate BE file and the FE file that contains the queries needed to drive that steaming mess. Yes, it WOULD be a hot mess, simply based on the amount of data and the gyrations needed to manage it. But... it could be done. It is a matter of how much up-front design you are willing to do, how much research you are willing to do on the topic of dynamically opening and closing database files, and how much time you are willing to let this kind of analysis take.
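For the dynamic-open part, DAO can open a BE file directly, no table links required. A minimal sketch (paths and table names are hypothetical):

' Step through one back-end file per year: open, query, close.
Dim dbBE As DAO.Database
Dim yr As Integer

For yr = 1950 To 2016
    Set dbBE = DBEngine.OpenDatabase("C:\RRCData\BE_" & yr & ".accdb")
    ' Run whatever analysis you need against this one file, e.g.:
    Debug.Print yr, dbBE.OpenRecordset( _
        "SELECT Count(*) FROM tblProduction").Fields(0).Value
    dbBE.Close
Next yr
Set dbBE = Nothing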

Here's one last hint to consider: No matter HOW carefully you code this, doing numeric analysis on 75 GB of data ain't gonna be fast, pardner. If you have to do these analyses in Access, DO NOT UNDER ANY CIRCUMSTANCES write anything as a VBA-based recordset operation unless there is no other way to do what you need to do. (Reason: a VBA recordset loop is interpreted row by row, while a single SQL statement is optimized and executed as a set operation by the database engine.)
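To make the difference concrete, here is the same column update done both ways; the set-based statement will typically be far faster (names invented):

' SLOW: row-by-row recordset loop, interpreted in VBA.
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset("SELECT Barrels, DoubleBbl FROM tblAnalysis")
Do While Not rs.EOF
    rs.Edit
    rs!DoubleBbl = rs!Barrels * 2
    rs.Update
    rs.MoveNext
Loop
rs.Close

' FAST: one set-based SQL statement, executed by the engine.
CurrentDb.Execute "UPDATE tblAnalysis SET DoubleBbl = Barrels * 2", dbFailOnError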

Good luck!
 
"You probably have tremendous amounts of redundancy....."

And probably the data is not normalized, especially coming from spreadsheets.
 
Kyoch,

Given the responses you have, what do you want to do with the data? That is, what is your ultimate use of this large set of data?
Is pre-processing or aggregation an alternative, or a key step in the process?
If we knew more about the "business need", there may be some options.
 
There are lots of options: SQL Server, Oracle, MySQL. 75GB is a small database, and any SQL DBMS can cope with that size of data without problem. The basics of databases really aren't so difficult, so it's an opportunity to learn something. Or you could hire someone to help you.
 
ButtonMoon is absolutely correct, and if you use MySQL or SQL Server, the ODBC drivers are freely available. Oracle, Sybase, and a couple of other vendors will charge you an arm and a leg for their ODBC components, but they work too.
 
SQL Server has a limit too, at about half an exabyte. ;)

Kilo > Mega > Giga > Tera > Peta > Exa
 
