Too much data for Access - what are options?

sumdumgai

Registered User.
Hello all,
We're trying to build a new Access database with .dat data that is downloaded from a remote site. The problem is that there is too much data, and the 2 GB size limit on the database is being exceeded. Can I please get some recommendations on what to do? The data coming in cannot be scrubbed or filtered. There is a field in the data that contains a yyyymm value for each record. We need to keep a rolling 24 months of data.
Any suggestions are much appreciated.
Thanks.
 

sumdumgai

Registered User.
Thanks. Is that free software? Is it similar to Microsoft SQL Server 2022, which is free and can handle 10GB?
 

plog

Banishment Pending
SQLite is free and can handle 10GB. If you are dealing with an SQL database, they are all going to function similarly - by using SQL. The interfaces will all differ a bit, but not by much.
 

CJ_London

Super Moderator
Staff member
Is it similar to Microsoft SQL Server 2022, which is free
To be clear, the Express version is free, not the full SQL Server version.
 

Pat Hartman

Super Moderator
Staff member
I use SQL Server Express. It is free. You download SSMS (SQL Server Management Studio) separately. SQL Server Express is also limited in size (10 GB per database), but that is much larger than Access's 2 GB.

Depending on the tool you are using to import the data, you might be able to convert the yyyymm to separate fields or to yyyymmdd and use 01 for the day. Otherwise, you can add a column and do the conversion after the data is loaded.
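
If the import is being scripted, the conversion is trivial - here is a minimal sketch in Python (the function and field names are just placeholders, not anything specific to your import tool):

Code:
from datetime import date

def yyyymm_to_date(yyyymm):
    """Convert a 'yyyymm' string such as '202309' into a real date, using 01 for the day."""
    return date(int(yyyymm[:4]), int(yyyymm[4:6]), 1)

# '202309' -> 2023-09-01, which then sorts and filters correctly as a date
print(yyyymm_to_date("202309"))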
 

jdraw

Super Moderator
Staff member
SQLite is free and can handle 10GB. If you are dealing with an SQL database, they are all going to function similarly - by using SQL. The interfaces will all differ a bit, but not by much.
plog,

Have you used SQLite as a back end with Access as a front end? Just curious - I have not seen this type of setup in the forums.
I did find this from devart.
 

plog

Banishment Pending
No, but I know it has ODBC drivers to connect to Access. I used it in a Raspberry Pi project using Python. I used SQLiteStudio to work with the database.

The neat thing about SQLite is that it's a standalone file, like an Access database. Need a copy? Just right-click, copy and paste it in Windows Explorer.
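
If anyone wants to see how small the footprint is, here is a minimal sketch using Python's built-in sqlite3 module (the file and table names are made up for illustration):

Code:
import sqlite3

# Connecting creates the single-file database if it doesn't already exist.
con = sqlite3.connect("sales_archive.db")
cur = con.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS sales (year_month TEXT, amount REAL)")
cur.execute("INSERT INTO sales VALUES (?, ?)", ("202309", 125.50))
con.commit()
con.close()
# A backup really is just a copy of sales_archive.db in Windows Explorer.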
 

sumdumgai

Registered User.
Thank you all for the MS SQL suggestion.
What if I stuck with MS Access and loaded the data by year into different databases, then linked the tables in another 'pseudo' database and ran queries there against all three data databases (e.g., 2021, 2022, 2023)? Every month, I would either add a new "month" table to the appropriate database, or create a new database (e.g., 2024). Thoughts?
Thanks.
 

isladogs

MVP / VIP
Bad idea.
You cannot enforce referential integrity between tables in different Access databases.
Once you start down that route, you will make additional work for yourself, and the outcomes will be worse than if you move the BE data to SQL Server or one of the other alternatives mentioned above.
 

CJ_London

Super Moderator
Staff member
Usually these data 'dumps' are not normalised - if that is the case with your data, you might find normalising it is a way forward. As an example, a file (.csv) that a client downloads is typically around 160k a month; normalising that data adds only around 20k to the size of the Access BE - and that includes indexing.
 

sumdumgai

Registered User.
Thanks CJ. I don't know what you mean by normalizing. The data comes in as records separated by LFs and CRs. Can you please explain?
 

Pat Hartman

Super Moderator
Staff member
I would stick with the MS family. Use SQL Server Express as long as 10 GB works for you.
 

CJ_London

Super Moderator
Staff member
Can you please explain?

Let's say your records are to do with sales, so you might have these headings:

customerCode  customerName  InvoiceNo  invoiceDate  itemName  Quantity  price  value
abc12345      alphaCo       12345      12/9/2023    gadget1   1         10     10
abc12345      alphaCo       12345      12/9/2023    gadget2   5         8      40
abc12345      alphaCo       12345      12/9/2023    gadget3   2         15     30

The first two columns repeat for every row of the invoice, as do the next two (invoice number and date).

The 8-character customerCode takes 26 bytes, whilst the customer name (say the average name is 18 chars long) takes 46 bytes - that is 72 bytes per row.

Normalising means you store those two fields just once, in a customer table with a long primary key, which takes 4 bytes - plus the 72 bytes for the text, a one-time 'cost' of 76 bytes.

You can then replace the first two columns in the sales data with a 4-byte number linking to the primary key in the customer table.

So, simple maths: saving 72 bytes per row, less the 4-byte number, is 68 bytes per row, so over the three rows above you have saved 204 bytes for that one-time 'cost' of 76 bytes. And every time you make another sale to that customer, you save another 68 bytes, because you already have the customer details.

A similar principle would apply for the repeating invoice 'header' detail (number and date), and also for the items.
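
To make that concrete, here is a rough sketch of the same structure, expressed as SQL run through Python's sqlite3 so it is self-contained - the table and column names follow the example above and are purely illustrative:

Code:
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Customer details stored once, keyed by a small integer primary key.
cur.execute("""CREATE TABLE customer (
    customerID   INTEGER PRIMARY KEY,
    customerCode TEXT UNIQUE,
    customerName TEXT)""")

# Each sales row carries only the 4-byte customerID, not the code and name.
cur.execute("""CREATE TABLE sale (
    customerID  INTEGER REFERENCES customer(customerID),
    invoiceNo   INTEGER,
    invoiceDate TEXT,
    itemName    TEXT,
    quantity    INTEGER,
    price       REAL,
    value       REAL)""")

cur.execute("INSERT INTO customer (customerCode, customerName) VALUES (?, ?)",
            ("abc12345", "alphaCo"))
cust_id = cur.lastrowid
for item, qty, price in [("gadget1", 1, 10), ("gadget2", 5, 8), ("gadget3", 2, 15)]:
    cur.execute("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?, ?)",
                (cust_id, 12345, "2023-09-12", item, qty, price, qty * price))
con.commit()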
 

sumdumgai

Registered User.
Just to update you, I've managed to load all of the data (24 months' worth) into a database. I first had to remove the table indices and then load the data one month at a time. After all the months were loaded, I reset the table indices. The database size is 1.9 GB. There is another table that is loaded with demographic data; I had to put that data into a second database. Both of these databases will be back-end databases. A third database will link to the back ends; this is where the additional tables, queries, etc. will be created for reporting purposes.
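
For what it's worth, the drop-the-indexes / load / re-index cycle could also be scripted, roughly along these lines - this is only a sketch, and the pyodbc connection string, table and index names are placeholders that assume the Access ODBC driver is installed:

Code:
import pyodbc

# Hypothetical back-end path and object names.
conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\SalesBE.accdb;")
cur = conn.cursor()

# Drop the index so the bulk append doesn't pay the indexing cost per row.
cur.execute("DROP INDEX idxYearMonth ON tblSales")
# ... append one month of records here ...
# Re-create the index once the month has loaded.
cur.execute("CREATE INDEX idxYearMonth ON tblSales (year_month)")
conn.commit()
conn.close()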
 

The_Doc_Man

Immoderate Moderator
Staff member
CJ's comments on normalization are true and appropriate. However, there comes a moment here where you need to LOOK at the data to decide which columns can reasonably become candidates for normalization. Another equally important issue is whether it pays to normalize or to otherwise reformat. A little bit of up-front skull sweat usually pays off BIG time for big data sets.

For example, using CJ's data from post 16 and assuming that your data comes in as all text (because I remember reading your other post with a big CSV data set)...

Reformat examples first:

If your import is purely CSV and you don't do conversions, you probably get everything as text. Look at CJ's field called "Invoice No." which is a 5-digit number. If you store this as a string, it takes up 5 bytes. If it will ALWAYS be purely numeric, though, you can store it as a 4-byte LONG integer. That's a savings of one byte per record. Not so much, you say? If you have 100,000 records, you just saved 100,000 bytes.

If you have a date in string format - assume dd/mm/yyyy text format - that is 10 bytes. If you convert the string date to a DATE field, you have the same information in 8 bytes (the size of a date field). Over 100,000 records, that saves another 200,000 bytes. AND sorting works correctly on date fields, whereas text dates have to be in yyyy/mm/dd format to sort correctly by date.

Normalization:

Note CJ's customer code and customer name examples. Suppose you have 100,000 records but only 1,000 distinct customer codes and names. Make a customer table that holds an autonumber plus the customer code and customer name. If you build the customer table first, then when you import the main data you store only the autonumber in each record (looked up by customer code from the customer table). The math: 1,000 customers x (4-byte autonumber + 8-character code + at least a 7-character name) = 19,000 bytes (minimum) in the customer table, plus 100,000 records x 4 bytes in the main table = 419,000 bytes between the two places. If you DID NOT normalize but instead retained the code and name (at least 15 bytes) in every record, that would be 15 x 100,000 = 1,500,000 bytes - more than three times as much as if you normalize. And I'm betting the customer name could be more than 7 bytes.

This savings in space utilization is WHY you spend time up front planning how you want to store things and WHY you want to include normalization among the tools in the tool box. That up-front analysis and planning saves your skin more often than you would ever realize.
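
Just to illustrate the "reformat on the way in" part, a conversion step like this (Python here, with made-up field positions based on CJ's layout) turns the text invoice number and date into properly typed values before they are stored:

Code:
from datetime import datetime

def convert_row(fields):
    """Text fields in, typed values out: integer invoice number, real date."""
    invoice_no = int(fields[2])                                     # '12345' text -> 4-byte integer
    invoice_date = datetime.strptime(fields[3], "%d/%m/%Y").date()  # 'dd/mm/yyyy' text -> date
    return invoice_no, invoice_date

print(convert_row(["abc12345", "alphaCo", "12345", "12/9/2023"]))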
 

sumdumgai

Registered User.
Thank you. The data comes in as a .DAT file - almost 8 million records, I found, after opening it in a free text editor that can accommodate the file size. I'm reading the data into an array, splitting the file into lines and the lines into fields, and adding to the array only those records that match a given year_month value. The array is then used to populate the table, one month at a time. I then do the same for the next year_month, 23 more times. By processing one month at a time, the array never gets big enough to choke Windows.
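
For anyone doing something similar, the general shape of that month-at-a-time pass looks like this in Python (the file name, delimiter and position of the yyyymm field are placeholders):

Code:
import csv

def load_month(dat_path, year_month, delimiter="|", ym_index=0):
    """Collect only the records whose yyyymm field matches, so the array stays small."""
    rows = []
    with open(dat_path, newline="") as f:
        for fields in csv.reader(f, delimiter=delimiter):
            if fields and fields[ym_index] == year_month:
                rows.append(fields)
    return rows

# Repeat for each of the 24 months, appending each batch to the table:
# for ym in ("202201", "202202", ...):           # illustrative list
#     rows = load_month("download.dat", ym)
#     ... insert rows into the back end ...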
 

Cotswold

Active member
SQLite is free and can handle 10GB. If you are dealing with an SQL database, they are all going to function similarly - by using SQL. The interfaces will all differ a bit, but not by much.
SQLite is not a multi-user database. It is, however, serverless, self-contained and cross-platform. As plog has said, it can handle around a dozen gigabytes of data. However, SQLite puts everything into a single file, and very large datasets can stress its capability. Many modern filesystems are capable of handling terabytes of data, but that doesn't mean they are very good at it. If you need to store several gigabytes or more of data, it might be wise to consider a more performance-orientated product.
 
