Microsoft User Voice - I've suggested increasing Max File Size from 2GB

Lightwave · May 21, 2018

Dear All

Just thought I'd try and garner some support!

I've requested that Microsoft think about increasing the default maximum file size for MS Access files on the Microsoft Access Users Voice from 2GB upwards. This is an idea that has been floated twice before and Microsoft has responded saying its not in the plan. Each time it was raised it gained some support and each time it they responded saying no and closed the poll. By my estimate adding up the votes brings it to 6th in the ranking and it may have gained more support if it had remained open.

I think this is pretty much a no brainer. Its possible I am missing something fundamental. I know SQL Express would be a good alternative but as data sizes are getting bigger and bigger. Raising file size would really help with ETL and data analysis projects where you don't have the ability or time to manage the overhead of SQL Server Management Studio and the SQL Engine. For many in work environments getting access to or an install of SQL Express requires lobbying central IT for weeks or months. Of course sensible people will choose that route if available.

So here's the link if you want to check it out. Not sure maybe MS will shut it down but worth a shot.

Link to request larger file size option

For people that don't know Access User Group Voice is a place where the Access team take feedback from users. It is a great place to give some direct feedback to the Access Team and they look and consider all of the ideas. Even if you don't consider this idea to be good I would register and vote or propose ideas you are interested in.

isladogs · May 21, 2018

I've added a supportive comment but my experience of the user voice platform hasn't been positive. For example, requests to handle JSON files in Access got 650+ votes before MS killed it with 'no plans'. Ditto with requests to add features similar to those in Excel Power Query - over 500 votes but dismissed by MS. AFTER 2+ years they still haven't got round to Form fixing border widths in Access 2016.
The few changes they have made recently have not been changes with much popular support e.g support for Salesforce data.

isladogs · May 21, 2018

Pat Hartman said:
I doubt this request will get much traction for two reasons.
1. The 2G limit is actually per table (and associated indexes) since you could link to multiple BE's if you need to. Of course, you loose RI since RI can't be enforced between databases only within a single database. 2G for a single table is huge. I have a number of applications where the row counts run to the millions and the databases are not even close to the 2G limit. Of course, if you are storing OLE objects, you can eat up that space very quickly.
2. At some point, you need to upsize to SQL Server.

Actually Pat I have a 1.6GB Access BE file where almost the entire file size is due to a UK postcodes table containing 2.6 million records and around 50 fields.
There are no OLE fields or attachments or other file bloaters. The data is downloaded from the UK Office for National Statistics. The file is updated every 3 months but isn't edited by end users.

Lightwave · May 22, 2018

Probably could get a 30% reduction but still need to import it initially and then run some procedure on it.

I am working with national address datasets as well and it is causing similar size issues. One of the reason I requested the increase size to be considered. Having a single file that is easily contained which doesn't require virtually any configuration and is completely agnostic of a computers directory structure is a really nice advantage. SQL server express has its place but my datasets that I have to shift about the place are getting bigger and bigger.

Thanks for lending your support Colin , might well not get anywhere but I personally haven't tried before.

Addresses in the UK have been built up over some centuries and there are some odd features relating to hand delivery and manual sorting of paper envelopes that are still relevant to the postal service but of little relevance to everyone else hence some of the complexity. The UK has moved to a Unique Postal Reference Number UPRN but the general public don't use this number and other data providers rarely provide it so you are left matching a real variety of formats to the National Postal Address.

isladogs · May 22, 2018

@Pat

Lightwave beat me to it ....

In fact there are 44 fields of which only 4 could be hived off into separate tables: County, District, Country, Constituency.
I did initially do that but the space saving was much less than 30% and searching such a huge dataset was slower so I decided against it.
For info, all field types have been optimised
a) Text field sizes are limited to the largest value in the dataset e.g. County = 24, Constituency = 43
b) Number fields use the smallest possible type e.g. byte for fields such as Quality (1-9) then failing that integer etc. Latitude & Longitude are Single rather than Double datatype.

Doing that type of optimisation shaved around 240MB from the file size.
Of course, indexing various fields replaced some of that space saving

@Lightwave:
I also use postal address datasets as part of my UK Postal Address Finder app.
These are saved in a separate BE file for obvious reasons
As there are some 28 million UK addresses, that could eventually cause issues but as I obtain these as needed form online sources, I think it will be many years before those cause file size concerns

The_Doc_Man · May 22, 2018

The reason you have a 2 GB limit at all is hardware-model & structural, for the 32-bit version of Access, and probably compatibility for the 64-bit version.

The standard structure of any Windows compiled task (any source language) using the 32-bit addressing models is that you have a section where your memory map pointers are built, then the various code segments, then the various data segments. These segments do not overlap. If you were aware of the "do not execute data" and the "do not modify code" safeguards, you would know that this is managed by setting certain bits in the memory management hardware pointers for each segment. By putting code in a read-only segment and putting data in a no-execute segment, you can more easily thwart hackers who like to use buffer overruns as a way to hack your system. And MS is slowly heading that way. (OpenVMS did this 40+ years ago.)

The structural issue is the split between code and data - but I am NOT talking about the FE and BE splitting. I'm talking about MSACCESS.EXE and the .MDB or .ACCDB files that are the Access project/application. THEY are separate. The EASIEST method for this split, based on historical development of the PC's memory hardware models, is to divide the address space in half. So MSACCESS.EXE gets the half of the address space for which the high-order bit is 0 and the file address space gets the half of the address space for which the high-order bit is 1.

All the .DLL and other library files get "mapped" into the low-address space. The term "dynamic link library" is a reference to the fact that this mapping occurs when the task is launched. And it is dynamic, not static, because there are ways to build such files so that once you establish a "base" address, everything else is relative to the base. So a .DLL can be mapped to address base 0x3AC00000 in this program and the same exact .DLL can be mapped to address base 0x21F80000 in another program.

But what about the data files? Well, really the only issue there is that each data file must have its own separate address base. Open a database file? Establish an address base. Then, internally, all addresses can be relative to the base. Simple enough, right? But maybe not so simple.

Strings (short text and long text data types) use a structure called a descriptor because, based on some reading I did a while back, a string is not stored in the middle of the actual record. It is usually tacked onto the end of that record. The string descriptor is something like 8 bytes fixed size in the "real" record. The address of the string's data is a 32-bit quantity (for which the high-order bit is 1) and is one of the fields of the descriptor. The other descriptor parts are size and data type identifiers because other things (like arrays) have descriptors, too, used for argument passage. That way, if you have a field with capacity 255 bytes but only put, say, 10 bytes in it, the record + associated strings is not fixed length. It is only as along as needed to hold the data.

The problem for Access code is that complied code in the task that touches data in the files now has to step out of its own address space (MSACCESS.EXE has the low half, remember?) to be able to touch it, and the most efficient method involves memory management tricks.

In essence, when you open a database file, you set up memory management pointers into that file. You can have up to 16 such databases open (I believe). When you want to touch something in a file, you have to first MAP that file so that the pointers you have will be in context. AND here is where the limit comes in. Each file gets half the address space of a 32-bit address. A 32-bit address can manage 4 GB, so each half is 2 GB. And the reason you have a limit on the number of databases you have open is just a reflection of the number of how many slots were hard-coded into MSACCESS.EXE's compiled data areas for database mapping information.

As an aside, the reason you lose Referential Integrity when you have multiple large tables spread among multiple large files is that they are in different memory spaces that cannot be simultaneously mapped. You don't have enough address space AND unlike .DLL files, you can't just memory map a .ACCDB file to any address. They ALL start from the same address - 0x80000000 - the first byte of the high half of the address space.

The REASON that MS doesn't want to try to expand Access is that these address referencing mechanisms PERMEATE the code AND the data pointers internal to each file. In essence, MS would have to do a PtrSafe operation for every part of MSACCESS.EXE at a very low compiled code level. They would have to use different descriptors that can accomodate 64-bit addresses. AND in so doing, they would make it impossible for 32-bit and 64-bit versions of Access to use the same data files, because the 32-bit versions would BARF the moment they saw a 64-bit descriptor.

The reason you can get away with address expansion for 64-bit Excel is that the cells of a spreadsheet don't use direct memory address pointers to reference each other. They use row/column designators and each inter-cell reference is dynamically re-evaluated on the fly. Or something like that.

I actually agree that using the 64-bit addressing of modern machines would make for a really GREAT data capacity expansion for Access. But to lose all chance of compatibility with 32-bit systems would ... how shall I say this delicately? PISS OFF 90% of their customer base. MS wants to preserve Office integrity for as long as it can milk that cash cow, and to break Access compatibility with older files is contrary to their business model (of cash-cow milking.)

Lightwave · May 22, 2018

Thanks for the explanation I thought there might be some deeper reason to the limit.

I also think that it would be great. At some point presumably they are going to have to do it. Hopefully the polls on user access forums will give support to those individuals within MS who think its time has eventually come.

Lightwave · May 25, 2018

Doc - someone posted an interesting comment on the poll comments section that you might want to read it is quite interesting and has a different view on the 2GB limit.

It is by Mark Burns

Link again
User Access Poll increase 2GB file size

The_Doc_Man · May 25, 2018

Mark Burns mentioned "pointer space" issues, which is a different wording of what I was saying about address space subdivision and the internal addresses used by strings and linkages. And my comments about address space limits do indeed stem from the memory model that was in force when still using FAT16. Yet another reason why 2 GB was such a hard limit. Also another reason why disk partitioning was originally created, since very large disks of that era were useless unless you broke up the total space into chunks that FAT16 could manage.

In essence, what older versions of Access did was to allow you to treat each 2 GB file as a partition, and they used memory management tricks at the hardware level just like the device driver used address division at the device control register level to get to those extra sectors that were otherwise out of reach.

As evidenced by Excel-64, it is possible to work with larger amounts of data now. But if your address pointers don't support 64-bit addressing, you run into trouble. Witness, for example, the gyrations required for dealing with PtrSafe keyword cases. Now multiply that work by all of the strings, query management code, and internal structural pointers that make JET or ACE workable, not to mention Access itself. Add to the mix that the avowed upgrade path is via SQL Server, not a 64-bit ACE, and you lose all impetus (from the MS business viewpoint) for a structural upgrade.

My viewpoint in my earlier comments was the programming structural view. Mark Burns took the device-driver point of view, with which I do not disagree. It is also a perfectly valid place to consider. Let's just say there are myriad reasons for the size limit and a plethora of SQL-based upgrade options that economically negate any impetus for MS to expand ACE's and MSACCESS.EXE's size abilities.

Microsoft User Voice - I've suggested increasing Max File Size from 2GB

Lightwave

Ad astra

isladogs

Access MVP / VIP

isladogs

Access MVP / VIP

Lightwave

Ad astra

isladogs

Access MVP / VIP

The_Doc_Man

Immoderate Moderator

Lightwave

Ad astra

Lightwave

Ad astra

The_Doc_Man

Immoderate Moderator

Similar threads

Users who are viewing this thread