Question about field size

gemma-the-husky

It rewrites the record (which is one way records get into a random order) and modifies the index if the field is indexed - there is some debate whether that impacts the size of the index.

If it does, my question would be: if a field is specified as 6 chars and indexed, and the field size is then changed to 200 chars (no change to the data), does that involve recreating the entire index?
Yes, that makes sense. It stores the new version of the original record in a new location and makes the released space re-usable. So it does that, even if you don't change the maximum field size, but just use more (or even less) of the field? But that must lead to a lot of fragmentation and areas of unused space - hence the need for regular compact and repair.
 

The_Doc_Man

In the olden days, dBASE tables with field lengths of eight, or multiples of eight, tended to be faster for indexing and filtering. I have stuck with that ever since: key fields are 8 or 16 (etc.) in length, never 4, 6, or 12. The same goes for longer fields: if analysis says 67 is the longest value ever needed for the contents, then I'll make the field 72 long; I'll have a 40-long field rather than a 34. Maybe it doesn't make any difference today, but I like to think it does.

Given that we cannot see the "innards" of Access, we have no way to be sure that it doesn't round up for indexes anyway. I know for a fact that ORACLE databases do some rounding up for some things, not only indexes. But damned if I can tell you what Access does in that sense.

If you store it as 50 but only use 2, how much space does the record use? If you store CA for California but then change it to California in full, how does the string store the extra 8 characters, unless it has preallocated the entire 50 characters? I don't know, just wondering.

It makes no difference to the record size whether you declare the field as 2 or as 50, as long as the actual value is 2 characters long. It WILL make a difference to the index. Access DOES NOT pre-allocate short text fields. It dynamically builds the string and tacks it on at the end of the fixed portion of the record.

Suppose I have this record defined by a bunch of fields:

F1 as LONG, F2 as INTEGER, F3 as YES/NO, F4 as DATE, F5 as text(20), F6 as LONG

The actual record would look more or less like this:

Code:
|LONG|INTEGER|YES/NO|DATE|string-size&pointer-to-X|LONG|{X}dynamic-variable-string|
|   4|      2|     1|   8|                      10|   4|                  variable|

In this diagram, the field types and their lengths are shown; the last LONG field is the logical end of the record, and the 2nd-to-last field (F5) is the short-text field. In this layout, it DOES NOT MATTER how long the short-text field was defined to be: the same layout would hold for length 1 and for 255. But the maximum length of the dynamic portion would be constrained by that field size. The {X} in brackets is zero-length, a place marker.
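
As a rough illustration, here is a minimal Python sketch of that single-string layout. It assumes the 10-byte descriptor splits as a 2-byte length plus an 8-byte offset, and that text is stored as Unicode at 2 bytes per character - all assumptions for illustration, since the actual JET/ACE on-disk format is not documented here.

Code:
import struct

# Assumed layout per the diagram above: LONG, INTEGER, YES/NO, DATE,
# a 10-byte descriptor (2-byte length + 8-byte offset), then LONG.
# "<" means little-endian with no padding; the descriptor split is a guess.
FIXED_FMT = "<lh?dHql"
FIXED_LEN = struct.calcsize(FIXED_FMT)   # 4+2+1+8+2+8+4 = 29 bytes

def pack_record(f1, f2, f3, f4, f5_text, f6):
    tail = f5_text.encode("utf-16-le")   # 2 bytes per character
    # The descriptor records the tail's length and its offset {X},
    # which here is simply the end of the fixed portion.
    fixed = struct.pack(FIXED_FMT, f1, f2, f3, f4, len(tail), FIXED_LEN, f6)
    return fixed + tail

rec = pack_record(1, 7, True, 45123.5, "hello", 99)
print(len(rec))   # 29 fixed bytes + 10 bytes of text = 39

Note how the fixed portion stays 29 bytes whether the field was declared as text(1) or text(255); only the tail varies.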
 

CJ_London

@The_Doc_Man - so what would that look like with two strings? And what about the effect on the index for a string?
 

gemma-the-husky

@The_Doc_Man - thanks for that. But the area pointed to by the {X} pointer(s) gets changed/replaced as the length of the required storage changes, and you get a fragmented string-storage area that gets defragmented by a C&R operation. Do new records get inserted after the original string area, or does that work differently as well?
 

The_Doc_Man

Whenever you update a record, the rules say that you COPY the new (i.e. updated) record to a different place first, then you diddle with the pointers that link in the old record (to remove it), then you link in the new record.

NOTE that it HAS to be this way, since in a shared DB, when you open a query, you "lock in" the records. You can put this together by looking up "Order of SQL execution" (a.k.a. "Order of SQL operations"). The FROM clause is processed first, so JET/ACE has to make a list of the records involved in the query; so up front, you immediately have a pointer to each record. While YOUR query is active in its next phase (the WHERE clause followed by the SELECT clause), if someone comes along and issues an UPDATE and you didn't invoke the "See Changes" option, both records co-exist: the old copy is in your query's list, and the NEW copy of that record is in the table's list of records after the UPDATE completes. Somewhere in the process, depending on options, you might get the warning about "another user updated...".
 

The_Doc_Man

@The_Doc_Man - so what would that look like with two strings? And what about the effect on the index for a string?

I will answer that two-parter out of order. I believe that the index pre-allocates space at the fixed size as declared, in a hidden table that IS the index, with two significant fields - the value being indexed and a pointer to the start of the record to which that entry refers. However, I am not at all sure about that. It IS possible to do it with sparse allocation, since string comparison rules allow for comparing strings of uneven length - which would mean that the index ALSO uses short strings. This one, I cannot answer with any certainty.

Regarding the FIRST part of your question: with two strings you would have TWO size/pointer fields, the first pointing to point {X} and the second pointing to point {Y}, which would follow the end of the string that started at point {X}. Let's do a simpler record with an autonumber (which is a LONG), string 1, string 2, and a date.

Code:
|LONG|string1-size&pointer-to-{X}|string2-size&pointer-to-{Y}|DATE|{X}string1-actual|{Y}string2-actual|

In this diagram, as in the other one, the vertical bars are for YOUR VISUAL distinction; they are not in the record, because the contents are picked out using the equivalent of a MID function - except that it isn't only for text strings, it works on any byte stream. Neither {X} nor {Y} takes up space; they are merely visual aids to show you how and where things align. So the fixed portion of the record would be 4 + 10 + 10 + 8 bytes (with no "|" as actual delimiters), and the variable portion's length would be LEN(string1) + LEN(string2) - again with no "|" delimiters. The actual LEN() values would be in the size indicators of each 10-byte descriptor.
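
As a quick back-of-envelope check of that arithmetic, a few lines of Python, again assuming the 10-byte descriptors and 2 bytes per character (assumptions for illustration, not documented figures):

Code:
# Fixed portion: LONG (4) + two 10-byte descriptors + DATE (8) = 32 bytes.
# Variable portion: 2 bytes per character of each string (assumed Unicode).
def record_size(s1, s2):
    fixed = 4 + 10 + 10 + 8
    return fixed + 2 * (len(s1) + len(s2))

print(record_size("CA", "smith"))   # 32 + 2*(2+5) = 46 bytes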

This is my opinion based on lots of reading of various articles over 20+ years of Access work, PLUS reading how other Windows API calls work. Even if I'm wrong, I'll say that it should not be much different from this, because even Microsoft wants SOME consistency in their lives. Stated another way: based on having been involved in computing since 1966 in either a student or professional capacity, if I had implemented Access, this is close to how it would look, based on all of the "best practices" books I have read over that time period. And yes, that is 57 years with computers.
 

Pat Hartman

It seems to me it must make a difference somewhere if you store a state abbreviation as size 2, or as default size 50
I guess you didn't read my explanation in #12
If you store it as 50 but only use 2, how much space does the record use? If you store CA for California but then change it to California in full, how does the string store the extra 8 characters, unless it has preallocated the entire 50 characters? I don't know, just wondering.
This is a very simplistic explanation of how low-level I/O works in a file with a random record organization, so it applies to ISAM files as well as to the more sophisticated RDBMSs.

Records are stored head to toe, adjacent to each other - think of a continuous length of ribbon. When you update a record that was previously 283 characters long and make it 305, you are correct: Access (or any other RDBMS) cannot write the new record back on top of the old record.

Every RDBMS works somewhat differently in how it actually manages "free space" and how it distributes it throughout the space allocated to the file, but most include certain amounts of it in each physical record so that expansion can be handled more efficiently. The size of a physical record is dictated by the hard drive; physical records are optimized for I/O, so rotational speed is also a factor in their size. Individual records are not read/written - I/O is performed at the physical-record level. That is why updating a single record can actually lock multiple logical records while the update is taking place. So, if there is sufficient "free space" in the physical record, the database engine rewrites the entire physical record and the records retain their original sequence. If there is no "local" "free space", then the database engine replaces the record with a pointer to a new location and writes the replacement record at that new location.
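
A toy Python model of that in-place-versus-relocate decision - the names and sizes are invented for illustration, not how any particular engine implements it:

Code:
# A "page" (physical record) holds several logical records plus free space.
# An update that still fits is rewritten in place; one that doesn't leaves a
# forwarding pointer behind and the new version goes to the end of the file.
class Page:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = {}                    # record id -> payload

    def used(self):
        return sum(len(v) for v in self.records.values())

    def update(self, rid, new_payload, overflow):
        growth = len(new_payload) - len(self.records[rid])
        if self.used() + growth <= self.capacity:
            self.records[rid] = new_payload              # enough local free space
            return "rewritten in place"
        overflow.append((rid, new_payload))              # written at end of file
        self.records[rid] = "->overflow[%d]" % (len(overflow) - 1)
        return "relocated via pointer"

page, tail = Page(30), []
page.records = {1: "CA", 2: "NY", 3: "TX"}
print(page.update(1, "California", tail))    # fits: rewritten in place
print(page.update(2, "N" * 40, tail))        # exceeds free space: relocated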
If it does, my question would be: if a field is specified as 6 chars and indexed, and the field size is then changed to 200 chars (no change to the data), does that involve recreating the entire index?
I also addressed this in #12. So, if you change the max field size from 6 to 200, the entire index must be recreated perforce.
 

The_Doc_Man

To amplify Pat's comments about optimization of record placement, I know this for an absolute fact: older ORACLE databases for mainframes have a thing called a "bucket" which holds several disk blocks. The disk's physical geometry becomes a consideration in optimizing layout, because they want the buckets never to cross a cylinder or track boundary. So if you have, say, 24 blocks per track/cylinder combo, then bucket sizes can be 4, 6, 8, or 12 (they didn't, at the time, like buckets bigger than 12 blocks). The point of this is that the engine works on the contents of a whole bucket at a time. When storing records initially, it never lets a record cross a bucket boundary, which means some limited slack space at the end of the bucket. Later, during updating, if the bucket could be rewritten with a longer record and still not cross into another bucket, that is what happened. But if the update exhausts the slack space, they have to do what was called a bucket split: to accommodate the longer record, one bucket becomes two buckets.

So Pat's comments about how things are stored in buckets are absolutely SPOT ON for the case of older ORACLE databases. I have to admit I have lost track of the newer ones, and of course for an SSD, device geometry has no meaning. But there were times in the "good old days" when folks even wrote things in a specific scatter pattern designed to optimize both rotational and seek-arm speeds.
 

Pat Hartman

The other thing that optimizes for rotational delay: assuming each "track" has room for 8 physical records, they are written as 1,5,2,6,3,7,4,8, so that by the time the head has read one physical record, it is positioned to read the next one in sequence. The same trick optimizes the in-out head movement as the heads move from track to track.
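
As a small illustration, this Python sketch generates that ordering; the function name and the 2:1 factor are just for demonstration (real controllers chose the factor to match drive speed):

Code:
# Place logical records 1..n into physical slots, skipping ahead by the
# interleave factor each time, so consecutive logical records are one
# slot apart and the head has time to get ready between reads.
def interleave(n, factor=2):
    slots = [0] * n
    pos = 0
    for logical in range(1, n + 1):
        while slots[pos]:                 # slot taken: slide to next empty one
            pos = (pos + 1) % n
        slots[pos] = logical
        pos = (pos + factor) % n
    return slots

print(interleave(8))   # [1, 5, 2, 6, 3, 7, 4, 8] - Pat's sequence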
 

gemma-the-husky

I got interrupted.
I just wanted to thank everyone above who presented helpful information about some of the inner workings of Access. Very helpful in general. :D
 

CJ_London

Many thanks for everyone's comments

Had a few minutes, so I thought I would do a quick test to see how the db size changed based on different setups:

  1. Created a new db, added a table with a single 6-char text field, unindexed, and compacted - 356kb
  2. added 1m random 6-character text records - 19,068kb
  3. compact/repaired - 19,056kb
  4. added a non-unique index - took about a second - 30,688kb
  5. compact/repaired - 30,684kb
  6. changed field size from 6 to 200 - took several seconds - 54,740kb
  7. compact/repaired - 30,684kb
  8. removed index - 30,688kb
  9. compact/repaired - 19,052kb
  10. changed field size from 200 to 100 and added back the index - took less than a second - 43,112kb
  11. compact/repaired - 30,684kb
Summary
file sizes at steps 3 and 5 indicate records are added consecutively, as would be expected, so there was not much compacting to do

step 3 - simple maths based on the 18,700kb file-size increase indicates a 6-character string takes 18 or 19 bytes per record. The rule is (I think) 10 bytes + 2 bytes per char = 22 bytes. The maths is pretty basic, but the file-size increase is broadly in line with expectation

step 5 - the increase in size from step 3 to step 5 is 11,628kb, implying the index takes less space than the actual value, even tho' a pointer is also required. The maths implies only the characters are stored in the index, not the 10-byte overhead.
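
The arithmetic behind steps 3 and 5, in a few lines of Python (the 10-byte overhead and 2-bytes-per-character figures are the working assumptions stated above, not documented values):

Code:
ROWS = 1_000_000
data_growth_kb  = 19068 - 356       # step 2 minus step 1
index_growth_kb = 30688 - 19056     # step 4 minus step 3

print(data_growth_kb  * 1024 / ROWS)    # ~19.2 bytes per 6-char record
print(index_growth_kb * 1024 / ROWS)    # ~11.9 bytes per entry, roughly 2 * 6 chars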

step 6 - confirms the comments about the index being rebuilt if the field size is changed (increase in size: 24,056kb)

step 7 - demonstrates that indexes appear to take up the same space regardless of the field size

step 9 - restored to the original non-indexed state - same size as for the 6-character limit, as expected

step 10 - increase in size of 12,428kb - approx half the increase for the 200-char index, as expected.

step 11 - back to the same situation as at step 7

Conclusions
  • If you change the field size on an indexed field, do a compact/repair to recover the space
  • Indexes take up space related to the field value, not the specified size
  • Indexes (in respect of short text) take up less space than the field value (other datatypes not tested)

Not tested
This test is based on 1m records, each with a 6-char value. I have not tested whether the index is adjusted to the max actual field length when there are varying field lengths. This thought comes because the creation of the 100-char index was significantly faster than the creation of the 200-char index - I might have expected around 50% of the time, not 5 or 10%. An alternative speculation is that there is a time overhead on the first creation of an index.
 

Pat Hartman

Do all the rows have values for the indexed field? Remember, the field value is a row in the index rather than a column, and if there is no data in the column, there would be no row in the index.
 

Pat Hartman

So it does that, even if you don't change the maximum field size, but just use more (or even less) of the field?
No, the record is rewritten in the same spot if it is the same size or smaller. Only if it is bigger is it written to the end of the table.

This is easier to see with SQL Server, because with Jet/ACE, whenever you compact the database (the BE, or wherever the tables are), each table is sorted and rewritten in PK order, and the indexes are also recreated. Also, if you've ever opened a table in DS view and sorted it, Access remembers the sort, so to perform a test you need to start with a new table. You can append the data in PK order by sorting on the PK and appending the rows from a large table.

Remember - do not ever sort the table in DS view or the test won't work.

Once you have a table populated with a large number of records (100,000), modify a couple of early records by making a couple of data values significantly longer than they were before. Say you define address as 100 characters: most addresses are < 35 characters, so make some addresses 80+ in length. Close the database (to clear the cache) - DO NOT COMPACT. Then reopen the database and open the table. Are the records you modified "missing"? If they are, they have been rewritten at the end of the table - or wherever in the file there is the first "free space" large enough to hold the increased record size.

You can always see this with SQL Server. It is just a little harder with Jet/ACE.

Use a query to sort the records and you will see the "missing" rows pop back into place.
 

The_Doc_Man

No, the record is rewritten in the same spot if it is the same size or smaller. Only if it is bigger is it written to the end of the table.

This is not true for at least some cases, and for JET/ACE I think it is never true. I don't think it is EVER true in a shared-access case for ANY database that is ANSI SQL 92 (or later) compliant.

If you consider the case of a shared JET/ACE BE file, let user1 run a SELECT query that includes record XYZ; then let user2 come in a couple of seconds later and UPDATE record XYZ. The rules as I understand them say that user1 has a snapshot-in-time of the old record XYZ while the SELECT is still open, whereas user2 has created AND RELEASED a new version. If user1 did the SELECT with a "No locks" option on the query, that can happen, and BOTH records must co-exist.

At least in this case, the updated record HAS to be written at the end of the table no matter WHAT size it is, because the old record is still in use. And if user1 doesn't attempt an update through the SELECT query before the automatic requery time found in the File>>Options>>Current Database section, but instead just releases the SELECT (after the UPDATE is complete), user1 might never know that the record was updated while he/she was busy. But for that brief time of overlap, both records have to exist - and the one that sticks around after the closure of the SELECT query is the updated record at the end of the table, which was the only open place for it to go.
 

Pat Hartman

This is not true for at least some cases, and for JET/ACE I think it is never true.
Doc, if the record were not rewritten to the same spot, the table would always be out of sequence. As it is, you can force the records to go out of sequence by using a big enough table and making the right kind of changes to a few "early" records. Technically it doesn't matter which records you change, but if you have a new table and the IDs are all in order 1,2,3,4,5, etc., the rows will always appear in this order. Do NOT compact and do NOT sort the table - doing either of these things breaks the test, physically (in the case of C&R) or logically (in the case of sorting the table; Access "remembers" the sort, so you can't ever get rid of it). If you change record 3 so that it is too long to be rewritten in place, when you open the table it will show 1,2,4,5, ......., 3 - record 3 will get rewritten at the end of the table. Make changes to other records but do NOT increase their length: they won't move. This test, if you perform it correctly with sufficiently large tables, will prove that I am correct. The tables need to be large enough that they span multiple physical records. So, just use a large table and do what I suggested earlier to copy it to a new table which has NEVER been sorted.

The type of lock will affect which records can or cannot be updated, and when, but it has nothing to do with the how. If every record were always rewritten at the end of the table, the table would be constantly scrambled, and that just doesn't happen.
 

GK in the UK

Can't we just test it with a hex editor on the Access file and find out? I tried it with XVI32 on a new monolithic Access DB, and changing the value of a 3-character string in a table resulted in the old value not getting a hit at all, and the new value appearing in the file at the identical decimal position.

I tested that 4 times with consistent results, then changed the 3-character string to 4 characters. Sure enough, the old (latest-revision) 3-char value was still a hit at the old decimal position, and the new value was a hit at a new position. I tested that twice, resulting in the first 4-char test value being overwritten, then changed the 4-character string to a 5-character one, then to a 6.

At that point, the Access file contained the latest 3-character value at its first position, the second-test 4-character value at the position of the first 4-character value, the 5-character test value still searchable, and a new position for the 6-character value. The actual table row, of course, shows only the latest 6-char value. I can't say whether the latest enlarged revision is at the "end" of the actual table - I'd assume it must be - but there's so much gobbledygook in the hex editor window that I can't tell.

I didn't test a shorter string. My quick-and-dirty test was with an AutoNumber PK and a single non-indexed column defined as 64 characters. I put in 100,000 otherwise identical rows and tested changes on a single row, number 50,000, every time. Access and the hex editor were closed and re-opened fresh for each test.
 

Pat Hartman

I tested that 4 times with consistent results, then changed the 3-character string to 4 characters.
I guess you didn't understand the concept of embedded "free space". I told you specifically that to see the record move:
1. you needed a database large enough to span several physical records.
2. you needed to increase the size of the record sufficiently that there would NOT be enough local "free space" to rewrite the record into the same block.

I tested that 4 times with consistent results, then changed the 3-character string to 4 characters. Sure enough, the old (latest-revision) 3-char value was still a hit at the old decimal position, and the new value was a hit at a new position. I tested that twice, resulting in the first 4-char test value being overwritten, then changed the 4-character string to a 5-character one, then to a 6.
But did you look at the subsequent records? Their start addresses would have changed. You didn't make a significant enough change to the record to force the database engine to relocate it to the end of the table - clearly there were at least 6 characters of "free space" in the physical record your rows were contained in. You need to EXCEED the available "free space" to see the record move. But what your test did confirm is that records are not arbitrarily moved to the end of the table just because they are changed.

I can't tell what your physical record size is, so I can't give you a good number for how many records you have to change or how many characters you have to add.

Here's a concrete example that will help you (the arithmetic is repeated in the short sketch below):
1. the physical record is 2,000 bytes (we don't know what this actually is, so the number is totally made up; it is more likely much larger, say 25,000)
2. the logical record is 135 bytes (that's the maximum length of any row)
3. 135 * 14 = 1,890, which leaves a minimum of 110 bytes of "free space" in the physical record - but there might be a lot more, depending on the sizes of the actual records. You didn't even come close to that in your testing.
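
The same arithmetic in a couple of lines of Python, using Pat's made-up numbers:

Code:
PHYSICAL = 2000                         # invented physical-record size
LOGICAL = 135                           # maximum logical-record length
per_page = PHYSICAL // LOGICAL          # 14 rows fit per physical record
free = PHYSICAL - per_page * LOGICAL    # 110 bytes of guaranteed free space
print(per_page, free)                   # 14 110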
 

gemma-the-husky

I imagine that if a changed record were written in place, you would see the data change. If you write the record at the end of the current file, then the original record might not be cleared. It would (or could) just be left in position with the original data, but there would be no pointer to it. Maybe the space couldn't actually be reused until after a compact operation.
 

The_Doc_Man

I imagine that if a changed record were written in place, you would see the data change. If you write the record at the end of the current file, then the original record might not be cleared. It would (or could) just be left in position with the original data, but there would be no pointer to it. Maybe the space couldn't actually be reused until after a compact operation.

Dave, I totally agree with that viewpoint. I think you have correctly described what actually goes on. The problem that many of us have is that different engines do different things. I know that ORACLE handles updates differently from JET/ACE. In the ORACLE case, I believe that Pat's view is correct, because ORACLE explicitly includes a "slack space" parameter in its installation options and when creating a new DBF container file. With slack space, you CAN overwrite a record during an update, because you know how much room for overlap you have. But Access doesn't, to my knowledge, have slack space. It takes the "sparse" approach, i.e. it stores only the bare minimum to get the job done. Particularly given that Access limits DB space to 2 GB, allowing slack space would be counterproductive.
 
