Table Design Help

BlueJacket · Feb 13, 2017

Trying to figure out how to break up a table I've imported from Excel. The table has 32 fields and 85%-90% of the fields are filled in with data. Here is the table broken down by field:

tblPropertyDetails
PropertyID
Client
Tag
DateAquired
County
ParcelID
ParcelLink
PreviousOwner
PhotoLink *
StreetNumber
StreetName
ParcelMap
City
ZipCode
GoogleMapLink
Zoning (residential, commercial, agriculture, etc)
ResidentialUse (single fam, multi-fam, condo, townhouse, etc)
ImprovedUnimproved
Acreage
LandValue
ImprovedValue
AccessoryValue *
TaxAssessorFMV
ZillowLink *
TruliaLink *
Bed
Bath
SquareFeet
YearBuilt
LastSaleDate
LastSaleAmt
Notes *

* Means there are a good amount of blank records.

Save for those fields marked by an "*", pretty much all of that data is filled in. My thought process is this: I can leave the table as it is for the most part and just create "tblNotes," "tblAccessoryValue," "tblZillowLink", etc. Pretty much just separating out the fields that have a lot of blank records and link them through the "PropertyID" PK, though this would still leave me with roughly 28 fields left in the table (how many fields is too many?).

OR

Still have a main "Property" table listing all the properties with just the basic information, then create multiple other tables to fill in all the other information. Like "tblLinks," "tblTaxAssessorValue," "tblImprovedInfo," "tblLandInfo," etc. However, while this will create many different tables with much fewer fields, there will be slightly more blank records.

OR

I can normalize the crap out of it and create a ton of 2-3 field tables to link back to the main "Property" table so that there are hardly any blank records anywhere. This is what I'm leaning towards, but it will create more work each time we import a new bulk of properties from an Excel file.

Thank you in advance for your help. This forum has been immensely helpful to me and continuously expands what I know about Access (which isn't much).

CJ_London · Feb 13, 2017

normalisation is not about 'saving space' by having low populated fields in a separate table, it is about grouping data together in a way that does not require the duplication of data. The way to think of it is in terms of relationships. e.g.

'Each property can only have one client' - client fields stay in the table
'Each property can have many clients' - client fields go into a client table and replaced with a single 'clientID' field in the property table.
'Each property can have many clients and each client can have many properties' - client fields go into a client table and a separate link table is created between clients and properties

However that is not to say you wouldn't have a separate client table 'lookup' that users reference via a combobox in order to ensure consistent spelling.

Without knowing what your fields mean, difficult to say - notes is perhaps one that would go into a separate table, but only if you expect multiple notes.

Perhaps landvalue goes into a separate table with a date of valuation because the land is revalued on a regular basis and you want to keep a history of changing values

BlueJacket · Feb 14, 2017

So if there's just one "AssessoryValue" data point per property, then it's not worth it to place that in a separate table? Even if something like 70-80% of those records are blank? Is it fine to have a table with 30 (what about 60?) different fields, as long as all of those fields have a one-to-one relationship with the primary key?

Even though the practice is not normalization, is it not good practice to create separate tables in order to free up space? Even if those records are blank, isn't memory still allocated to them?

CJ_London · Feb 14, 2017

So if there's just one "AssessoryValue" data point per property, then it's not worth it to place that in a separate table?

no, not unless you want to maintain a history of values, or record values for different accessories.

Even if something like 70-80% of those records are blank?

correct - blank values do not take up space.

is it not good practice to create separate tables in order to free up space? Even if those records are blank, isn't memory still allocated to them?

memory - depends on which fields you bring back, the more you bring back, the more memory required. disk - modern db's do not necessarily reserve space for text to maintain a contiguous record (the only field type that has variable space requirements)

see this link for data sizes

http://www.w3schools.com/sql/sql_datatypes.asp

and don't forget that if you have two tables, you need indexes as well

Is it fine to have a table with 30 (what about 60?) different fields, as long as all of those fields have a one-to-one relationship with the primary key?

yes - maximum allowed is 255 fields. One type of possible exception for maintaining one to one relationship tables is perhaps where you have a 'person record' and maintain completely different sets of data depending on whether that person is a child (so keeping say education data), an adult, (keep employment data), retired or deceased.

Good database design is more about performance rather than saving space - space is cheap.

See this link right at the bottom

https://support.office.com/en-us/article/Set-the-field-size-ba65e5a7-2e6f-4737-8e72-36b93f966a33

Note: For data in a Text field, Access does not reserve space beyond what is necessary to hold actual values. The Field Size property is the maximum field value size.

BlueJacket · Feb 14, 2017

I see. That makes sense. I appreciate you clearing some things up for me. I've been doing a lot of reading and video watching and without someone to help clarify some things with, it's easy to misinterpret.

Another follow up for you then. My company deals with investments and these investments mature through three different stages. For simplicity, we'll call them Stage 1, Stage 2, and Stage 3. The stages have to happen in order, but an investment can stop at Stage 1 and not progress any further; the same with Stage 2.

From an organizational standpoint, should each stage have its own table, even though those tables would have a one-to-one relationship? Or should I keep them all in the same table because they would all have a one-to-one relationship, using my forms and queries to organize the data as I need it?

CJ_London · Feb 14, 2017

should I keep them all in the same table because they would all have a one-to-one relationship, using my forms and queries to organize the data as I need it?

the point here is you are looking at it from the wrong perspective. It is only a one to one relationship if a property can only have one stage - even if that stage can be 1, 2 or 3. You are saying there are three stages, but havent said what the difference is or what you need to store regarding each stage. So you can argue it is a one to many relationship - one property has many stages.

From an organizational standpoint, should each stage have its own table, even though those tables would have a one-to-one relationship?

It depends on what that means - if stage is just a flag then I would say no, just a field to indicate the current stage. If you need to record additional information such as a date so you can see how long ago the stage was reached, who's dealing with it etc then put the stage data in a separate table.

A sign of an unnormalised table is one where you typically see field names like stage1, date1, stage2, date2, or month1, month2, month3 - the sort of thing you see in excel which is typically as unnormalised as you can get.

BlueJacket · Feb 14, 2017

CJ_London said:
It depends on what that means - if stage is just a flag then I would say no, just a field to indicate the current stage. If you need to record additional information such as a date so you can see how long ago the stage was reached, who's dealing with it etc then put the stage data in a separate table.

Ok, good, so this is how I have it set up currently, as each stage has its own paperwork, dates, and other benchmarks it needs to hit before proceeding.

The_Doc_Man · Feb 14, 2017

CJ is giving you excellent advice. Sometimes hearing things from another viewpoint is important. Your worry is normalization and when to split a table. Let me throw in a second and probably different view so that you can perhaps get a better handle on things.

The question is, when do you split tables?

CJ answered that straight up. I'll tell you a mechanical way of deciding. Suppose that your three stages have dates as well as data specific to the stages even though each is related to the overall property. Ideally this should be split. Why?

Because the rule for normalizing a table is to keep things together that ONLY depend on the prime key and nothing else. When you add data to the main table for stage 1, stage 2, etc. then the added data depends on TWO things - the property ID and the date at which the stage began. And because you have data that no longer ONLY depends on the property ID, it can't stay in the main table any more.

Here is another technical view of how much space is taken up by empty or zero fields. A numeric field of size X always takes up size X, whether we are talking 1 bytes, 2 bytes, 4 bytes, 8 bytes (e.g. QUAD or DOUBLE numeric). A yes/no field is a typecast of a byte and is always the same size. A date is a typecast of a DOUBLE and is always the same size. Currency is a typecast of an OCTAWORD (though a "raw" OCTAWORD isn't used) and is fixed in size.

That leaves text fields. All text fields take up the same minimum amount of space, which isn't zero length. It is a LONG pointer to the actual string, and if the string is empty, the pointer is empty. If the string is not empty, the pointer is still there and the string is ALSO there. So in terms of database space, an empty string takes up a small amount as a pointer that may point to "nothing." If you want to get precise on the measurement, it IS possible to build a database, do a fussy size measurement, erase some strings, run a Compact & Repair, and do a second fussy measurement and see how big/small the database gets. I remember doing that once, but I also read an article somewhere about how strings worked. Damned if I can recall it now, but in essence it revealed that all strings have placeholders. The placeholder is not zero length, but if the string IS zero, all you have IS the placeholder.

BlueJacket · Feb 15, 2017

The_Doc_Man said:
Because the rule for normalizing a table is to keep things together that ONLY depend on the prime key and nothing else. When you add data to the main table for stage 1, stage 2, etc. then the added data depends on TWO things - the property ID and the date at which the stage began. And because you have data that no longer ONLY depends on the property ID, it can't stay in the main table any more.

Ok, so I think it's this concept that I'm having trouble wrapping my mind around.

I think I need to go into more detail (I was trying to keep things simple, but that seems to be a mistake) in order to give a better example of how my tables are set up, so that you may see my thought process...

We are a law firm and investment company with many clients. Our investments mature over the course of three stages. The first stage requires research and acquiring the piece of land, then waiting a long period of time as required by law. The second stage requires just a little legal work contacting all interested parties and then a little more waiting. The third stage requires lots of legal work and then selling the property.

Here's an example of how I have my tables split up.

tblPropertyDetails
PropertyID (PK)
other fields referenced in OP, just lots of info about the property

tblStage2
PropertyID (PK)
RipenDate This marks the start of Stage 2
PaperworkSentDate
ResponseDeadlineDate This date marks the end of Stage 2

tblRedeemed *
PropertyID (PK)
DateRedeemed
AmountReceived
RecordedDate

tblStage3
PropertyID (FK)
CaseID (PK)
Attorny
JudgeID
QTFileDate
PubMotionFileDate
PubOrderFileDate
MotionDefaultFileDate
FinalOrderFileDate
FinalOrderRecordedDate

tblSold Could argue this is a 4th stage
PropertyID (PK)
SalePrice
SoldDate

*A property can be "redeemed" or bought out anytime during stages one and two, so I thought it best to separate it out into its own table, though again, it's a one-one relationship with PropertyID. A property can only be redeemed once and a redemption can only encompass one property.

As well as other tables with judge info, county info, defendant info, etc. and a joining table here and there as well. Am I wrong in my thinking about how to split these up?

The_Doc_Man · Feb 15, 2017

You could keep ALL of the stage data in one table with the Stage ID and Property ID as a compound prime key. It is absolutely OK to have a foreign key (FK) (which for this case is the Property ID) participate in the compound PK of the stage table. Also, your idea for stage 3 having the property ID as an FK and the case number as the PK is perhaps a bit off. THIS is a case where you could argue that you needed a CourtCase table with a CaseID as a PK, making the data in the Stage table an FK. This might NEVER have more than one court case per stage 3 property, but that is OK. On the other hand, if you ever have multiple court cases for the same property for some reason, your structure couldn't store it.

Another point for stage 3 is that all those extra date fields could be moved to the CourtCase table. (NOTE that the word CASE is a reserved word as part of the SELECT CASE statement in VBA, so the name CASE by itself is not a good choice for a table or field name.)

If you have blanks in the Property table, they persist darned near forever. In the stage table, you still might have some extra fields that could be blank in some situations. But per your description, you SAVE that space for properties that never make it all the way through, because they won't HAVE all of the stage records! Records you never write take up the least space of all.

Now, you might wonder... how do you reconcile different names for fields in the different stages? You COULD just keep track of them somehow. Or.... Pick a bland name for the actual data fields. Perhaps BeginDate, XActDate, PaperDate, PublishDate, etc. Then write a query for each stage - and maybe even JOIN it with the property table. For one possible example, ...

Code:

Stage2Qry...

SELECT PropertyID, StageID, {various property fields go here}, BeginDate AS RipenDate, PaperDate AS PaperworkSentDate, PublishDate AS ResponseDeadLineDate
FROM PropertyTable INNER JOIN StageTable ON PropertyTable.PropertyID = StageTable.PropertyID
WHERE StageTable.StageID = 2 ... {other WHERE components, if needed, go here} ;

That way, if you wanted to see stage 2 items with the field names you like, the query can present them to you using alias names AND OMIT IRRELEVANT FIELDS! Oh, the empty fields will still be there, but queries will let you look at only the stuff you wanted to see in the way you wanted to see it. (Which is actually the primary purpose of a query anyway.) And a query can act as a form's recordsource just as well as a table could, at least for this kind of query.

Using the idea of FOUR tables to hold stage data for four stages would require you to have four one-to-one relationships on PropertyID - but that is actually wrong because you have four tables representing the same concept (stages) related to one table (properties) - and that is the purpose of having one-to-many relationships. Further, just because something is in a one-to-many relationship does not mean it HAS to have many child records. It can have just one.

Using the key-dependency concept I used earlier, this would be a violation of normalization because the contents of the four different tables depend on something other than their PK (as your proposed it). That is, the contents depend on the property ID and the name of the table in which they appear. But the actual information that led to table selection ISN'T EVEN A FIELD IN THE TABLE. And therein lies the violation. You have data in any one of the stage tables that does not depend only on a key field of that table. And that is why you MERGE those tables.

Hope I'm not confusing you too much, but you are asking for how to analyze this and I'm gonna do my best to answer if I can.

Steve R. · Feb 16, 2017

I see that The Doc Man beat beat to the punch. I agree with him that all the stage tables can be collapsed into one table.

My additional thought concerns the disposition tables, "redeemed" and "sold". These two tables can be combined into one table where you can use a logical field to identify whether the property was "redeemed" or "sold".

Based on my understanding of what you are attempting to do, there is transaction completion ambiguity. To clearly identify whether the property is "active" or "disposed" you may want to consider using the "PK" for a faux stage 4 entry as the completion code.

Be aware that others may not agree with that approach. You do have a "date sold" field. Some would advocate that if that date is null that is proof that the property is active. If that date is filled in, that the property has been disposed. Your choice.

PS: On reflection, the disposition tables are unnecessary. The fields can be incorporated into your main table.

BlueJacket · Feb 16, 2017

The_Doc_Man said:
Hope I'm not confusing you too much, but you are asking for how to analyze this and I'm gonna do my best to answer if I can.

It's only confusing because this is forcing me to think about data in a new way and I'm constantly reevaluating how I have my data organized. I don't consider this a bad thing. I appreciate your response.

The_Doc_Man said:
NOTE that the word CASE is a reserved word as part of the SELECT CASE statement in VBA, so the name CASE by itself is not a good choice for a table or field name.

This is good to know.

However, this is where I have to apologize because this one is on me as I apparently forgot to mention this in my previous post: Sometimes it works out so that we can combine many different properties under one case. Case numbers only get assigned at Stage 3. This creates a one to many relationship for Stage 3 and only Stage 3, so I don't think putting all the stages into one table would work.

I've been pondering how I want to handle this and I'm thinking now I have to create a new table joining tblPropertyDetails and tblStage3 with the following properties:

tblCases
PropertyID (PK)
CaseNumber (PK)
CaseID (FK) I feel this is necessary since it might be possible, however unlikely, that we could get assigned the same case number from two different counties.

I guess I could take out the PropertyID FK from tblStage3 all together then...

The_Doc_Man said:
Using the key-dependency concept I used earlier, this would be a violation of normalization because the contents of the four different tables depend on something other than their PK (as your proposed it). That is, the contents depend on the property ID and the name of the table in which they appear. But the actual information that led to table selection ISN'T EVEN A FIELD IN THE TABLE. And therein lies the violation.

And again, herein is where I have difficulties (I swear I'm not dense. With concepts I'm just more of a visual and hands on learner.). So, based off this and what CJ_London was saying previously... Does this not meant that I should just include tblStage2, tblRedeemed, and tblSold into the main tblPropertyDetails table? Like you said, they all depend on PropertyID. It violates normalization since the PK in those tables are actually just referring to the PK in the main Property Details table, which is where their dependency originates.

For some reason though, I still want to separate them out. I still want a set up that's at least something like...

tblPropertyDetails
PropertyID (PK)
other fields referenced in OP, just lots of info about the property

tblPreStage3
PropertyID (PK)
RipenDate
PaperworkSentDate
ResponseDeadlineDate
Sold (Y/N)
Redeemed (Y/N)
SoldRedeemedDate
AmountReceived
RecordedDate

tblCases
PropertyID (PK)
CaseNumber (PK)
CaseID (FK)

tblStage3
CaseID (PK)
Attorny
JudgeID
QTFileDate
PubMotionFileDate
PubOrderFileDate
MotionDefaultFileDate
FinalOrderFileDate
FinalOrderRecordedDate

BlueJacket · Feb 16, 2017

Also, I just want to thank you all again. Your help is immensely appreciated.

The_Doc_Man · Feb 16, 2017

Does this not meant that I should just include tblStage2, tblRedeemed, and tblSold into the main tblPropertyDetails table?

Now I'm confused. Whatever happened to stage 1?

However, to answer the question another way: No, it doesn't mean that IF you still use related stages 2, 3, 4, ad infinitum (and even if you DON'T have a stage #1). If you have multiple stages sometimes but not all of the time, you are asking for treating some stages differently AT THE CONCEPT LEVEL as compared to other stages. From your earlier description, that is wrong.

I want to go back to an incredibly basic principle here. Reality should drive the database, not the other way around. The tail must never wag the dog, if you understand the metaphor. This database should end up like a MODEL or SIMULATION of your business data flow at least in some aspects. Part of the problem may be that - as you say - you are very visual and don't see this yet. Your confusion and inability to "see" the structure tells me you have to go back to the drawing board and ... well... DRAW.

I'm going to pull out one of my Old Programmer's Rules here. When it comes to designing an Access application, ... Rule #1: If you can't do it on paper, you can't do it in Access.

Rule #1 means this: Access knows NOTHING about your business except what you tell it via tables, queries, forms, reports, macros, and module-based code. You have to tell it how to handle everything - which means YOU as the designer must KNOW everything you might need to tell it. So you must draw out your design, build pictures, draw flowcharts, whatever method works for you. Hint: IF you try this and still don't see the structure, that method isn't working for you. Don't feel bad, though. We all go through that stage where there is that one last little facet of design that doesn't quite fit into place. TRUST me, we've been there, done that, and dirtied the T-shirt.

In any case, get to a dry-erase board with some cleaner, some dry-erase markers of different colors, and maybe a couple of pads of sticky notes on which to POST any notes that come to your mind. Draw boxes representing the tables and stages. Draw lines between them to discover their relationships and interactions. Things in the stage table that don't vary according to the stage should be somewhere else, perhaps in the property table, perhaps in some lookup table. (Again, the rule is that to be in the table, a field must depend entirely and only on that table's keys. If things in the table don't vary from stage to stage, they don't depend on the stage portion of the key.)

While I'm at it, I'll add Old Programmer's Rule #2 - Access won't tell you anything YOU didn't tell it first (or at least tell it HOW to tell you). Which means that if you are trying to design an app, sometimes you have to decide what you want to see in order to know what data you have to gather to support that output. I.e. work backwards from output to data source sometimes. To borrow a product from the old Mad Magazine, if you are going to want to see left-handed veeblefetzers, you will need to find a source for those left-handed veeblefetzers.

Good luck and don't give up. Methodical approaches are required at the design phase and can lead to immense frustration because of how slow such an approach can be. But it is how you get from point A to point B.

BlueJacket · Feb 16, 2017

The_Doc_Man said:
Now I'm confused. Whatever happened to stage 1?

Stage 1 is pretty much the Property Details table and happens before the information is even entered into the database. Researching all that information and buying the specific properties is Stage 1.

I do research into ~500 properties on a slow month and ~1800 properties on a busy month. Only a small percentage of those do we invest in. Right now the DB will serve to help with the case management side of things, and then will ultimately be expanded into the financial side of the business and possibly more even later down the line.

BlueJacket · Feb 16, 2017

Steve R. said:
PS: On reflection, the disposition tables are unnecessary. The fields can be incorporated into your main table.

Would this not violate the second normal form? If I have:

tblRedeemed
PropertyID (PK)
RedeemedDate
AmountReceived
RecordedDate

But then put these in the main table to make it look like:

PropertyDetails
PropertyID (PK)
All the property info fields
Redeemed (Y/N)
RedeemedDate
AmountReceived
Recorded Date

Wouldn't all the info that comes after the Redeemed field be dependent on the PropertyID and the Redeemed field? Thus breaking the second normal form?

The_Doc_Man · Feb 17, 2017

Wouldn't all the info that comes after the Redeemed field be dependent on the PropertyID and the Redeemed field? Thus breaking the second normal form?

This is where my rule #1 kicks in AGAINST us as being best advisors. Only YOU know how your business works. We don't, so often we take shots in the dark and find that we missed the broad side of the barn.

The answer to your initial question and to the one I just quoted is becoming incredibly complex. If the presence of the basic property info is stage 1 and your extended court case information is stage 3 and the other stages are simple waypoints along the path, I think part of our problem is that the stages are not truly commensurate with each other. They seem to not "fit together nicely" in some ways. Therefore it is very hard to decide WHAT is the correct structure.

You also said this:

each stage has its own paperwork, dates, and other benchmarks it needs to hit before proceeding.

If that DateAcquired field and any other items are inherent to stage 1 and not actually part of the property description, then that date is misleading in its implications. I don't know enough about your business to know which other fields in your property table are encompassed by your quote about "...each stage has its own {stuff}..." but if any other fields in the property table actually go with the "stuff" then they belong in the stage table.

Your statement makes it seem that the separate phases make a lot of sense to you and need to be formalized. It is the degree of formality (and possibly the need for historical tracking) that will determine whether you truly need a stage table or whether this could/should have been left as a spreadsheet - from a business sense. And of course if you are trying this for your own edification or as an experiment, then the information you gain from formalizing a description of what really happens will benefit the business anyway.

Steve R. · Feb 17, 2017

BlueJacket said:
Would this not violate the second normal form? If I have:

tblRedeemed
PropertyID (PK)
RedeemedDate
AmountReceived
RecordedDate

But then put these in the main table to make it look like:

PropertyDetails
PropertyID (PK)
All the property info fields
Redeemed (Y/N)
RedeemedDate
AmountReceived
Recorded Date

Wouldn't all the info that comes after the Redeemed field be dependent on the PropertyID and the Redeemed field? Thus breaking the second normal form?

The property is an object, property details attached to that object include the date of acquisition, the date of disposal, and the money involved. These events (it would seem) would normally happen only once, therefore they can be included in the main table.

However, in the event that there can be multiple property acquisition/disposals/monetary events for that one property, then a separate table would be justified for tracking these events. If multiple acquisition/disposal dates and monetary events need to be tracked, then a separate table linked by the PropertyID (PK) would be necessary.

Table Design Help

BlueJacket

Registered User.

CJ_London

Super Moderator

BlueJacket

Registered User.

CJ_London

Super Moderator

BlueJacket

Registered User.

CJ_London

Super Moderator

BlueJacket

Registered User.

The_Doc_Man

Immoderate Moderator

BlueJacket

Registered User.

The_Doc_Man

Immoderate Moderator

Steve R.

Retired

BlueJacket

Registered User.

BlueJacket

Registered User.

The_Doc_Man

Immoderate Moderator

BlueJacket

Registered User.

BlueJacket

Registered User.

The_Doc_Man

Immoderate Moderator

Steve R.

Retired

Similar threads

Users who are viewing this thread