Nulls: Should it ever be meaningful?

Banana · Jul 10, 2007

The_Doc_Man said:
I guess I didn't look as hard at your "questions" as I should have, but this still bothers me, just no longer at a fully conscious level. (But then, at this time of night, "fully conscious" might be asking a bit much.) Back to the issue - the nagging feeling I have probably means there is an unresolved issue in my mind that could merely be a prejudice or it could be a bigger issue. I just don't know.

I don't blame you at all. This IS an unusual structure and in all databases I've seen, none has this kind of structure. In that sense, I'm tempted to say the structure may be wrong. But on the other hand, just because you haven't seen it before, doesn't make it wrong. Unusual situation requires unusual solutions. This is why I'm asking so many questions here; I have my doubts that the solution I proposed is the best one. And I fully intend to try this solution; burn it in furnace until it comes out golden or discard it if it comes out as a lump of ash.

Perhaps the problem is that every role potentially has different "questions" to be answered - even if they represent attributes of the role. This makes role unification nigh unto impossible when these "questions" are part of the role tables. I am no expert on the higher levels of normalization, but I believe that by folding the attributes into the role tables and allowing the roles to be so different as to be incompatible, you have exposed a denormalization of a very subtle type.

Hmm. Interesting., especially the "denormalization of a very subtle type." The way I look at this, it was most logic to split out incompatible attributes into 1-1 tables, to avoid creating repeating columns had I created freestanding tables for each roles (e.g. names, personal data and other attributes that are true of all employees, whatever their roles may be.)

A tangent: Based on what I've seen, some time people treat two different kind of people differently. For example, an template for student registration stored entries about students' personal data but teachers were in their own table and no personal data was no asked about the teacher. That's perfectly fine, if you decide that it's HR's job to keep track of teachers' personal data, and the database exists to keep records about what classes students may register for. Possibly because of the false dichotomy between students' data and teachers' data, no one thought about what if we needed personal data that are common to both students and teachers but also needed incompatible attributes about students and teachers.

Now, one thing I should make clear; if we were talking about one or two extra columns for two incompatible roles, then big whoops. Just leave it in the one table; I've wasted a few bytes in exchange of much more simpler table maintenance. Things starts to get hairy ONLY if we're talking about more than two roles, adding column that's more than just Yes/No (1 byte) *AND* would be empty (e.g. not applicable) to large percentage of the employee. With two roles (let's assume that all employees must have a role, and roles are mutually exclusive), you have 50% chance that a column will be empty. Not too bad. With 3 roles, it's now 33%. 4, 25%, and I'm not even factoring the numbers of column associated with each roles. Here, it becomes a performance and possibly data integrity issues to maintain a one big table with lot of empty fields.

With 1-1 tables, I now can assure that all relevant tables are relatively "full", the child tables just has the records where those extra bytes will be put to good use.

It is trivial to unify roles when they are part of a list associated with a role. You just about can't do any unification if not. And I'm in favor of coalescing those three parallel tables to one table (of unified roles) and one child table (of role attributes that you call "questions.") Am I making sense that way?

Last night, I thought very carefully about the possibility of creating a list of questions about roles' attributes and storing them as record rather than fields with the junction tables you proposed earlier. I *think* it could be doable; I'll need to run a small prototype database to test the idea. But it hit me- there is a data integrity problem. Suppose I wanted the questions to be required, something I can do with fields. The advantage of having 1-1 tables is that I could make all fields in each tables required, but do not have to fill in the fields in child record until I associate an employee record to one of child record. Thus I can maintain data integrity; how would I do this if the fields were now a set of record in a "Question table" with answers in a junction table? I'd imagine I'd have to take extra steps to look up whether a employee record is missing answers to certain roles and force creation of record. However, that feels wrong to me because as I said earlier, nobody should be making records "just in case" or deleting records just to get everything else working correctly.

Or maybe you know something that I don't, and can explain how we can maintain data integrity between a pseudo-field record and a answer record?

You were forced to split your employees table into parts because you could not unify the roles. But the question I have is whether increasing the virtual depth of the tables (to have lists of attributes underneath) would allow you to narrow the three-table layer to a single table, thus unifying everything and yet retaining generality.

This, I have to agree; the three-way split can be quite a challenge to maintain, and perhaps even unnecessary. I say so because I know of no other solution. However, my ignorance doesn't mean that no other better solution exist for this problem.

We are talking past each other. You make NO RECORD AT ALL if there is no attendance. Then you don't JOIN anything. You just count records matching the person and class from the attendance table. This is a JUNCTION table, perhaps - but you don't have to exercise a JOIN to get there. Just a DCount. If the DCount of the person/class records is zero, you know what your attendance was like. End of question. Not a null - because DCount is one of the rare Domain Aggregates that does empty sets correctly.

Okay, so we're agreed on that point then. Therefore, having nulls for where there were no attendances is not poor design in itself. It is more about how you join tables (not necessary for this scenario). However, you say that DCount is an exception as it handle empty sets correctly; if we had another scenario using say, DSum or DAverage, does the possibility of having nulls (because the intersection wasn't needed, just like if teacher took roll call and didn't enter a record for an absent) in the junction table put those out of whack?

The_Doc_Man · Jul 10, 2007

Question regarding DSum and DAverage: Not sure, but Access Help for these two functions should tell you how NULL is treated.

Regarding comments on 1-1 tables. Yes, you can assure that the second table is full, but 1-1 tables have a very specific usage and I question whether this is that case. Bear with me because this will be a tough "slog through the mush."

I know of two cases where one/one relationships make sense.

Suppose you have table A but it gets sooo friggin' big that it blows out the buffer size limit in Access. This could happen if you had a lot of long text fields that were not MEMO fields. So you have to split the data apart, right? BUT you still can't use the split tables in a JOIN query because THAT record would blow out the buffer limit, too.

So making two tables Abase and A1 won't help. But if there is a sensible way to split the table into THREE parts and only work with two at a time, you can make that work. Abase JOIN A1 would contain some data, Abase JOIN A2 would contain some different data from the other JOIN. If there is a logical and reasonable way to do this, then you might make it work as TWO one/one relationships with a common base table. With me so far? This case exists in response to a SPLIT TABLE due to sizing.

There is ONE and ONLY ONE other exception I have ever seen for this rule - when there is a public part and a private part to the table, a two-table split with one/one between them is allowed when you allow the public to see the left-hand table but only managers or administrators can see both sides. (And you are enforcing the security with different permissions on the two sides of the relationship.)

In other words, there is a non-data-field attribute difference between the two sides, in this case being a data accessibility difference. I don't see your case as fitting this exception either.

But considering one/one where neither of the above is an issue, let's look at what it really means... Let's agree on a temporary convention. Normally, relationships have SIDES. Left side and right side. So call the tables LEFT and RIGHT. (We talk about LEFT JOIN and RIGHT JOIN, right?)

First factor: The one/one relationship means that BOTH SIDES need a key with no duplicates, i.e. unique. (The ONE side of a relationship ALWAYS needs a unique key. Here, you have two ONE sides.)

Second factor: Even if the relationship doesn't use the declared primary key of the LEFT table, the key it DOES use must have been a possible candidate key for being prime in LEFT.

Now here is where things get crazy... Since the primary key uniquely selects the record in the LEFT table, and since we are assuming the tables are otherwise fully normalized, that means that you have a redundant key somewhere.

Either A - the linkage between the two ONE/ONE tables is based on the LEFT table's PK, or B - the index that is actually used is not otherwise needed and you can drop it. Then use the real PK from the LEFT table as the linking key in the RIGHT table.

In case B, the redundant key is the field in the LEFT table that isn't needed for the linkage to work. Just make RIGHT link through LEFT's PK and you are done.

Hold onto your hat, here.... the redundant key for case A is the PK of the RIGHT table. You see, if both tables use the same key, they reference attributes of each other and that means you have split the table inappropriately. All data in the RIGHT, because it depends only and completely on the PK of that table, WHICH IS THE SAME KEY AS THE LEFT TABLE, must also depend on the PK of the LEFT, in which case ALL of the data in the RIGHT table belongs in the LEFT table and there should be NO separate RIGHT table.

If the relationship was really one/many, the above analysis is not true. If the relationship was really one/{sometimes one, sometimes NONE}, this should be written as a one/many and treated as such.

Remember, if you DECLARE a one/one relationship and one of the members is sparse, you lose any record that has no partner unless you declare an OUTER JOIN, and that isn't the default case for Access. Therefore, to some authorities, one/one relationships should only exist for the highly limited reasons I listed earlier.

That leaves me to consider that something else must be wrong with a structure that requires you to create three sparse tables that, when combined, have the effect of building a net one/one relationship. Stated another way, the UNION of the three tables would be one-one with your referencing records even though the individual tables would not.

To me, the structure that makes sense here is

tblPerson
PersonID - PK
Info about person

tblRole
RoleID - PK
descriptive information about role

tblPersonRole
PersonID - FK to person
RoleID - FK to role held by that person
(This might be all you need here unless you are tracking when this person was given this role, or when this person was relieved of this role.)

tblProperties
PropertyID - PK
descriptive information about property - contains the "question" that you need to answer.

tblRoleProperties - lists attributes/properties/"questions" applicable to this role
RoleID - FK to role
PropertyID - FK to property
This lists the applicable questions to each role. THIS defines the properties of the role. The tblProperties table only lists properties used for at least one role, but no guarantee about how often it is actually used. (But then there is always DCount...)

tblPersonRoleAnswers
PersonID - FK to person
RoleID - FK to role
PropertyID - FK to specific property
Answer - specific to the question for this person as applicable to this role.

The question might come about: What if I have a question that applies to every person? Answer: Create the GENERIC role to which everyone belongs. Assign common questions to the generic role.

What if I have a question that applies only to the role when it depends on a specific employer and not when other employers are involved. Answer: You have two roles - one for for employer X and one for everyone else, with extra questions for people employed by X but no difference in the descriptive section. I.e. Interrogative Logic would have grounds to consider the roles as different because the properties are different even though their descriptions are the same.

In this structure, there is NO NULL EVER. If you ask every question applicable to a given role, you have answers to everything. And list enumeration is easy enough, even if it takes a somewhat tricky query or some VBA code to manage the process.

In this structure, you make no assumptions about nulls. If you never answered the question, the list just doesn't contain the reference. This also lets you discover (by DCount to see if you have an answer) what you DIDN'T ask yet so that you can remedy your oversight.

The thing about this is that it is not bad to build a report with the properties as the lowest-level detail segment, and with role headers (and footers). Then, encapsulating THAT, you can have person headers and footers listing the roles held by each person as an "inner" list. A three-tiered report that tells you the properties of each person in each role.

Banana · Jul 10, 2007

The_Doc_Man said:
There is ONE and ONLY ONE other exception I have ever seen for this rule - when there is a public part and a private part to the table, a two-table split with one/one between them is allowed when you allow the public to see the left-hand table but only managers or administrators can see both sides. (And you are enforcing the security with different permissions on the two sides of the relationship.)

In other words, there is a non-data-field attribute difference between the two sides, in this case being a data accessibility difference. I don't see your case as fitting this exception either.

As a matter of fact, that's exactly where I got the idea of three way split from; I reasoned back then that if they could split table that otherwise should be single table for security problem and not suffer from normalization problem, then surely I could do the same with roles and its incompatible attributes.

Hold onto your hat, here.... the redundant key for case A is the PK of the RIGHT table. You see, if both tables use the same key, they reference attributes of each other and that means you have split the table inappropriately. All data in the RIGHT, because it depends only and completely on the PK of that table, WHICH IS THE SAME KEY AS THE LEFT TABLE, must also depend on the PK of the LEFT, in which case ALL of the data in the RIGHT table belongs in the LEFT table and there should be NO separate RIGHT table.

Let's see if I'm understanding this: the table that was split for data accessibility reasons technically violates normalization because all of attributes actually belongs in one table. Therefore, the three way split would likewise violate normalization, but for different reason- the attributes does not belong in that table, even if they may be asked once per role, correct?

Remember, if you DECLARE a one/one relationship and one of the members is sparse, you lose any record that has no partner unless you declare an OUTER JOIN, and that isn't the default case for Access. Therefore, to some authorities, one/one relationships should only exist for the highly limited reasons I listed earlier.

Going off from your earlier post, and trying to see if I am following you so far- I would want to avoid using outer join as a fix-it for nulls I will get in when pulling together records from tables when I could just structure it differently and get all records using inner joins... correct?

That leaves me to consider that something else must be wrong with a structure that requires you to create three sparse tables that, when combined, have the effect of building a net one/one relationship. Stated another way, the UNION of the three tables would be one-one with your referencing records even though the individual tables would not.

I'm sorry, I'm lost on this one. If I did a union query, of course I'd get all records from each tables with redundant primary keys...

To me, the structure that makes sense here is

...snip....

In this structure, there is NO NULL EVER. If you ask every question applicable to a given role, you have answers to everything. And list enumeration is easy enough, even if it takes a somewhat tricky query or some VBA code to manage the process.

I'll certainly give this a try. However, I should remind you what I posted a bit earlier:

Banana said:
But it hit me- there is a data integrity problem. Suppose I wanted the questions to be required, something I can do with fields. The advantage of having 1-1 tables is that I could make all fields in each tables required, but do not have to fill in the fields in child record until I associate an employee record to one of child record. Thus I can maintain data integrity; how would I do this if the fields were now a set of record in a "Question table" with answers in a junction table? I'd imagine I'd have to take extra steps to look up whether a employee record is missing answers to certain roles and force creation of record. However, that feels wrong to me because as I said earlier, nobody should be making records "just in case" or deleting records just to get everything else working correctly.

One of my lessons I learned a way back was: If the database engine can do it, LET it do it. Don't re-invent the wheel by using VBA or whatever to protect data integrity when the database engine is perfectly capable of doing so. Since then, I've come to thought of VBA as strictly for enforcing business rules. Making "questions" a set of records rather than set of fields would seem to me going backward on that lesson... Maybe you can elaborate a bit more?

And, as always, that was a first-rate post! My hat off to you.

Banana · Jul 11, 2007

A bit more consideration:

If we converts "questions" from a field to a record, there's another problem about not using database engine I've alluded in my recent post.

If they are now a record, this means I no longer can bind controls on form to those "questions." I would then have to use unbound controls, and this means VBA. While not bad in itself, this, to me at least, seems to be an invitation to overcomplicating things and introducing more opportunities for things to go wrong when the three way split tables is something that database engine can parse and handle.

There's another thing; suppose the list of questions change? While a table of questions would be practically trivial to update, there's still a lot of work entailed in un-binding the related controls on forms and updating everything THEN debugging everything for something that could have been done by the database engine.

Why re-invent wheels?

That mentioned, would it be an alternative to use a one-many relationship for the child tables instead of one-one relationship? After all, just because it's a one-many relationship, it doesn't mean it *has* to have more than one relationship for a given record, right?

The_Doc_Man · Jul 11, 2007

I would then have to use unbound controls

Or maybe a sub-form?

There's another thing; suppose the list of questions change?

What's easier to change - a table where the fields are hard-coded to represent the answers to questions that just changed, or a separate table that lists your questions and answers - and for which relational integrity can be used if/when you have cleanup issues?

That mentioned, would it be an alternative to use a one-many relationship for the child tables instead of one-one relationship? After all, just because it's a one-many relationship, it doesn't mean it *has* to have more than one relationship for a given record, right?

If I didn't state it clearly enough before, then let me try again. Starting from the assumption of a one/one table, if there is ANY CHANCE IN HELL that more - or less - than one record on one side would correspond to a single record on the other side, you ALWAYS make it one/many.

I reasoned back then that if they could split table that otherwise should be single table for security problem and not suffer from normalization problem, then surely I could do the same with roles and its incompatible attributes.

The problem I have with this statement is that it is possible to normalize this layout using parent/child techniques to define your attributes and rejoin the table. Now, if there IS a security reason for keeping the data separate, then you are perhaps right to keep them split and I SURELY will not argue against that point. It is the correct way to go. But... if you were merely looking at this by analogy, I still think this is the wrong conclusion.

A data difference vs. a meta-data difference is all the difference in the world. But what you are describing sounds merely like a data difference that for some reason you are reluctant to treat as such.

I'll give you a little hint. I'm getting this design from the ORACLE Role tables which have child tables called Role Permission tables and Role Privilege tables. (Not using the exact name 'cause my books not where I am at the moment.) Roles allow access rights to various data structures but also include ORACLE privileges - like CONNECT and ADMINISTER and several other things. When you hold the role, you actually hold a hierarchy of elements that form a "tree" of attributes underneath the single ORACLE role.

The thing is, ORACLE doesn't split roles according to how powerful they are. ALL ROLES ARE STRUCTURALLY THE SAME in ORACLE - whether you have read-only access to one table only, or you are the DBA. It is what is beneath the top-level role record - the permissions tables and privilege tables - that make the difference. It doesn't matter whether your access to ORACLE is through SQLnet using a login name IDENTIFIED BY a password or whether you log in to the operating system and are IDENTIFIED EXTERNALLY. That is conceptually the same as working for one company or another. And it works very well in an extremely popular and successful product. (OK, so Larry Ellison's ORACLEproduct is almost as big as his ego, but hey, it works. Maybe he has grounds to gloat just a little... even if he IS a Bill Gates wannabe.)

Therefore, I know that what I am describing is a possible structure based on what I see you trying to do. I guess I am wondering if it is my explanation that is inadequate or whether I lost you in the weeds somewhere. (If I have, send up a flare and we'll BOTH look for a way out.)

Tell you want, draw out this structure using pad and paper and just LOOK at it. See if it makes sense to you. Sometimes seeing it graphically works better than just seeing it in print.

Banana · Jul 11, 2007

Now when you mention Oracle's structure, that is a big reassurance to me.

Believe me, I have had bad feelings about the three way split I proposed earlier, because of two reasons; nobody else was doing it (well, just because nobody does it doesn't make it wrong but at least one should be very suspicious) and difficulties in making queries work with such structure.

However, I'm still confused about one last point: When we make the roles' properties a record, this is no longer exposed to the database engine and therefore I cannot check for required property, enforce relationship integrity or specify what data type is required for a given property. Sure, those can be done via VBA or something like that but as I said earlier, why re-invent the wheel?

My point about when the list of questions about roles' property changes was that (at least from what I'm seeing), I'm trading one devil for another; with a table of questions, adding new records or updating is totally trivial, but I've got to update the form and re-code everything in background to get Access to treat the new question-record as virtual field. OTOH, the three way split tables means I can simply bind controls to whatever fields I need but if it changes, I've got to add fields (and we all know that there's got something seriously wrong if you're adding fields, not records)

Unless, I'm totally ignorant about how I can get Access to treat certain records as a field or something similar?

The_Doc_Man · Jul 11, 2007

If I did a union query, of course I'd get all records from each tables with redundant primary keys

And the point I've been trying to make is that you shouldn't have to go through the headache of a UNION query in the first place because a role is a role is a role. But if you cannot put them all in the same table, then a role is NOT a role is NOT a role? It speaks to the question of whether the differences between the three tables in your split design REALLY describe different things conceptually.

My contention is that they do not. The difference is artificial because of folding different attributes into the three split table parts. Whereas if the differences were all enumerated in a separate (fairly narrow) list of attributes, you can have as many as you want/need/can stomach. What do you do in your design when a FOURTH contractor comes along with different rules? Then, you get like our site with prime contractors and an astonishing 26 different sub-contractors working not less than 20 major and about 50 minor projects with both development and maintenance funding, plus the site's overhead projects such as Building Maintenance, Telephony, Site Security, Purchasing, Shipping, Receiving, etc... and a myriad of roles with departments having such wildly different functions that it would be a small miracle if they all even SPELLED "role" alike - much less had similar attributes.

This is the thought that percolated in my head earlier. Your problem is hiding behind a three-legged table. What do you do when more legs have to be added? What do you do when a leg has to be removed? What do you do when one of the legs must change rules because of a corporate buyout that mucked up the works? If everything that makes a role UNIQUE in your three-legged structure gets dropped into child tables that completely describe everything, you have ONE role table and can add or delete roles - and their descriptions - fluidly at will. I was right all along. This IS a denormalization issue. Damned if I know which normal form it violates. Any takers among the onlookers? Banana and I are having a really good discussion here, but we don't want anyone to avoid us if you think you have something to contribute.

Banana · Jul 11, 2007

Banana and I are having a really good discussion here, but we don't want anyone to avoid us if you think you have something to contribute.

Yeah, come on out wherever you are; I promise we won't bite.

Well, The_Doc_Man may look harsh, but trust me; his bark is worse than his bite. So come on aboard. I'd really love to hear more from others.

The_Doc_Man said:
This is the thought that percolated in my head earlier. Your problem is hiding behind a three-legged table. What do you do when more legs have to be added? What do you do when a leg has to be removed? What do you do when one of the legs must change rules because of a corporate buyout that mucked up the works? If everything that makes a role UNIQUE in your three-legged structure gets dropped into child tables that completely describe everything, you have ONE role table and can add or delete roles - and their descriptions - fluidly at will. I was right all along. This IS a denormalization issue.

Touché!

Now that makes perfect sense to me.

The only thing left to do is to figure how to get those "questions" record to behave like fields. Did a quick google search, but apparently nobody has done this before (or maybe I'm just not putting in right keywords). One possible idea I had was to create a temptable that represents a denormalized table with the questions-record from the question tables with answers as the recordsource. This way, I can get to expose the questions to database engine for data integrity, validation, etc. etc. and when the user is done with describing an employee, the temptable gets discarded and everything is stored in its normalized form. If I can abstract the temptable sufficiently, then I'd never need to write VBA to do all works what database engine should be doing.

Hmmm hmmmm....

Maybe anybody else has a better idea here?

The_Doc_Man · Jul 11, 2007

Went to Wikipedia for some reading. I think the three-table split violates 4th normal form. Hey, it's an abstracted normalization, I could be wrong. Go to the Wikipedia article on "Database Normalization" and take a really hard look at the 4NF sub-topic. Then you tell me whether you see it. And if you don't, I'm not going to be upset. When you get past 3NF, sometimes things get REALLY obscure on a good day and we won't talk about bad days.

The_Doc_Man · Jul 11, 2007

this is no longer exposed to the database engine and therefore I cannot check for required property, enforce relationship integrity or specify what data type is required for a given property

I have NO clue as to why you think this. ALL tables related to other tables via parent-child techniques are ALWAYS exposed to the DB engine. If the properties are listed in a child tables, OF COURSE you can enforce relational integrity. The only wild-card in the bunch is whether you have different data types for different "questions" - and even there, you can use various codes to define what your text box expects to see.

A LOT of this can be query/form stuff, even though the joins look a little bit intimidating. No, I'll correct that. They look uglier than Cinderella's Ugly Stepmother BEFORE she takes the curlers out of her snakes.... er, hair.... in the morning. On the other hand, if you are worried about getting all of the answers, you will probably need at least a little bit of VBA to assure that all of the questions are answered before you allow the screen to be closed. But this is just crying for a datasheet-view child table as a subform.

Banana · Jul 11, 2007

Intermission: Because this is now a new page, I'll repeat The_Doc_Man's request: If anybody is watching this discussion unfolding, please feel free and join in. Two great minds are great, but three are better, and four is even better! So come down and speak your piece of mind here!

The_Doc_Man said:
Went to Wikipedia for some reading. I think the three-table split violates 4th normal form. Hey, it's an abstracted normalization, I could be wrong. Go to the Wikipedia article on "Database Normalization" and take a really hard look at the 4NF sub-topic. Then you tell me whether you see it. And if you don't, I'm not going to be upset. When you get past 3NF, sometimes things get REALLY obscure on a good day and we won't talk about bad days.

To be frank, I'm still astonished that I totally missed the denormalization of three legged tables, especially when I thought I had normalization all wrapped and neatly packaged! Heck, I've visited the Wiki about normalization (and other database concepts in months past), and the article about Null was what prompted me to start this very thread!

But if I recall you saying before, experience is recognizing the mistakes next time you make it.

Anyway, I feel that functional dependency was the problem child here. The 4NF article doesn't seems to apply to the three legged table at the first glance, because it seems to be talking about repeating rows, and the three legged tables obfuscate the fact that if I were to re-join it into one great table, I'd probably see just one row of each employee with several empty fields. However the functional dependency article makes it clearer that the attributes are actually dependent on roles and therefore do not belong in the employee table (whether split or not), and is better represented in the structure you proposed earlier.

I have NO clue as to why you think this. ALL tables related to other tables via parent-child techniques are ALWAYS exposed to the DB engine. If the properties are listed in a child tables, OF COURSE you can enforce relational integrity. The only wild-card in the bunch is whether you have different data types for different "questions" - and even there, you can use various codes to define what your text box expects to see.

Really? Are you saying that records can have validation rules, required, indexing, and other properties that fields have? Furthermore, I thought that when you have a junction table, you can't simply expect Access (or any other database engine) to scoot over and look for where there are no answers given to questions?

Banana · Jul 11, 2007

There's a loose end that needs to be wrapped, and this is strictly theoretical here:

Banana said:
Maybe this is because I'm not convinced that ternary logic exists on its own. If you consider this truth table:

(Credits to Wiki article on Ternary Logic)

Now, we have "TRUE OR NULL = TRUE", which the article reasons that even if we don't know what NULL represents, we can see that even if it was later known to be false, "TRUE OR FALSE = TRUE", and still returns the same answer if the NULL was later shown to be true.

Now, something is wrong here. We've been using binary logic all along! Ternary Logic is nothing more but two-level binary level compressed in one step. Here's a tree to exemplifies how binary logic can still handle Nulls:

Code:

Do we have knowledge? / \ True False / \ Is it true? NULL / \ True False

A tangent: This is a reason why we're still using binary as machine code- manufacturers, at least for time being has decided it was cost-effective to double the computing power and amount of data storage every 18 months than to add more states for same amount of computing power or data storage. Everything that can be done in multi-state can actually be simplified down to binary calculation; it may take a bit more time or processing but it's doable. Furthermore, it's actually simpler to do calculations in binary than in ternary, and let's not think about doing calculations with 11 states!

So, there's a chance that people who are calling for no NULLS in SQL are right; and (I hope I'm not putting word in your mouth) like you said, if we wanted null to mean something, it is ultimately our job to give it a meaning by creating the appropriate tree... Correct?

Mind you, my last sentence was awkwardly worded and what I was talking was if we wanted other states such as "Missing but Inapplicable", "Not reported", whatever identified, it's our job to create such states and leave Null well enough.

Now, if we can resolve ternary logic back to binary logic, and considering that a database is supposed to use closed world assumption (e.g. if we don't know something, it's false by default), does that it means we would be better off if SQL language didn't use Nulls and defaults to false whenever no data is returned, and leave the job of identifying the cause of missing data up to developer to enumerate?

The_Doc_Man · Jul 11, 2007

does that it means we would be better off if SQL language didn't use Nulls and defaults to false whenever no data is returned, and leave the job of identifying the cause of missing data up to developer to enumerate?

I hope I didn't give that impression. NULL has a place in SQL. I don't think we should default everything to FALSE or 0 or empty. NONE of those accurately represent the meaning of NULL.

I'm going to take a HUGE leap here. The following rhetorical question and answer represent an extreme example of what we SHOULD consider when facing the questions about NULL; however, the scope of this question is decidedly different. I'm taking it elsewhere to break out of a confined thought pattern. (Happens, you know ... keep focusing on the trees, forgetting there is a forest around us.)

Question: What preceded the creation of the universe?
Answer: NULL

When you cannot retrieve information about your topic, it is not false. It is not true. It is just NOT. We have trouble with NULL (as do many mathematicians, logicians, and religious philosophers) because we cannot clearly imagine NONEXISTENCE. We incorrectly cast properties upon it. Like saying it is FALSE or 0 or empty. We are dreamers who imagine things that we believe COULD or SHOULD exist. When we come across something that by definition cannot exist, it trips us up.

The truth should ALWAYS be that when we have no data, we cannot take another step forward. We MUST take a step sideways. Or backwards. The idea of leaving it to the DB developer to evaluate and provide for a way to know WHY something is NULL, however, is dead on the mark. Nobody else CAN do that for you. If that information is of importance, then you MUST plan for it and leave a way to record it.

The concept of TERNARY logic isn't inherently at fault. It is all us. We humans, being an opportunistic bunch at heart, immediately see something extra and want to USE it for something. I could probably develop a good argument that this is an example of human greed - taking with you anything that wasn't nailed down - and bring along a pry-bar just in case any of the nails were loose.

When that "something extra" is NULL, those who are very careless will instantly try to apply a meaning to NOTHINGNESS, which stated that way SHOULD be prima facie self-contradictory. Remember the discussion about a language being either CORRECT or COMPLETE but never both? "Self-contradictory" is, according to Goedel, sufficient to prove non-correctness in a language. And therein lies part of the problem. Making a statement involving NULL is VERY dangerous because of the greater probability of making a self-contradictory or otherwise nonsensical statement about nothingness.

The_Doc_Man · Jul 11, 2007

Really? Are you saying that records can have validation rules, required, indexing, and other properties that fields have?

If you enter records through forms and never directly, you can impose all the rules you want. I gather from your discussion that you want to avoid VBA, but I respectfully suggest that you overrate its difficulty. If you want to impose rules, it is fairly easy to put something in the form's Before_Update event to determine whether you have answers to all questions/properties. A Dcount of possible questions and a Dcount of actual answers should match or you simply Cancel the update event and disallow record navigation, exit, and other issues on the form. It isn't that hard.

As to indexing... if you put something in a table, you have ALREADY made it possible to enforce relational integrity to assure that proper parent records exist. So if you added a property, Access could certainly verify that the role you wanted to add it to existed BEFORE you completed the process. If you wanted to fill in various properties via a list and you were concerned about having no answers in some cases, there are such things as having "sanity-check" queries. By the way, this answer ALSO applies to your attendance question that flitted about earlier in the thread.

If you have a list of X and want to know if there is a list of Y that SHOULD have junction table entries to match X but some do not, you can write a query that enumerates all Y for which the DCount of the appropriate junction table between X and Y is zero for a given X,Y combination.

In your attendance question, you are asking too much perhaps. I'll explain. If you want to know who didn't attend on a given day, you don't mark them as absent. If they attended, you mark them as present. But here, you can tell if someone skipped a day because, for each event X-prime (consisting of event-days X), you can count the attendence of each Y to X or X-prime and see in a table all Y members whose attendance count doesn't match up. Why? Because adding an ABSENT marker actually OVERDETERMINES the table.

You know the range of possible X (sessions?). You record the actual Y (attendance). You can make a list of attendees. But because X is limited and Y is limited, it is possible to form the logical COMPLEMENT of the set, the set of all X for which a particular Y is missing. You don't make a record of absence, only of attendance.

Now if you're being attentive, it would SOUND like I just cast a meaning on NULL - the absence of an attendance record.

Who knows? Maybe I did. BUT this case is special in a way. The SET of POSSIBLE ATTENDANCE is a binary set - attended/did not attend. Therefore, forming the complement of the set is possible. (For multi-valued sets, this ISN'T possible or at least isn't easy.) So the logical complement of the Attendance set is the Absence set. And that is what you wanted.

If the set X was unlimited or if Y was unlimited, you couldn't do this. But neither is unlimited. Therefore, formation of the complement set is possible AND MEANINGFUL. But it is not a property of NULL that has meaning. It is the property of the overarching SET that has the ability to be complemented.

It's a fine point and not everyone will see it. But where this is going is that earlier you talked about looking for either ABSENCE records or NULLS where you thought you should see ATTENDANCE records. My thought, which I did not express clearly, was that since the set is binary, having both an absence record and a NULL search was overkill. Make NO absence records. Just look for the NULL return of an expected session attendance record since the absence of attendance (records) is ... (wait for it) ... absence.

Banana · Jul 11, 2007

The_Doc_Man said:
I gather from your discussion that you want to avoid VBA, but I respectfully suggest that you overrate its difficulty.

You know, last year when I designed the database, I had *lot* of VBA here and there, to loop through recordsets, do data integrity checks, etc. etc. Quite a lot of work (but learning experience!)

Fast forward to a month ago, I'm called to attention to the fact that action queries not only can do what VBA/DAO can do and even faster. Furthermore, no coding (short of occasional SQL tweaking) is required to execute an action query. Now knowing this, I foolishly realized that all of my work in writing VBA could have been easily replaced with action queries, saving me more work *and* headache in checking for errors. Therefore, the lesson I learned from this was: "Don't re-invent the wheel!"

So, that's why I was apprehensive when it was suggested that what I've been treating as attributes be treated as records rather than fields because on the first glance, it feel like I'm sacrificing the ability to treat it as fields and get engine to handle it accordingly for some more manual coding.

Don't think that I'm still holding to that three legged table idea- I've now clearly seen the flaws in the design and fully intend to modify it so it now has a one-many and junctions to represents all possible (and required) answers for any given role/person junction.

The SET of POSSIBLE ATTENDANCE is a binary set - attended/did not attend. Therefore, forming the complement of the set is possible. (For multi-valued sets, this ISN'T possible or at least isn't easy.) So the logical complement of the Attendance set is the Absence set. And that is what you wanted.

...snip...

Just look for the NULL return of an expected session attendance record since the absence of attendance (records) is ... (wait for it) ... absence.

Ahhh, that explains a lot. Even though I wasn't actually creating a record of absences, it was easy to do so, as you said it's binary, and when I considered that set (does complementing set mean a set that is opposite of the initial set?), I saw NULLs and thought to myself, "Oh, no. Am I abusing NULLs here?" But now I see that just because there's a chance that there may be absence of records for all possible intersection between attendees and sessions, doesn't mean NULLs has been assigned a meaning, since I could correctly determine attendances from records with a checkbox.

Banana · Jul 11, 2007

The_Doc_Man said:
I hope I didn't give that impression. NULL has a place in SQL. I don't think we should default everything to FALSE or 0 or empty. NONE of those accurately represent the meaning of NULL.

No, null definitely cannot be represented as False, empty, 0 or whatever.

When you cannot retrieve information about your topic, it is not false. It is not true. It is just NOT.

But what about the closed world assumption? The real question is not whether null can be true or false, but rather whether we should be treating null as false by default (even if it could actually be wrong).

The truth should ALWAYS be that when we have no data, we cannot take another step forward. We MUST take a step sideways. Or backwards.

Agreed. The idea proposed is that instead of letting SQL return null, let the developer worry about how to handle variable states, including Null.

The idea of leaving it to the DB developer to evaluate and provide for a way to know WHY something is NULL, however, is dead on the mark. Nobody else CAN do that for you. If that information is of importance, then you MUST plan for it and leave a way to record it.

This is kind of where I'm scratching my head; you're telling me that there's a place for null in SQL, then telling me that it's our job to identify the state if it's possible that null may be assigned a meaning that it shouldn't have had.

The idea I'm toying with is something like this: Modify SQL so that it no longer natively supports null. Whenever there is no data, it should return false (e.g. negation as failure). Then modify the SQL so it's extensible in the sense that we can evaluate the presence of data, and decide how to assign meaning to it by creating our own tree. After all, not all databases will have same "states" of absence or may not need them all. So why not leave it up to the developer to decide how they want to handle lack of data?

When that "something extra" is NULL, those who are very careless will instantly try to apply a meaning to NOTHINGNESS, which stated that way SHOULD be prima facie self-contradictory. Remember the discussion about a language being either CORRECT or COMPLETE but never both? "Self-contradictory" is, according to Goedel, sufficient to prove non-correctness in a language. And therein lies part of the problem. Making a statement involving NULL is VERY dangerous because of the greater probability of making a self-contradictory or otherwise nonsensical statement about nothingness.

Right; and if SQL is complete, then are we making it more troublesome than it should be? Our languages are complete, because as you said, we're dreamers and our imagination are unbound; we can think about circular square, nothing, liar's paradox, and just plain anything that simply cannot be and we wanted to express those ideas; thus it was natural for our languages to be complete. But SQL isn't a natural language; it's a language for us to work with computer. Does it then follow that it would be better if it were correct instead of complete as it would make the processing of data simpler?

The_Doc_Man · Jul 11, 2007

does complementing set mean a set that is opposite of the initial set

The original set plus its complement equals the entire possible population if the set is binary.

For coin flips, heads + tails = number of flips (if you catch the coin rather than allow 'edge' events).

But what about the closed world assumption? The real question is not whether null can be true or false,

When NULL is present, you do not HAVE the closed world assumption. When you have the closed world assumption, you do not have NULL. So if you are in the closed world assumption, the question is moot.

This is kind of where I'm scratching my head; you're telling me that there's a place for null in SQL, then telling me that it's our job to identify the state if it's possible that null may be assigned a meaning that it shouldn't have had.

If you got that impression then I've been sloppy. The place for NULL is clear. When a set is empty the return from that set is NULL. If you have to assign a meaning to NULL, it isn't NULL any more.

Modify SQL so that it no longer natively supports null. Whenever there is no data, it should return false (e.g. negation as failure).

The problem I see with that is that we REALLY should return a status code that says "you got back the NULL set" - but we don't do that, with the exception of VBA DoCmd.ExecuteSQL or whatever. At the screen interface level, the only way Access can tell us something is NULL is to "display" the empty set. Programming languages can have status returns distinct from their data returns. (Because status returns are META-DATA.) Access from a GUI doesn't have that ability. SQL from a VBA function COULD do that. So here I think my objection to revoking NULL is a representational issue due to loss of that meta-data. A pragmatist's approach, for sure.

Does it then follow that it would be better if it were correct instead of complete

I don't think so, since the problem is not whether data is FALSE, Empty, or Zero. It is that the MetaData is trying to tell us that something is wrong with the assumption inherent in "asking" a query. A dozen posts ago I delved into Interrogative Logic and mentioned that you could test program validity with it by determining (through other means) whether the answer you got from YOUR program matched what you expected - and could demonstrate that what you expected was reasonable. Since SQL is based on set theory and set theory has the concept of a null set, I don't think we can really remove NULL. What we MUST do is learn to treat it with respect.

doesn't mean NULLs has been assigned a meaning, since I could correctly determine attendances from records with a checkbox.

He's got it. By George, I think he's got it.

The_Doc_Man · Jul 11, 2007

experience is recognizing the mistakes next time you make it

With apologies to Ambrose Bierce, who originated that pithy little comment. The late (we presume) Mr. Bierce should be findable through Google, including a publication known as the "Devil's Dictionary." I say "presume" because he disappeared one day and, AFAIK, his body was never found.

Banana · Jul 12, 2007

The_Doc_Man said:
When NULL is present, you do not HAVE the closed world assumption. When you have the closed world assumption, you do not have NULL. So if you are in the closed world assumption, the question is moot.

Obviously.

Then why they say that a ideal database should adopt a closed world assumption; that it should have all data that it will ever need for its purpose?

Since SQL is based on set theory and set theory has the concept of a null set, I don't think we can really remove NULL. What we MUST do is learn to treat it with respect.

So let's see if I'm hearing you right. Even if the Dr. Codd and other theorists who has their heads (and perhaps their ego

) up in the sky insists that a database be a closed-world model, it is not compatible when we're basing SQL on set theory, which necessitates a need to represent a empty set, which leads us to null... correct?

Banana · Jul 12, 2007

At risk of derailing the thread, I thought up two concepts relating to people/roles I want to address (perhaps dispel?)

First, I realized that if you look at Access's users model which is also similar in other Office programs such as Outlook, and at DAO's data model, there's something that seems to me denormalized; Users and Groups.

In DAO Data model, there's two separate branches; 1) Users and Group, 2) Groups and User. (I think the plurality here is deliberate).

Wouldn't that be denormalized or is there a special reason for this?

Another concept is the actual users of the database. We usually want to be able to record who entered what records, so we have a small user table with maybe some related tables for error handling, security, or whatever.

Now, suppose that in the database, we had the people/roles for everyone within the model of database (e.g. employees), and had a separate user table when it easily could be a entry in the role table and an actual part of the database model, would this be preferred, even though the user table (to my mind at least) seems to exist in a entirely different world than the world that database is trying to represent.

Nulls: Should it ever be meaningful?

split with a cherry atop.

Immoderate Moderator

split with a cherry atop.

split with a cherry atop.

Immoderate Moderator

split with a cherry atop.

Immoderate Moderator

split with a cherry atop.

Immoderate Moderator

Immoderate Moderator

split with a cherry atop.

split with a cherry atop.

Immoderate Moderator

Immoderate Moderator

split with a cherry atop.

split with a cherry atop.

Immoderate Moderator

Immoderate Moderator

split with a cherry atop.

split with a cherry atop.

Similar threads

Users who are viewing this thread