Nulls: Should it ever be meaningful?

Banana · Jul 9, 2007

One design problem I've realized is how I want Access to tell me whether there's no given record in a junction table, which the customary answer would be a null.

For a need of concrete example, say I want to track attendance of any given person for a given session. I can then create a junction table consisting of PersonID and SessionID. But do I want to have an extra column indicating whether this person did in fact attended?

If I didn't, I can use nulls to conclude that the person skipped a session. However, if the attendance was changed (let's say that someone has been calling someone else's name during roll and later caught), that means delete query is needed so the attendance check query will now return null for this person, indicating absence. So that means I now have the issue of correcting maintaining the junction table so whenever I look for a null, I know that I won't be looking at stale record that shouldn't been there in first place.

Or I can just add an Yes/No Column of whether a given person did attend a session. Instead of deleting records, it's now a matter of clearing the checkbox if the data changes. There's now no need to delete records, so I can just keep adding records as I like. In this situation, it's actually simpler to add somehow extraneous column than use Null-logic.

Now, I'm sure there are other situations where making a record just to deny something makes no sense, and leaving it as null would be appropriate.

If you read this interesting Wiki article on Nulls, especially the section on criticism thereof, you can see that there has been suggestions to omit nulls entirely from the SQL logic.

So, what are your feelings on this matter? Should Nulls be relied on to be meaningful, and under what circumstances?

boblarson · Jul 9, 2007

Search for posts by The_Doc_Man regarding Nulls as he has some good information about that.

Here's a thread that has some good stuff by him:

http://www.access-programmers.co.uk/forums/showthread.php?t=130006&highlight=null

The_Doc_Man · Jul 9, 2007

{best Jack Nicholson imitation}Hello, folks, I'm baaa-aaack... {/best Jack Nicholson imitation}

I read the article on Wikipedia. My background includes a PhD with lots more math than I wanted to know, and I see where NULL in SQL (and the references to articles by Dr. Codd) have been clearly explained. I'm going to go more techie than I think I've ever gone here.

First, Dr. Codd is clearly thinking that NULL and INFINITY are similar in meaning. He suggests a multiplicity of nulls, just as there are many kinds of infinity. Most of us never see more than Aleph-0, the "garden variety" of infinity, for which you say "This number is too big to be expressed." But there are other types of infinity, depending on the method you used to arrive there. For instance, there is an infinity that says "increases without limit" and that is a different type of infinity than "too large to express" - for the latter implies a specific value exists, but we can't know it. The former says there is no value because there is no limit.

The arguments about the "flavors" of NULL have a problem in my book. The moment you say null has meaning - such as "Applicable but absent" vs. "Not Applicable" - you have turned null into something that is NOT based on the hard - and very specific - meaning of NULL in set theory. In the latter theory, NULL means the same thing as "empty set" - and it takes the same exact meaning whether you are doing simple sets, joined sets, unioned sets, or any other kind of set you can imagine. NULL = no members in the set. The population of the set is 0. More specifically for JOINS, it means that the INTERSECTION of the joining sets is empty.

Where I go bonkers is that people have a loose understanding of what NULL really means. It means one and only one thing in Access or any other form of SQL that abides by the original set theory meaning. It means - whatever the set is that we are examining, we have found no members.

My own view is the Murphy's law view. There are known problems with the use of NULL in a query or other Access function, action, VBA sequence, or whatever else you've got. If you know ahead of time that you have at least a decent chance of seeing a NULL and you don't protect yourself against the negative consequences of that situation, you have screwed up. I cannot be plainer or more direct than that.

A part of the article intrigued me because there is some discussion about what NULL does to SQL. It resembles what happens in Automata theory when you allow NULL transitions. It also impinges on the Goedel Completeness Theorem, which states (loosely) that a language can be correct or complete, but not both. Interrogative Logic (IL) theory is the basis for queries. In essense, IL can be used to show when a query correctly answers the question implied with it.

In the Wikipedia article, there is a reference to Closed World vs. Open World. This is the same in effect as the Goedel assertion or the basis of IL theory. A query return is TRUE in IL when it completely returns every set member implied by its conditions, and it is FALSE if it leaves things out. You can guess that a query language author can use IL theory to validate the SQL implementation. Well, when you introduce NULL in the equation, the problem becomes quite simple. NULL, like INFINITY, cannot be fully characterized because it represents a DEPARTURE from normal numbering or enumerating systems. NEITHER ONE can be exactly expressed.

Goedel's Theorem explored the "language" of finite automata and the way in which the quality of "finite" could be removed. At the risk of a terrible pun in this context, the NULL transition nullifies the "finite" attribute of any automaton. This is because NULL, being non-existent, cannot be counted or evaluated. Allowing automata to interact with input streams to exert a NULL TRANSITION instantly turns the automaton into a non-finite automaton. In practical terms, what happens isn't that the automaton goes into an infinite loop. It REALLY means that you can no longer decide whether the automaton will exit. Because, you see, Goedel was working on COMPUTABILITY theory among his other interests. The bit about automata is because many elegant proofs about formulas were made using Finite Automata theory to generalize the math. You could probably Google-search for FINITE AUTOMATA and get a lot more than you really wanted to see.

Where has this taken us? SQL has the same problem that Goedel found with other languages - including English. He found that English was COMPLETE therefore cannot be CORRECT. The goal of SQL is that all returns from it can be trusted. When you allow NULL, that is no longer true because NULL is outside of normal logic used for set theory. In case you are wondering why English is COMPLETE but not CORRECT, it is because English allows something called a self-referential statement AND it allows expressions to have truth value in ways that can be very confusing. Example: "This statement is FALSE." Tell me what that means. Go ahead, I dare you. I double-dog dare you.

SQL with poor NULL management becomes a nightmare. SQL with proper NULL management avoids the nightmare. Mucking about with NULL so that there are flavors of NULL (it's in the Wikipedia article) compound the nightmare. All NULL should EVER mean is "have no way of knowing in this context." If you make more of a statement than that, you are going far afield.

I'm gonna break here and think some more. My brain cramped on me and I need to let it cool down.

The_Doc_Man · Jul 9, 2007

I lied a little. I came back sooner.

In this forum, questions are often raised about the use of NULL. I tend to say that there is no excuse for leaving a NULL behind in a record once it is created. You can, if you wish, leave a state-value in the record such that you can define whether the record is ready for use. When you run into an outer join that has the unfortunate effect of leaving half a record NULL, that is where Access gets crazy. But if you check the status of the record and find that it contains nulls in places that just can't be null, you have shown that the outer join resulted in a empty INTERSECTION.

The question is what you should do with records resulting from outer joins. If you would have also gotten the record on an INNER join, you can process it with little or no problem. But if the record would NOT have been returned after an INNER JOIN, you have to decide what you allow to be done. To my personal viewpoint, if you find that you have a record that contains NULL in critical parts, you have something that must be handled very differently than other records. What you cannot do is let the NULL trip you and your code. If you do, you have no way of proving the quality of your algorithm any more.

Banana · Jul 9, 2007

BobLarson: Thanks for the link. I didn't realize that the thread broached on the topic of Nulls, but now I know a bit. Thanks again!

The_Doc_Man: Great post. Based on what I'm reading, you're saying that Nulls should strictly be meaingless (well, almost. Read below), and database design that haphazardly allows nulls there and here will ultimately be a pain in ass and head. This reinforces what I've observed in my OP. However, there's a bit of gotcha.

When I think about it, my solution of adding an column did not make Null meaningless. On the contrary, I just made it unnecessary to delete records. Null still does have a meaning in this context; suppose I want to look up a given person's attendance for a session, where I have a existing record of the attendee and the existing record of session, but have yet to create a junction record signifying the attendance (or lack thereof). Therefore, if I get a Null in return, I can then conclude that either this person never attended or the operator failed to create the required junction record. The latter is a question of how well I have designed my forms. Let's say I have a complete listing of every attendee so the operator can check off during a roll call for a given session so there's no skipping or whatever, I could claim that I've minimized the chance of accidental omission sufficiently that the former situation (that the attendee did play hooky) is likely more true.

Therefore, my query to investigate absents would need to look up all junction records with empty checkbox for attendee *AND* Null records.

Of course, if I really didn't want to use Nulls, I could just create new set of junction records for every new session with the attendees registered, with the checkbox initially empty. But that creates new problem; I now have created extraneous maintenance- Whenever the registration list changes, I need to update the junction table every time, including deletions of all junction record for deleted attendee record. Even with Cascade Delete, this represents a unnecessary extra steps to maintain the data. (Does that also follows a greater potential of corrupting data?)

So, I'm seeing this as a "Damned if I do, damned if I don't" situation, and feel that the solution of adding just new junction tables with the checkbox column while tolerating Nulls for records not yet created is probably the most preferred compared to other solutions. Even then, it seems to me that solution makes Nulls meaningful.

WRT the "Closed World Assumption", while I agree in principle that a database should model a closed world assumption. Otherwise, it'd act too much like human- "Why are you asking me so many questions? I've told you before and I'll tell you again, I don't FRICKIN' know the answer!" It's easier to have the database return False and let developer sort it out what exactly is False. After all, it's their job to design the database and model it so every False it could return can be explained in itself. BUT! I'm not completely convinced that it's okay to declare everything unknown as false. It's one thing to declare a lack of record as false, but I strongly doubt you can do the same with a record that's yet to be created. You cannot declare falsity (and for that matter, truth) on something that has yet to exist whether in actuality or imagined.

So, I do feel that Nulls do a valuable service in telling me what is missing as opposed to getting False for both lack of records and for records returning False results. In that sense, there can be just one type of Null, and that's all we really need. Like Doc Man, I'm not sold on the idea of introducing Dr. Codd's idea of using "Missing but Applicable" and "Missing but not applicable", let alone the proposal that we would need 11 different types of Nulls!

eek:

)

Now, here's another scenario to consider. Suppose we have a database that stores about people working for Company X. Of course, the Employee table would have many side tables about people's personal data (addresses, contact informations, blah blah). Now suppose further that we need to store data about different types of people, and each different types has its own set of data that does not apply to another type. Therefore, there's three One-One tables representing each type:

tblEmployee
EmployeeID
FirstName
LastName
....

tblManager
EmployeeID (1-1)
HireFire (Company X does not give out the authority to hire or fire to all managers, so we need a checkbox to tell who has the authority)
PromoteDemote (Ditto)
BudgetAuthorization (Ditto)
....

tblContracts
EmployeeID (1-1)
ContractorGrade (Company X wants categorize contractors based on their experience which impacts what contracts Company X may offer out)
Bonded (Company X is a cheapskate and instead that contractor covers their own ass whenever possible)
....

Now if we have a new hire, we just add the entry in Employee table and the appropriate record in 1-1 child table. But what about existing hires? What if we promote an employee to manager or make a contract a permanent worker? Assuming that Company X's database developer actually knows something, it should be no brainer for the HR to create a new record in the child table that now indicates the new employment.

Note that there is no need (and no good reason) to create a field in tblEmployee identifying the roles, as the presence of record in related child table is in itself sufficient to identify a given employee's role. This is true even if we allow for possibilities of having multiple roles at one time for one employees, or having a history of past roles stored or however the data may be structured.

And here's the catch: the Null now becomes meaningful. Absence in tblManager now tells us that this employee is not a manager and so on. In fact, this is how you would query to ascertain whether a given employee occupy what roles, and using Null to identify what roles employee doesn't occupy.

Have we broken the boundary of Null as defined by Set theory here?

The_Doc_Man · Jul 9, 2007

Perhaps. Perhaps not. It is a matter of perception and intent.

As I have ALWAYS maintained on this topic, if you take the trouble to intercept NULL when it wants to smack you in the face; if you use NULL properly to sense that you have no members in the set; if you use NULL to signal that you can't ask any other questions about the set, I have no heartache at all. It is this situation where folks fall down in a sea of nullity.

If you have a list of managers and you do a query that returns the list of all managers with last name Mxyzptlk, and the query's count is 0, then DON'T ASK THE AVERAGE SALARY OF ALL MATCHING MANAGERS. Because if the set is empty, your inputs to the formula are all NULL. The query is appropriate - but asking a follow-up question is not. That is the crux of handling NULL in SQL, VBA, or on forms. Your NULL check should be a barrier to further investigation of the data returned by a given query. And all too often, folks try to make it have another meaning.

I am in the camp that says NEVER assign a meaning to NULL other than its natural meaning. I am also in the camp that lives according to the rule that says "Don't ask a question if there are answers you didn't want to hear." (NULL being one of those answers.) It is why, in some posts, I have openly shown my colors as a dyed-in-the-wool pragmatist.

So, Banana, I'm not telling you what to do in your app. I'm just saying that if, knowing what you know from this and other posts, if you get bit by NULL later, I'll have little sympathy unless the NULL was one you had no reason to expect.

By the way, there is a concept I should interject. If a person used to be a grunt and is now a manager, I'm of the opinion that you should have both records available, but the one that is out of date should have a contemporary end date and the one that is current should have a bogus end date such as 31 Dec 2099. Then when you add a new record to update the employee's current status, you update the end date of the current assignment at the same time you make the new assignment. Let your "current status" query select only those records with end dates that are greater than NOW().

I think that way because in the government services, you NEVER discard data. You mark it "out of date" and store it in your history tables but you NEVER delete it. So I come by that quirk honestly.

And now, a direct answer to your ending question. I wouldn't do it quite that way simply because I never want to see nulls in a record that has been created for an existing person.

I would make a table of employees and have fields that link to attribute tables. Being a pragmatist, I would put the person's current status in the main record and have a history table showing prior status info. The roles can come from a one/many role table (one employee, many possible roles). If you prefer, also a one/many sklls table. And a one/many history table. And a many/one employer table, if you have contractors whom you can count as resources for a given project. Many resources can work for any one contracting company. (I live in that environment 5 days a week.) Your 3-way split is going to be a headache SOMEWHERE down the road because it invites nulls. The way I see it, everyone will have some skills, some roles, and an employer, so there will never be any nulles anywhere. Unless, of course, the manager has pointy hair and a total null in the "skills" department. But then, I'm a fan of the Dilbert comic strip. I'd SWEAR that the author of that strip worked in MY office. Holy sh|t is he ever on target.

The_Doc_Man · Jul 9, 2007

Separate comment: 11 different types of NULL is crap. Make ALL of the implied statuses into a RADIO button on a form. Then if you want 256 different types of nullity, you've got 'em - but nothing will blow up because you have customized the treatment of NULL for the table in question and can leave true NULL to have it's true meaning. ("Empty set, no further questions about the set have meaning.")

Banana · Jul 9, 2007

The_Doc_Man, thanks for narrowing the scope here. I like your description about the refusing to do follow up questions when getting null in first query.

The_Doc_Man said:
By the way, there is a concept I should interject. If a person used to be a grunt and is now a manager, I'm of the opinion that you should have both records available, but the one that is out of date should have a contemporary end date and the one that is current should have a bogus end date such as 31 Dec 2099. Then when you add a new record to update the employee's current status, you update the end date of the current assignment at the same time you make the new assignment. Let your "current status" query select only those records with end dates that are greater than NOW().

In proposing my 3 way split idea last night, I forgot one minor point. This is to allow one to have multiple roles *and* answer questions specific to each roles which would be otherwise extraneous if the role is not occupied. More below.

And now, a direct answer to your ending question. I wouldn't do it quite that way simply because I never want to see nulls in a record that has been created for an existing person.

Even if there is a complete record in the top-level table (tblEmployee)? We don't complain if there is missing child records, for which we can query and get null in return. (A missing parent record, on other hand.... "Bad dog! Bad dog! You are not supposed to poop here!"). We can just infer from absence of child record that parent record just doesn't apply to whatever the child record represented and if I'm understanding you right, makes an acceptable use of Nulls (especially if we stop right there and go no further.)

I would make a table of employees and have fields that link to attribute tables. Being a pragmatist, I would put the person's current status in the main record and have a history table showing prior status info. The roles can come from a one/many role table (one employee, many possible roles). If you prefer, also a one/many sklls table. And a one/many history table. And a many/one employer table, if you have contractors whom you can count as resources for a given project.

Your proposal is solid when we want to maintain history of roles, and if we don't have any questions to ask specifically about each roles. But what if we need to know something about a role for a given person?

Normally, this kind of data, being dependent on both employee AND role would be best suited for a junction table between employee and role. Except for one thing, we have more than one roles, and each role has different set of questions to ask. If we're going to ask all questions in junction table, we could just have done the same in the employee table.

So how do we handle several different sets of questions that is specific to each roles that needs to be answered for each employee whatever his role may be, *and* not make nulls meaningful?

EDIT: I should make explicit an implicit assumption I've made here for sake of clarity. By "sets of questions", I assume that they are best suited as a fields, rather than records, as the sets ask something about one's attribute, rather than define what the attribute is. This then follows that it cannot be put in junction tables and used to represent data specific to any given roles. If the assumption is mistaken, call it out.

Unless, of course, the manager has pointy hair and a total null in the "skills" department. But then, I'm a fan of the Dilbert comic strip. I'd SWEAR that the author of that strip worked in MY office. Holy sh|t is he ever on target.

EDIT 2: In this wonderful discussion, there's a small loose end. In my earlier post, I discussed about using an extra column in junction table to count attendance:

Therefore, my query to investigate absents would need to look up all junction records with empty checkbox for attendee *AND* Null records.

As explained in the same post, this ultimately was an act of balancing between maintaining the junction table (I personally believe that you should never need ever to maintain a table; you should be able able to add new record and update records. But if you need to delete records just to make everything work correctly, then your design has to be out of whack) and tolerating Nulls. That said, did that also violate the definition of Null, and have I created a unneeded step in my query?

KeithG · Jul 9, 2007

Great replies about NULL's DocMan! You are the man!

The_Doc_Man · Jul 9, 2007

I personally believe that you should never need ever to maintain a table; you should be able able to add new record and update records. But if you need to delete records just to make everything work correctly, then your design has to be out of whack

I agree with the above sentiment, with one exception.

In an inventory/transaction system, you would eventually wish to archive records and replace with one for each separate part number, with a "summary inventory on hand as of date x/y/z". Then append the records to the archive table or external file; and finally delete them from the live database. This is a performance issue and a space issue. Archiving to an offline medium or just to another file protects you from corruption and speed issues, particularly on a network-based (shared) DB. THERE, you should have to delete records.

Otherwise, I tend to agree with you.

Keith, thanks for the vote of confidence.

The_Doc_Man · Jul 9, 2007

Banana, to allow multiple roles, there is such a thing as a junction table to a roles list, then implement a many-to-many type of join, one person has one entry per role in the junction table. End of story.

As to "especially if we stop right there and go no further" - there is no "further" to go in the case you named. HOWEVER - if there really IS farther to go (case 1: Does not apply, case 2: applies but no data), your solution is wrong. NULL does not support either of these interpretations - it only supports case 3: don't know if it applies but either way, no data.

So how do we handle several different sets of questions that is specific to each roles that needs to be answered for each employee whatever his role may be, *and* not make nulls meaningful?

Associate them with the junction table between employee and role. Give the junction table its own unique key and have a question list beneath the junction table entry. That's as specific as it gets.

rainman89 · Jul 9, 2007

my head just exploded......

Banana · Jul 9, 2007

The_Doc_Man said:
Associate them with the junction table between employee and role. Give the junction table its own unique key and have a question list beneath the junction table entry. That's as specific as it gets.

Ok- I'm not sure what you mean by "beneath junction table".

If all questions for all roles are stored in the junction table, this does not solve the problem of having several empty columns because they do not apply to this employee.

If questions are stored in a separate table that is related to junction table, doesn't this still necessitates a 3-way split as described earlier?

Some more elaboration is needed, please.

WRT never needing to delete records, it looks like we're pretty agreed. I agree with the exception of inventory/transaction where performance can become an issue. Should it then follow that the query checking for both empty checkbox and Null record is appropriate for inferring absents?

And psst, you rock, The_Doc_Man!

The_Doc_Man · Jul 9, 2007

I'm not completely convinced that it's okay to declare everything unknown as false.

Banana, I'm not saying that. What I'm saying is that IF you expect nulls, the correct way to avoid them is to include a state-flag in the record that tells you - record is not complete. And if the REASON why it's not complete becomes important to you, you have a state-flag with LOTS of possible state codes you could use. Like - no data, or not applicable, or brain fart by data entry clerk, or... whatever you have. The point being that if you NEED to guess why you had the NULL, your design is incomplete.

This is one of the Old Programmer's Rules in action: Access will never tell you anything you didn't tell it first. If you need to know why you got a null, you need to store the reason code that says, "This record contains nulls because..." and fill in the blanks according to the reasons you envisioned. Then and ONLY then will Access be able to say "Not applicable" or "Not available" or "Data was originally entered by thumb-fingered idiot."

I'm trying to make this point clear and I still sense an undercurrent in your posts. NULL CONVEYS NOTHINGNESS. No data. No reason WHY no data. No description of what is absent. In a rigorous application, if this reason is important, you cannot trust it to NULL. NULL is like the old Lassie TV series, where the dog barks and wags it tail but you never know quite why unless you follow it somewhere else.

The_Doc_Man · Jul 9, 2007

If all questions for all roles are stored in the junction table, this does not solve the problem of having several empty columns because they do not apply to this employee.

What problem? What columns? Whenever you say "questions" and then talk about columns, that very dangerously impinges on "repeating groups" - which is a big normalization no-no.

The questions might or might not apply to a given employee. BUT the only way to know is to ask the questions and record the answers.

If you can imagine this:

Role table - describes roles. As many fields as it takes.

Question table - provides the questions associated with a role. Has a question ID and whatever text is required.

Role Question table - LISTS the questions (by Question ID) that apply to a role (by Role ID). Might be as short as 2 ID fields. If the question does not apply to that role, it ain't in the list to begin with.

Employee table. Describes employees. As many fields as it needs.

Employee Roles table. LISTS roles that apply to this employee. Links to employee by ID. Links to Roles by ID. Give this table a unique index of its own - trust me for about two more descriptions. If an employee has more than one role, you get more than one entry. Might be as short as 3 ID fields.

Employee Roles Answers table. LISTS answers to all questions applicable to a role specific to this employee - INCLUDING "NOT APPLICABLE" when applicable.

Links to employee roles table by the index on the Employee Roles table and links to the question ID by the ID in the questions table. Might be as short as Employee ID, Role ID, Question ID, text of answer. OK, let's add a state code that can mean "Not applicable" or "no data available" or "Employee pleads 5th amendment" or whatever is appropriate.

No nulls whatsoever. Now, you might have some EMPTY answers - but they are not null. The key? Ask every question in the Roles table's question list. If the question is general and is repeated for given roles, you can even tell if you've asked the question before for the same employee in a different role and gotten an answer already. Then just duplicate the answer or make another reference to the answer or find some clever way to deal with it.

Banana · Jul 9, 2007

The_Doc_Man said:
I'm trying to make this point clear and I still sense an undercurrent in your posts. NULL CONVEYS NOTHINGNESS. No data. No reason WHY no data. No description of what is absent. In a rigorous application, if this reason is important, you cannot trust it to NULL. NULL is like the old Lassie TV series, where the dog barks and wags it tail but you never know quite why unless you follow it somewhere else.

Maybe this is because I'm not convinced that tertiary logic exists on its own. If you consider this truth table:

(Credits to Wiki article on Ternary Logic)

Now, we have "TRUE OR NULL = TRUE", which the article reasons that even if we don't know what NULL represents, we can see that even if it was later known to be false, "TRUE OR FALSE = TRUE", and still returns the same answer if the NULL was later shown to be true.

Now, something is wrong here. We've been using binary logic all along! Ternary Logic is nothing more but two-level binary level compressed in one step. Here's a tree to exemplifies how binary logic can still handle Nulls:

Code:

                  Do we have knowledge?
                   /                  \
                True                  False
                /                         \  
          Is it true?                    NULL 
           /       \
        True      False

A tangent: This is a reason why we're still using binary as machine code- manufacturers, at least for time being has decided it was cost-effective to double the computing power and amount of data storage every 18 months than to add more states for same amount of computing power or data storage. Everything that can be done in multi-state can actually be simplified down to binary calculation; it may take a bit more time or processing but it's doable. Furthermore, it's actually simpler to do calculations in binary than in ternary, and let's not think about doing calculations with 11 states!

So, there's a chance that people who are calling for no NULLS in SQL are right; and (I hope I'm not putting word in your mouth) like you said, if we wanted null to mean something, it is ultimately our job to give it a meaning by creating the appropriate tree... Correct?

Banana · Jul 9, 2007

The_Doc_Man said:
What problem? What columns? Whenever you say "questions" and then talk about columns, that very dangerously impinges on "repeating groups" - which is a big normalization no-no.

The questions might or might not apply to a given employee. BUT the only way to know is to ask the questions and record the answers.

There are no repeating questions for any roles; each question are unique to one role only. As I said earlier, I had assumed that question should be a field, not a record because they ask something about a role's attribute but doesn't state what it is.

If you can imagine this:

Role table - describes roles. As many fields as it takes.

Question table - provides the questions associated with a role. Has a question ID and whatever text is required. In my earlier post, I gave some examples, such as asking whether a given manager has authority to hire and fire grunts or not. This seems to me more like a field as it asks something about entity, and does not have any description for that entity.

Role Question table - LISTS the questions (by Question ID) that apply to a role (by Role ID). Might be as short as 2 ID fields. If the question does not apply to that role, it ain't in the list to begin with.

Employee table. Describes employees. As many fields as it needs.

Employee Roles table. LISTS roles that apply to this employee. Links to employee by ID. Links to Roles by ID. Give this table a unique index of its own - trust me for about two more descriptions. If an employee has more than one role, you get more than one entry. Might be as short as 3 ID fields.

Employee Roles Answers table. LISTS answers to all questions applicable to a role specific to this employee - INCLUDING "NOT APPLICABLE" when applicable. Links to employee roles table by the index on the Employee Roles table and links to the question ID by the ID in the questions table. Might be as short as Employee ID, Role ID, Question ID, text of answer. OK, let's add a state code that can mean "Not applicable" or "no data available" or "Employee pleads 5th amendment" or whatever is appropriate.

Now, you've made questions a record, not a field. As I stated earlier, I was locked in thinking that questions should be fields. This comes with the benefit that I can validate the answer or restrict how it may be answered. With the idea of answer junction table, I'd have to figure out what question is being asked, then select a appropriate data type to store the answer in, enforce relevent validation rules... Is that really necessary?

No nulls whatsoever. Now, you might have some EMPTY answers - but they are not null. The key? Ask every question in the Roles table's question list. If the question is general and is repeated for given roles, you can even tell if you've asked the question before for the same employee in a different role and gotten an answer already. Then just duplicate the answer or make another reference to the answer or find some clever way to deal with it.

If the structure is indeed correct, that's great to know that we never will need to deal with Nulls as meaningful. Of course there's this little loose end still hanging in another scenario I described earlier; if I'm looking up for absents by using both empty checkbox in junction table and null records, does that mean the design is out of whack as well?

The_Doc_Man · Jul 9, 2007

if we wanted null to mean something, it is ultimately our job to give it a meaning by creating the appropriate tree... Correct?

I think that requires a not-too-hesitant NO. NULL already has a meaning assigned by the nature of the math that underlies set theory. Giving NULL a meaning not consistent with the set theory definition is VERY dangerous. See, normally I'd take the pragmatist view here. But too many years and too many math courses too long ago have made me gunshy of reinventing wheels. Which "assigning a meaning to NULL" would definitely represent.

There are no repeating questions for any roles

I was locked in thinking that questions should be fields

If this leads to the concept of fields named QUESTION1, QUESTION2, etc., then you have what is called a repeating group. If you have 10 questions, are you going to have QUESTION1..QUESTION10 ? If in the same list of roles you only have 2 questions for a different role, haven't you wasted the slots normally occupied by QUESTION3..QUESTION10 ?

By making the questions separate, but an enumerable list, you avoid both of these problems. Your image is still one of flat-file thinking.

This comes with the benefit that I can validate the answer or restrict how it may be answered.

You could do that already with the list concept. If two questions differ only by the answers they condone, are they different questions? In Interrogative Logic theory, there is at least precedent for saying YES they are different. If you accept that theory, then you duplicate questions when the condoned answers differ, but can still validate answers by having a list of answers condoned by each question. (Heck, that way, you make ANSWERS into a drop-down unless you have a fill-in-the-blank case...)

if I'm looking up for absents by using both empty checkbox in junction table and null records, does that mean the design is out of whack as well?

suppose I want to look up a given person's attendance for a session, where I have a existing record of the attendee and the existing record of session, but have yet to create a junction record signifying the attendance (or lack thereof).

Again, the question is how you represent absenteeism. If you do so with a check-box for attendance, then why would you not set a default value of "FALSE" (meaning absent) and only count a person as attending when the box is checked? In this scenario, there is no join. You do a DCount of TRUE checkboxes and if you get 0, your person didn't attend.

Banana · Jul 9, 2007

The_Doc_Man said:
if we wanted null to mean something, it is ultimately our job to give it a meaning by creating the appropriate tree... Correct?

Click to expand...

I think that requires a not-too-hesitant NO. NULL already has a meaning assigned by the nature of the math that underlies set theory. Giving NULL a meaning not consistent with the set theory definition is VERY dangerous. See, normally I'd take the pragmatist view here. But too many years and too many math courses too long ago have made me gunshy of reinventing wheels. Which "assigning a meaning to NULL" would definitely represent.

Doh. Now that I've read my quote, I see that I've expressed myself too loosely. What I was referring to "it" in "give it a meaning" was to those states we discussed earlier (e.g. Not Applicable, Not Reported, etc...), and never use Null to mean those states by creating the needed states.

If this leads to the concept of fields named QUESTION1, QUESTION2, etc., then you have what is called a repeating group. If you have 10 questions, are you going to have QUESTION1..QUESTION10 ? If in the same list of roles you only have 2 questions for a different role, haven't you wasted the slots normally occupied by QUESTION3..QUESTION10 ?

By making the questions separate, but an enumerable list, you avoid both of these problems. Your image is still one of flat-file thinking.

No, no, no. If you look earlier at my example, there is nothing repeating; each question are *unique* to each roles.

Code:

tblManager
EmployeeID (1-1)
**HireFire (Company X does not give out the authority to hire or fire to all managers, so we need a checkbox to tell who has the authority)
**PromoteDemote (Ditto)
**BudgetAuthorization (Ditto)
....

tblContracts
EmployeeID (1-1)
**ContractorGrade (Company X wants categorize contractors based on their experience which impacts what contracts Company X may offer out)
**Bonded (Company X is a cheapskate and instead that contractor covers their own ass whenever possible)
....

I've marked the fields with ** for what I've been calling questions. As you can see, each field are specific to the role, and would be nonsensical if asked of other roles.

Of course, if there were question that was common to more than one role, then a Question table with RoleQuestion junction table is needed. However, this is not what I'm thinking of.

Now, you say-

You could do that already with the list concept. If two questions differ only by the answers they condone, are they different questions? In Interrogative Logic theory, there is at least precedent for saying YES they are different. If you accept that theory, then you duplicate questions when the condoned answers differ, but can still validate answers by having a list of answers condoned by each question. (Heck, that way, you make ANSWERS into a drop-down unless you have a fill-in-the-blank case...)

Does that still mean that even if the questions are unique to each roles, it still makes sense to put it in a separate table and use a junction table to record the answer to each question for any given employee?

I'm still stuck as I've come to think those questions as more fitting as fields rather than records because as I said earlier, they ask something about a record's attribute and has nothing to do with the record itself and say nothing about the record itself.

Again, the question is how you represent absenteeism. If you do so with a check-box for attendance, then why would you not set a default value of "FALSE" (meaning absent) and only count a person as attending when the box is checked? In this scenario, there is no join. You do a DCount of TRUE checkboxes and if you get 0, your person didn't attend.

Ah, but doing so would mean I'm just needlessly making records "just in case." As I described earlier, it means for every new session I make, I need to create a set of blank records for all attendees registered for this session. This now introduces table maintenance not to mention that if the registration list changes, I have to take extra steps to ensure everything is accounted for. Wouldn't that be poor design as well?

The_Doc_Man · Jul 10, 2007

Does that still mean that even if the questions are unique to each roles, it still makes sense to put it in a separate table and use a junction table to record the answer to each question for any given employee?

I'm still stuck as I've come to think those questions as more fitting as fields rather than records because as I said earlier, they ask something about a record's attribute and has nothing to do with the record itself and say nothing about the record itself.

I guess I didn't look as hard at your "questions" as I should have, but this still bothers me, just no longer at a fully conscious level. (But then, at this time of night, "fully conscious" might be asking a bit much.) Back to the issue - the nagging feeling I have probably means there is an unresolved issue in my mind that could merely be a prejudice or it could be a bigger issue. I just don't know.

Perhaps the problem is that every role potentially has different "questions" to be answered - even if they represent attributes of the role. This makes role unification nigh unto impossible when these "questions" are part of the role tables. I am no expert on the higher levels of normalization, but I believe that by folding the attributes into the role tables and allowing the roles to be so different as to be incompatible, you have exposed a denormalization of a very subtle type.

It is trivial to unify roles when they are part of a list associated with a role. You just about can't do any unification if not. And I'm in favor of coalescing those three parallel tables to one table (of unified roles) and one child table (of role attributes that you call "questions.") Am I making sense that way?

You were forced to split your employees table into parts because you could not unify the roles. But the question I have is whether increasing the virtual depth of the tables (to have lists of attributes underneath) would allow you to narrow the three-table layer to a single table, thus unifying everything and yet retaining generality.

Ah, but doing so would mean I'm just needlessly making records "just in case."

We are talking past each other. You make NO RECORD AT ALL if there is no attendance. Then you don't JOIN anything. You just count records matching the person and class from the attendance table. This is a JUNCTION table, perhaps - but you don't have to exercise a JOIN to get there. Just a DCount. If the DCount of the person/class records is zero, you know what your attendance was like. End of question. Not a null - because DCount is one of the rare Domain Aggregates that does empty sets correctly.

Nulls: Should it ever be meaningful?

split with a cherry atop.

Smeghead

Immoderate Moderator

Immoderate Moderator

split with a cherry atop.

Immoderate Moderator

Immoderate Moderator

split with a cherry atop.

AWF VIP

Immoderate Moderator

Immoderate Moderator

I cant find the any key..

split with a cherry atop.

Immoderate Moderator

Immoderate Moderator

split with a cherry atop.

split with a cherry atop.

Immoderate Moderator

split with a cherry atop.

Immoderate Moderator

Similar threads

Users who are viewing this thread