Analyzing complex relationships: How to?

Banana

split with a cherry atop.
Local time
Today, 01:34
Joined
Sep 1, 2005
Messages
6,318
While working on the database, I worry about crazy new ideas infecting my brain which can lead to bad habits which can lead to general insanity. Thus, I have this insatiable urge to blah out my crazy ideas so everyone can make fun of me and strip me of self-respect so I may learn one thing or two. :eek:

The ideas mainly relate to how I make decisions on how to relate groups of entities. Mind you, this is not about "Is this a one-many relationship or a many-many relationship" but more about the overall picture consisting of complex relationships.

So, I'll list what I do use to base my decisions, in no particular order, and if it's half-baked, maybe everyone here can help me bake it completely. Also, I'd love it if anyone else shared their ideas.

Child Table Inheritance

Child tables always inherit parent tables' record entries. This can be proved by executing a query pulling child record using its ID, and including the parent's record.

Does it then follows that when considering which child table should be made a junction table, one can ask whether we want this child to inherit the 'other' parent table's properties to determine which table we need to convert to a junction table?

If a table needs to be a three way junction table, it means it has three parent table to inherit properties from.

Determining inheritance in reverse manner is also a good way to identify the relationship. The_Doc_Man already has made the suggestion of doing a trace to identify how objects are related. I usually start with a paper listing every questions I need to know about a given entity. I then ask myself if this question can be answered by itself. If yes, there is no relationship. But if I say, "Well, I need to know this...", then there's a relationship to be traced, because this question need to inherit that extra information.

PS Forgive me if using the term "inheritance" is odd; it was the one that made most sense to me, coming from a object-oriented programming perspective.

Auxillary junction tables

Suppose we need to create a three way junction tables to describe relationship among three entities that are independent in each own rights. Does it then follow that we need a two way junction table for each pair of parent, a total of three two way junction to complete the relationship?

It seems that if you need to track how a pair of parent may be related, and may need some extra information that is fully dependent on this pair and never for the third entity, then yes, you need that two-way junction table. Another scenario is if you need to enumerate which specific instances of entity can go with another entity, as not all X may be related to all Y or something, necessitating a two way junction table to help identify which instance of X does have such relationship. Otherwise, it is implicitly referred to within the three-way junction table. Is that right?

Preventing Circular References

For a long time, I used to think that if when looking at relationship layout, I could trace starting at a table and end up back at the same starting table, I had a circular reference. However, this seems to not be always the case, as it is possible to have X depend on Y, while Y depends on Z and the pair of X and Z forms a meaningful relationship. This is not same thing as saying "X depends on Y, while Y depends on Z, while Z depends on X", which would be truly circular. Same can be said of self-join, which is self-referential. However, I believe that the correct algorithm could be thus: If the table X is a many side table to table Y which is a many side to table Z, which is many side table to X, then this is a bad set. OTOH, saying that table X being a many side to table Y and Z being many side to Y and X isn't circular. Correct?

That's all I can think of; I'd love it if anybody can critique the suggestions and input their own, helping along to under how we want to organize data.
 
Aw, geez, you don't ask easy questions,do you? (That was rhetorical...)

one can ask whether we want this child to inherit the 'other' parent table's properties to determine which table we need to convert to a junction table?

In isolation, this statement is not correct. You use a junction table to relate selected members of two tables where taken as a whole the number of members related to any one entry in either table is likely to be many, regardless of which table you started from. (That's another way of saying "Many to many" but puts it in more concrete terms.)

The question is not whether the target of a pointer-link inherets anything. That is a function of the application. See, PK/FK linkage is like Lassie. Yeah, the dog barks at the well. But you still have to explore to find out whether a child fell down the well or Lassie just thinks it's a smell object that needs to be covered up. There is no "automatically inherit properties" until you SAY there are properties to be inherited.

There is also the consideration that you could have collapsed some list of common properties into a single table such that the CHILD's properties are possessed by the parents having links to the children. (I.e. the property tables are the ONE side and it is the PARENTS who are the many side.)

Parent/child is SOMETIMES the wrong analogy for the mental image we try to build. In the case of a list of attributes shared by many records in table A, logically the property table records describe, and therefore are lower in the hierarchy, than table A. Should that be a child table EVEN THOUGH it is on the "wrong side" of a more traditional one/many relationship? I.e. the one and many sides are reversed with respect to common parent/child models.

The data must ALWAYS ALWAYS ALWAYS drive the design. Nicklaus Wirth, the "father" of the Pascal programming language, once stated that 90% of all programming problems stemmed from poor data design. Nowhere does this statement show up MORE often than in database programs.

So start by deciding elementary entities that stand alone. THEN decide how they relate to each other. You don't choose a JUNCTION table to make or break inheritance lines. You choose a junction table because the relationships are of the many/many variety.

Something else you said becomes applicable here, though:

if you need to track how a pair of parent may be related, and may need some extra information that is fully dependent on this pair and never for the third entity, then yes, you need that two-way junction table

Absolutely - because you are expressing a relationship that does not involve that third entity in the three-way free-for-all you were describing. Again, if what you are building is a data model of some real-world process, then no matter HOW many ways they interact, you can express that interaction in a junction table. But you only need a junction table when the records on both sides of the relationship need to interact directly.

Counter-example: Traditional parent/child - most of the time, the child points back to its parent but the parent has no pointer to the children. Not always needed since a query could rejoin them. So do you NEED the complexity of a junction table when a viable one-many relationship can be used? Don't know, depends on what you are doing with them. Is there some data that appears multiple times and involving both of them that necessitates a junction table? I can imagine such a thing: Child services database for visitation rights where parent and several children are listed, parent has only supervised visitation rights, and a junction table (including date and location) tracks the supervised visits between parent and child.

But if you aren't doing something like that, then you don't NEED the complexity of a JUNCTION table because a simple one/many relationship works.
 
Aw, geez, you don't ask easy questions,do you?

Oh, alright. You're right, I've been asking too much hard questions. Here's an easy question. What's 2 + 2? Thanks.

In isolation, this statement is not correct. You use a junction table to relate selected members of two tables where taken as a whole the number of members related to any one entry in either table is likely to be many, regardless of which table you started from. (That's another way of saying "Many to many" but puts it in more concrete terms.)

The question is not whether the target of a pointer-link inherets anything. That is a function of the application. See, PK/FK linkage is like Lassie. Yeah, the dog barks at the well. But you still have to explore to find out whether a child fell down the well or Lassie just thinks it's a smell object that needs to be covered up. There is no "automatically inherit properties" until you SAY there are properties to be inherited.

What about the example above where if you make a query pulling a child key and showing its parents' fields? Each child of same parent will always have same entries in parents' field, no?

There is also the consideration that you could have collapsed some list of common properties into a single table such that the CHILD's properties are possessed by the parents having links to the children. (I.e. the property tables are the ONE side and it is the PARENTS who are the many side.)

Parent/child is SOMETIMES the wrong analogy for the mental image we try to build. In the case of a list of attributes shared by many records in table A, logically the property table records describe, and therefore are lower in the hierarchy, than table A. Should that be a child table EVEN THOUGH it is on the "wrong side" of a more traditional one/many relationship? I.e. the one and many sides are reversed with respect to common parent/child models.

It sounds like we're talking about lookup tables here? I can see what you mean as it'd be odd to say a lookup table that lists a set of valid values is one higher than the table that describe a free-standing entity.

But if you query that entity table and include the lookup table's value, you'd see always the same key/value. Does that follows then that the entity table may and can "inherit" lookup table's value?


The data must ALWAYS ALWAYS ALWAYS drive the design. Nicklaus Wirth, the "father" of the Pascal programming language, once stated that 90% of all programming problems stemmed from poor data design. Nowhere does this statement show up MORE often than in database programs.

Agreed.


Absolutely - because you are expressing a relationship that does not involve that third entity in the three-way free-for-all you were describing. Again, if what you are building is a data model of some real-world process, then no matter HOW many ways they interact, you can express that interaction in a junction table. But you only need a junction table when the records on both sides of the relationship need to interact directly.

Just so we don't miss that end, suppose we want to describe a junction table between a X and Y table. Z only applies if X and Y are together, which is not always the case. Ought the three way junction be dependent up on the XY junction or can we have three way junciton table directly referring each tables' prime key and simply use that XY junction to validate the three way junction through forms/VBA/whatever?

Counter-example: Traditional parent/child - most of the time, the child points back to its parent but the parent has no pointer to the children. Not always needed since a query could rejoin them. So do you NEED the complexity of a junction table when a viable one-many relationship can be used? Don't know, depends on what you are doing with them. Is there some data that appears multiple times and involving both of them that necessitates a junction table? I can imagine such a thing: Child services database for visitation rights where parent and several children are listed, parent has only supervised visitation rights, and a junction table (including date and location) tracks the supervised visits between parent and child.

But if you aren't doing something like that, then you don't NEED the complexity of a JUNCTION table because a simple one/many relationship works.

Let's see if I'm understanding you. You're telling me that if I ever have a case where I need to know what child(ren) a parent record has(have), I'd need junction table to provide the "reverse pointer", even if the relationship is really a one-many, correct?
 
Let's see if I'm understanding you. You're telling me that if I ever have a case where I need to know what child(ren) a parent record has(have), I'd need junction table to provide the "reverse pointer", even if the relationship is really a one-many, correct?

If it is a one to many relationship then you can find the child records using a simple INNER JOIN in a select query.

This example not only finds Child records but also Grandchild records.

Code:
SELECT DISTINCTROW clients.ClientIndex, clients.Title, clients.[First Name], 
clients.Surname, clients.Addr1, clients.Addr2, clients.Addr3, clients.[post code], 
clients.[Phone Number], clients.VisitFee, Visits.VisitId, Visits.ClientIndex, 
Visits.VisitDate, Visits.InvoiceIssued FROM clients INNER JOIN (Visits INNER JOIN [Visit Items] ON 
Visits.VisitId = [Visit Items].VisitId) ON clients.ClientIndex = Visits.ClientIndex
ORDER BY Visits.VisitDate;
 
Where we are having problems is in the semantics hidden within common words.

"Inheritance" has a specific programming meaning such as you alluded to when discussing object-oriented methods. Unfortunately, it has more - and less - meaning than that.

When breaking something down to determine where it goes, these rules - which are just basic normalization techniques - must be followed.

1. Identify the elementary entities. Persons, jobs, houses, cars, individual parts in a parts warehouse... it doesn't matter (at this level) whether they have any relationship to anything at all. Because the first job is to identify the players.

2. Once that is done, identify attributes of each. For persons, you can consider (if any of these are important) eye color, hair color, skin tone, height, weight, number of natural teeth, social security number, employee number.... you get the idea. Now decide whether any of the things you just named can be considered for membership in a table that describes the entity. You ASSUME for this analysis that you will eventually identify a prime key even if it later turns out to be a meaningless key. Then you ask, is there any item in that list that I would not properly put in the same table with that PK because the entity is not solely dependent on the PK?

3. Determine whether any relationships could exist between the entities you have defined. for instance, a person works at a specific job, drives a specific car, lives in a specific house, and maybe buys specific prts for his car when he is on a do-it-yourself kick.

4. Here is a tricky part. Determine whether the relationship between the two entities is such that you need one/one, one/many, many/one, or many/many relationships. Draw those relationships in the Access window set aside for that purpose.

5. Here's the next trick. Decide whether the relationship carries baggage that doesn't uniquely belong to either party of the relationship. Like, the date you move into the house; the date you bought the car (used, so it isn't the first purchase date on the car); the day you got hired into that job....

6. This expands to include three-way, four-way, and n-way relationships. In each case, the idea is to determine whether an item is baggage or belongs in one of the elementary tables. It is possible to imagine a case where a junction entry needs a PK because it is itself a part of another relationship that only occurs in specific ways. So you could have a two-way junction as one of the referents of another junction.

If your question is how to analyze a relationship, that's the basics.
 
Yes, I had read something similar to what you listed.

It's good to review it, as I mentioned earlier, I caught myself thinking about inheritance, and parent/child models, and worried that they might not be correct way to analyze entities.

The biggest problem is that it's easy to get yourself bogged into the world, and look between two things, and think you have their relationship figured only until you actually try to do something does the problems rear their ugly heads.

That said, I wanted to note that when diagramming the database, I'm tempted to try and draw up the big picture because I don't want to get stuck looking at trees for forest. Yet, I've got better result when I do some cursory input in table view which helps me identify the relationship than if I just looked at the diagram.

This then reminded me of other good programming practice: build your program modularly; add only few lines of codes, compile often and run the build to make sure you're still getting the expected output. Would be same of building blocks of database?

Also, I take that the part about identifying circular references is generally correct?
 
Here's where I become a big-time heretic. Some folks are going to think I've gone completely nuts. My dear, long-suffering wife already knows better. I've been nutso for a LONG time.

Here's the big heresy: There is no such thing as a circular reference. There is only such a thing as a program that doesn't know when to stop.

OK, that's a LITTLE extreme. But not ENTIRELY untrue. The catch is, most programs cannot stop unless you take steps. Here, we might need to work on whether we can REALLY have a case where Y depends on X, Z depends on Y, and X depends on Z. Suppose (ad argumentum) that you do. Next question is "Which are one/one, which are one/many or many/one?, and are any in the many/many situation?"

For the one/one case, we had a discussion a while ago as to why one/one is so rare. If there is no practical reason (size of record) or meta-data record (difference in some attributes of the fields that isn't a data difference, such as different security requirements for the parts) then one/one should not normally exist. Using the logic from our prior discussion, eliminate one or more the tables. Granted, it is complex to see how to do that, but if you are REALLY normalized, one/one should only rarely exist.

For the case where ANY of those relationships in the circle is one/many or many/one, a circular relationship could exist - if the physical reality has a feedback loop in it and your model includes the loop. Suppose that somehow you have a database to relate advertising to sales to profits. Profits cause you to trigger a new ad campaign to make more sales to get more profits. If somehow you had a database to track this information, perhaps circularity is not wrong - since the model is tracking a real-world loop.

The catch is that you cannot have normal referential integrity in such a model because it is THAT which causes the problem. The circular reference would be real. It is the RI rules that would not be. Because Access doesn't know where to stop when it is evaluating circular relationships. (Actually, it does. It stops when it thinks it has caught you doing something highly improper. Ahh, it's just a program dumber than a box of garden tools, what does it know about real-world feedback loops?)

The trick is that when you have a real-world loop and try to model it, RI is going to be of limited value. You might have to "roll your own" RI equivalent in such cases because in a loop situation, Access never can tell which is the chicken and which is the egg and which is the road that the chicken just crossed. (No... don't ask me WHY it crossed.)

Many/many cases are the bugaboo since you MUST involve a junction table and therein lies chaos. Such as the self-referential table we discussed before. But therein also lies the solution if the circularity isn't expressed in a normal way. Lets take the case of X --> Y --> Z --> X - but instead of trying to write THIS relationship, enter X in the relationship window a second time and let Z --> X(2). That might mollify your RI analysis. (Might not, but hey, worth a shot.)

Circular references kill programs that try to be general. They become very difficult to analyize. In terms of computability theory, they fall into one of the categories of non-finite automata. (Null transitions aren't the ONLY way to get to non-finite automata.) But if your program "knows" you have a feedback loop, you can limit its actions at certain stages to prevent the "true" loop from coming into fruition. I would add that true circular table references SHOULD be so rare that most of us will never see a real one. And the odds are SO bad that if we do, the first thing we should do is go back to relook at the problem to see if we got it right.

So, the next question is, how do you make the program work with circular data references inside of the data? There, you must ask yourself this question: Why doesn't the process being modeled spin out of control?Considering the earlier discussion, you would only have a circularity in the design if the real world had such a loop. So... what keeps the real world from flying apart? Answer that question and you can make your model work.
 
This then reminded me of other good programming practice: build your program modularly; add only few lines of codes, compile often and run the build to make sure you're still getting the expected output. Would be same of building blocks of database?

Worthy question, worthy enough to get a separate answer. (Besides, the post about circular references was getting a bit longish.)

Proper program design is, indeed, modular. However, here we again need to watch out for common worlds with special meanings - baggage words as I call them. Like "module" as an entity and "modular" as an attribute. So just watch out for the fog factor of using a word with baggage.

OK, what constitutes a modular approach in database design? Answer: The same rules apply to database design as to object oriented design. Our old friend Julius Caesar's rule - divide and conquer.

You start breaking down entities until they appear to be "atomic" (From the Greek atomos - "a" as a prefix meaning "not" and "tomos" meaning "can be cut")

Look for the elementary things, things that can exst on their own in the absence of a relationship. People exist. Cars exist. In the world of abstractions, jobs exist even when not filled. Houses exist, even if not owned. The issue will be to verify that the item stands on its own whether it has physical or only abstracted reality. (Consider bank accounts, e.g. - very real but these days they only exist inside a computer.)

Once you have the atomic elements of your database, you fuse things together with relationships. These relationships are necessary early in the game because Access can USE them to help you. Queries, forms, and reports, for example, can exploit declared relationships, particularly if you use one of the wizards to build your structure.

Speaking of other entity types, you need to identify the way you are going to enter data, validate it, and distribute it. Some of this can be done by query; other parts require VBA interaction. The idea being that you must already have your data laid out BEFORE you start worrying about code. Without a good data layout, the ghost of Nicklaus Wirth will haunt you.

Forms and data import specifications will be your method of massive data input. Reports (and possibly forms) will be your most likely method of massive data output. But if you don't have a data layout, you have nothing. So that is why you get the data sets laid out as early as you can.

Queries - action queries, to be specific - will be part of your business rules dealing with transitions. Data elements are rarely static. When a query updates a table, some business rule was applied. When the query inserts a record, you have had a business rule kick in to keep records tracking something real. When you delete a record (say, for archiving) or mark it as obsolete, some business procedure drove that action.

While your layout process can proceed on paper (or white-board), you cannot write actual code until you have most of the method designed at least to the steps that track the actions of the model. However, the more design you put in before writing code, the better off you are.

Now, what order should all this happen in? IMHO,

1. Analyze the real-world model well enough to commit the concept to paper.

2. Identify all entities you will need.

3. See how they relate to one another.

4. Look for actions to be performed (VBA or queries)

5. At intervals as needed, update your model document to show what you have built. Don't stray from the design if the design is correct. Don't LET the design stray from reality.

6. Flesh out the elementary entity tables as their content becomes clear.

I would say that your database is complete once things get fleshed out, but we both know that a stagnant process isn't going to be modeled. It is going to be scrapped. So keep up with the volatility of the real world.
 
RE: Circular references and RI-

Let's go on a side trip.

I like to be strict (read anal-retentive) about how I want my programs to except variables. If I can strong type, I will. Why, I wish VBA comes with "Option Strict", so I could forbade implicit conversion from one data type to another, just because I like it!

Now, I like my database having RI enforced for *all* relationship. It does mean much more pain when I try to set up my forms, but as long I provide backdoors as we've already discussed in Null thread, it's not that bad.

So, if I had such circular relationship, does it then follow that RI will fail? My gut says no, provided that I do some tricks in VBA so the process is defined before the users' input is appended to the database.
 
5. At intervals as needed, update your model document to show what you have built. Don't stray from the design if the design is correct. Don't LET the design stray from reality.

Yet another quality post! That really helps to flesh out the details I should be thinking about in designing the database.

I want to address this point above, as see, when I'm thinking about the database design, I can't help but think this has to be a be-all, end-all design; there's no room for experiment as if I start out with badly designed database, and learn from mistakes and do better, it'll be much more work to take the data out of wrong structure and append it to the correct structure than if I spent few extra hours in analyzing and re-analyzing the design before deciding on a final design.

Heck, I don't think my boss likes to support me essentially undoing my earlier mistakes. Then, how could a database design could be flexible enough for the real world's inevitable torrents of changes?
 
I can't help but think this has to be a be-all, end-all design

I used to think that way until I learned about reality. And the reality is that the world is dynamic, not static. "We've always done it this way" is the very heartstone of failing businesses. In nature we see it, too. Animals that do not adapt do not survive. one/one relationship on THOSE two halves of the statement.

Agreed, bosses don't like tweaking for the sake of tweaking and you surely want to stabilize everything as much as you can. But the world around you will also change. You must therefore be prepared to advise the boss when a new wrinkle becomes evident, or a new demand has been made.

there's no room for experiment

In a perfect world, there's no NEED for experiment.

However, in reality there are many experiments going on around you every day, if you just look. How often does the boss "shuffle" the office staff to try to coax a little more productivity out of them? How often do you see vendors come out with a new line of product only to see that line vanish some weeks or months later. (Remember "New Coke" ??? Disaster.) Ever hear of Beta releases? Ever hear of Homo Neanderthalis? Experiments that might - or might not - survive in some form will be evident if only you look.

Let's face it. You are quite human. (At least, you don't seem to TYPE like a room full of chimpanzees...) You can make mistakes. If you never made a mistake, your boss couldn't afford to pay you what you are worth. So take that "no room for experiment" statement and hang it on the wall somewhere. Stare at it. Drink it in. Then ask yourself if you really wanted to work for a bunch of unforgiving bosses who really believed that was true.

Having said all of that, I agree that if you could spend another hour in design analysis and it would save ten hours of implementation, you've done a good thing. The problem with that statement is that it's a two-pass algorithm. You can only decide whether an hour spent in extra design actually paid off by actually DOING the design and implementation phases to compare their durations. Then, having done it once, why in the seven frozen rivers of Tartarus would you go back and do it again a different way so you could measure the actual savings? We talk about the theory of design, and yes, time spent in design phases seems to be time well spent. An author named Barry K Boehm (I think that was his name) gathered case studies from dozens (over one hundred) projects to compare factors. He was able to show strong positive correlation between extra design work and reduced implementation work. And I use "strong positive correlation" in the strict scientific sense. You know that I know EXACTLY what I mean when I say it that way.

The real world faces constraints of time and effort and forebearance of your funding sources. There is a law of diminishing returns that says you can reach a point where further analysis will not have significant effect on the project's outcome. So the question becomes, "How hard is the boss pushing for us to leave the design phase and step into the implementation phase?" At some point you reach the balance between your desire to get it right the first time and your boss;s desire to "git 'r done." And reality will set in.

In practical terms, you can never say you are done. You can only say you have reached a level from which you can take the next step. And if you MUST abandon the analysis phase, you simply have to write down the list of unanswered questions and keep good notes. Advise the boss of the things you could not answer and be honest. "Boss, because you said we had to get to implementation, these are the questions we didn't have time to answer." If you get an extension, GREAT! If not, you were honest and kept the boss in the loop. Which is a real-world issue unrelated to the actual problem in hand.
 
If I can strong type, I will.

Which is why I like to program in Ada despite the Byzantine syntax that often crops up, and why I so strongly dislike C variants.

But VBA is what we have, so we must find the discipline within ourselves to "do the right thing" when programming in it.

This "implicit type conversion" isn't a bad thing, it is just like Nulls - another thing waiting to trip you up. Take proper steps to watch for it and you will be good to go. To help, consider whether you want all of your local and global variables in VBA (declared, not implied by being bound through a form) to have proper type prefixes - in essence, a naming convention. That is one of many possible tools at your disposal to handle the problem you describe.
 

Users who are viewing this thread

Back
Top Bottom