Query to find 'orphan' records.

jonno_g

Registered User.
Local time
Tomorrow, 00:04
Joined
May 30, 2007
Messages
52
I have a table that contains records of drawings called 'tblDWGIndexMain'. Each record has drawing number field 'DWG' and project number field 'ProjectNumber' neither of these fields is unique, as a ProjectNumber can affect multiple drawings.

The best way I could think to explain this is in a parent-child relationship between the parent drawing and the child project drawings, where the record with a DWG number but no ProjectNumber is the parent and the record/s with the ProjectNumber/s are the children. As you can see below, there are some 'orphan' records where they have both DWG and ProjectNumber but there is no corresponding 'parent' record (without ProjectNumber).

e.g. the table can (and does) have records in this fashion:

tblDWGIndexMain
DWG | ProjectNumber
10000
10001
10001 P11001
10002
10002 P11001
10003 P11002

The record for 10003-P11002 is an orphan, with no parent record existing that contains 10003 without a ProjectNumber.

I'm not concerned about any records that have a DWG number but no ProjectNumber, but I do want to identify all records that have both a DWG number and a ProjectNumber but that do not have a corresponding parent record with only a DWG number (and no ProjectNumber).

So far I have the following:

Code:
SELECT DISTINCTROW A.*
FROM tblDWGIndexMain As A LEFT JOIN tblDWGIndexMain As B ON A.DWG = B.DWG
WHERE (((A.ProjectNumber) Is Not Null)) ORDER BY A.DWG, A.ProjectNumber;


But it is not giving the results I need, it is still including records that do have parent records.

I can check the number of parent records against the number of child records by using these two queries:

qry_ProjectDrawings:
Code:
SELECT *
FROM tblDWGIndexMain
WHERE ProjectNumber is not null;

(returns 5983 records)

and then:

Code:
SELECT *
FROM tblDWGIndexMain INNER JOIN qry_ProjectDrawings ON (tblDWGIndexMain.DWG = qry_ProjectDrawings.DWG
AND tblDWGIndexMain.ProjectNumber is null) ORDER BY tblDWGIndexMain.DWG;

(returns 3412 records)

That gives all parent records that have corresponding child records, but I need to go the other way. (Also, it's not a direct correlation between the number of child records and the number of parent records because one parent can have many children.)

e.g. this is a possibility:

tblDWGIndexMain
DWG | ProjectNumber
10000
10001
10001 P11001
10002
10002 P11001
10002 P11002
10002 P11003
10003 P11002

Is the first querry above heading in the right direction? Or am I barking up the wrong tree entirely?

Thanks for looking.

Cheers,...Jon.
 
Last edited:
Things would be much easier for you if you would normalize this data. If a drawing can be related to multiple Projects, and a Project can be related to multiple drawings, then you actually need three tables to model this correctly.

Having said that, a query like the following (untested) should return the results you are looking for;

Select * From tblDWGIndexMain Where ProjectNumber Is Not Null And DWG Not In(Select DWG From tblDWGIndexMain As B Where B.ProjectNumber Is Null)
 
Things would be much easier for you if you would normalize this data. If a drawing can be related to multiple Projects, and a Project can be related to multiple drawings, then you actually need three tables to model this correctly.

Thanks very much for the query, that was precisely what I needed in order to progress further.

I had started to look at 'Not In' after I posted this, but hadn't gotten it to work yet as I posted late Friday and just got back to it now. I knew there had to be an easier wayt to solve the problem that the path I had started down.

I do have to respectfully disagree with your comment on normalization though. This database is designed purely to manage the drawings, not the projects themselves. As such, each row in this table represents a single drawing entity that has both drawing number and project number (in some cases, such as when the drawing reaches the 'As-Built' stage, the project number is null). It is sufficiently normalised to meet its required purpose. What you have seen is only a very small portion of the other information that is stored along with drawing number and project number for each drawing record. If I were to separate the data into three different tables then it would vastly increase the complexity of the database.

Whilst it is possible for one As-Built (parent) drawing to be used as the basis for several project drawings over time, and it is also possible several Project Drawings (children) to be generated from the same As-Built (again, over time), there is never a case where more than one record exists with both the same DWG and ProjectNumber data. (i.e. no duplicate records).

I think that I did not explain it all that well with my first post, but I was trying to condense the core of a very detailed and complex drawing management philosophy that has been developed over several years into just a couple of paragraphs.

Cheers,...Jon.
 
I do have to respectfully disagree with your comment on normalization though.

Well you certainly know your data much better than I do, but the fact remains that you seem to be ending up with orphan records that you have to go back and correct. If this data is being imported from an external source then maybe you don't have any control over that. However, if the data entry is taking place within your application, then it is something that you should be able to control through proper table and/or form design.

JMO
 
...but the fact remains that you seem to be ending up with orphan records that you have to go back and correct.

You've assumed that I'm going back and correcting them, (which I guess is a reasonable assumption, considering that you don't have the background knowledge of how our system functions), and you've also assumed that the situation I have described is a problem. It's not, it is by design. At this stage what I'm doing is simply identifying the 'orphans' in order to show what draughting work remains to be completed on a given project.

If a record is an 'orphan' it is because the 'parent' drawing does not actually exist yet. Most likely this is because the drawing itself has been created in association with a project and may still be in the process of being constructed. After construction a new, non-project version of that drawing will be created (that has DWG number but no Project Number) as an As-Built drawing - this becomes the 'parent' record. At this point the project drawing goes into stasis or dormancy, never to be revived. It is a snapshot of exactly what was installed as a part of that project. The aim of this particular exercise is to identify which drawings from a given project have not yet been made into As-Built drawings.

So yes, I am 'correcting' the orphaned records, but as a very deliberate step in the process. The fact that these records are orphaned in the first place is deliberate and controlled, it was simply the best analogy I could think of at the time to describe the relationship between the records.

This database serves as a record of 'exactly what does exist', not 'what we think should exist'. Also, like I said earlier, it is part of a complex drawing management system that took years to develop.

I think that perhaps my use of the terms 'Parent' and 'Child' are what has caused your misunderstanding, so I've tried to clear it up a little (but I might have just caused more confusion in doing so). Despite my use of the terms Parent and Child when referring to records, it is in fact more common for the 'child' drawing (and, therefore, the database record of it) to be created before the 'parent' record. This is perfectly acceptable within our management system.

If this data is being imported from an external source then maybe you don't have any control over that. However, if the data entry is taking place within your application, then it is something that you should be able to control through proper table and/or form design.

JMO

The data is being created within the application, but there is nothing wrong with the table or form design (well, not in the way you're suggesting anyway).

I would agree that the opinions you are expressing would likely hold true in most other situations, however this database has been built with a very specific purpose in mind. Unless you have prior experience in document management, and maybe even within the specific sub-field of drawing management and control, then the concepts can be hard to wrap your head around without a detailed explanation (and usually some diagrams and flow charts). I have staff here who have been working with the system for several years who still get it wrong occasionally.

It is unfortunate that the system needed to be so complex, but believe me - we spent a lot of time trying to make it as simple as it possibly could be whilst still serving its purpose effectively.

Cheers,...Jon.
 

Users who are viewing this thread

Back
Top Bottom