AOB
Registered User.
- Local time
- Today, 03:08
- Joined
- Sep 26, 2012
- Messages
- 637
Hi guys,
I have hastily put together an admittedly awful query - or rather a series of awful queries - in order to get around a duplication issue in a source file. I have no control over the source file so have to live with the duplication that exists there. And I would prefer to build a query to handle the duplication rather than introduce a requirement for the source to be refined prior to running the process.
Here is the background...
The source (a linked spreadsheet) has 3 key fields : Book, Centre & Source. All text fields. And something of the order of 30,000 rows / records
Each Book *should* be assigned to just one Centre (each Centre can be associated with multiple Books)
Books can appear in multiple records but, generally speaking, each instance of any Book *should* show the same Centre each time.
However, on inspection, it appears there are a handful of Books which have multiple Centres. In order to identify which Centre is correct, I have to refer to the Source field.
The dataset produced by my query should therefore show each distinct Book, with the relevant Centre, according to the following criteria :
qryDistinctBooks (this gives me the complete list of distinct, non-null, Books)
qryDistinctBookCentreCombos (this gives me the complete list of distinct combinations of Book / Centre)
qryBooksWithMultipleCentres (this gives me the list of Books which are assigned to more than one Centre)
qryBooksWithMultipleCentresSourceABCD (this gives me, for each of the Books assigned to more than one Centre, the Centre corresponding to a Source of "ABCD", if one exists)
qryFinalBookToCentre (the query that uses all of the above to generate the final dataset)
Horrendous, isn't it? I know, I know... I ran this yesterday and it took more than 12 hours to complete (I left it overnight) Not sure if it's as a result of the use of the LIKE operator in qryBooksWithMultipleCentresSourceABCD or something else. But it just generally looks pants.
Can anybody suggest how this can be improved? I need to get the run time down to minutes / seconds rather than hours / days!
Thanks in advance & apologies for the size of the post
Please keep the SQL-slagging to a minimum; I am more than aware just how awful the above is, I am not suggesting for one second that this is good query design!!
AOB
I have hastily put together an admittedly awful query - or rather a series of awful queries - in order to get around a duplication issue in a source file. I have no control over the source file so have to live with the duplication that exists there. And I would prefer to build a query to handle the duplication rather than introduce a requirement for the source to be refined prior to running the process.
Here is the background...
The source (a linked spreadsheet) has 3 key fields : Book, Centre & Source. All text fields. And something of the order of 30,000 rows / records
Each Book *should* be assigned to just one Centre (each Centre can be associated with multiple Books)
Books can appear in multiple records but, generally speaking, each instance of any Book *should* show the same Centre each time.
However, on inspection, it appears there are a handful of Books which have multiple Centres. In order to identify which Centre is correct, I have to refer to the Source field.
The dataset produced by my query should therefore show each distinct Book, with the relevant Centre, according to the following criteria :
- If the Book / Centre relationship is consistent (only one Centre assigned to a particular Book), then include that Book / Centre combination
- If the Book / Centre relationship is not consistent (more than one Centre assigned to a particular Book), then check the Source for each combination
- If the Source starts with "ABCD" then include that Book / Centre combination
- If none of the Sources starts with "ABCD" then omit that Book completely from the final dataset
qryDistinctBooks (this gives me the complete list of distinct, non-null, Books)
Code:
SELECT DISTINCT lnkBooks.[Book]
FROM lnkBooks
WHERE Nz(lnkBooks.[Book],"")<>""
qryDistinctBookCentreCombos (this gives me the complete list of distinct combinations of Book / Centre)
Code:
SELECT DISTINCT lnkBooks.[Book], lnkBooks.[Centre]
FROM lnkBooks
WHERE Nz(lnkBooks.[Book],"")<>""
AND Nz(lnkBooks.[Centre],"")<>""
qryBooksWithMultipleCentres (this gives me the list of Books which are assigned to more than one Centre)
Code:
SELECT [qryDistinctBookCentreCombos].[Book], Count(*) AS NoOfDistinctCentres
FROM qryDistinctBookCentreCombos
GROUP BY [qryDistinctBookCentreCombos].[Book]
HAVING Count(*)>1
qryBooksWithMultipleCentresSourceABCD (this gives me, for each of the Books assigned to more than one Centre, the Centre corresponding to a Source of "ABCD", if one exists)
Code:
SELECT qryBooksWithMultipleCentres.[Book], lnkBooks.[Centre], lnkBooks.[Source]
FROM qryBooksWithMultipleCentres
LEFT JOIN lnkBooks
ON qryBooksWithMultipleCentres.[Book] = lnkBooks.[Book]
GROUP BY qryBooksWithMultipleCentres.[Book], lnkBooks.[Centre], lnkBooks.[Source]
HAVING lnkBooks.[Source] LIKE "ABCD*"
qryFinalBookToCentre (the query that uses all of the above to generate the final dataset)
Code:
SELECT qryDistinctBooks.[Book], qryBooksWithMultipleCentresSourceABCD.[Centre]
FROM qryDistinctBooks
INNER JOIN qryBooksWithMultipleCentresSourceABCD
ON qryDistinctBooks.[Book] = qryBooksWithMultipleCentresSourceABCD.[Book]
UNION
SELECT qryDistinctBooks.[Book], qryDistinctBookCentreCombos.[Centre]
FROM qryDistinctBooks
INNER JOIN qryDistinctBookCentreCombos
ON qryDistinctBooks.[Book] = qryDistinctBookCentreCombos.[Book]
WHERE qryDistinctBookCentreCombos.[Book] NOT IN
(SELECT qryBooksWithMultipleCentres.[Book]
FROM qryBooksWithMultipleCentres)
Horrendous, isn't it? I know, I know... I ran this yesterday and it took more than 12 hours to complete (I left it overnight) Not sure if it's as a result of the use of the LIKE operator in qryBooksWithMultipleCentresSourceABCD or something else. But it just generally looks pants.
Can anybody suggest how this can be improved? I need to get the run time down to minutes / seconds rather than hours / days!
Thanks in advance & apologies for the size of the post

AOB