Using a Condition from one table in a UNION SELECT query

Acc

Registered User.
Local time
Yesterday, 22:21
Joined
Jun 3, 2014
Messages
13
Hi, I am having a little trouble with the following problem and was wondering if anyone could help.

I am building a query from several input tables Digital Sales and Daily Sales for a number of retailers, whereby I only wish to incorporate Daily Sales after the date of the latest Monthly Sales report for that retailer (which are all included in Digital Sales).

I currently use several queries to record the date of the latest Digital Sales report for each retailer of the form

Code:
SELECT Max([Digital Sales].[Sales Mth]) AS [MaxOfSales Mth]
FROM [Digital Sales]
WHERE ((([Digital Sales].Source)='RetailerX'));

This date is given as the 1st of the month for which the report has been supplied.

I then use the following query (with the union select repeated for each retailer)

Code:
SELECT [Digital Sales].[Sales Mth] AS [Date], [Digital Sales].Source, [Digital Sales].Account, [Digital Sales].ISBN, Sum(IIf([Digital Sales].[Revenue(£)]<>0,[Digital Sales].[Unit Sales],0)) AS Units, Sum([Digital Sales].[Revenue(£)]) AS Revenue, 'Monthly' AS Which_Database
FROM [Digital Sales]
GROUP BY [Digital Sales].[Sales Mth], [Digital Sales].Source, [Digital Sales].Account, [Digital Sales].ISBN

UNION SELECT [RetailerX Daily sales data].Date AS [Date], 'RetailerX' AS Source, 'RetailerX UK' AS Account, [RetailerX UK Daily sales data].Identifier AS ISBN, Sum(IIf([RetailerX UK Daily sales data].[Shipped COGS]<>0,[RetailerX UK Daily sales data].[Shipped Units],0)) AS Units, Sum([RetailerX UK Daily sales data].[Shipped COGS]) AS Revenue, 'Daily' AS Which_Database
FROM [RetailerX UK Daily sales data], [Latest RetailerX Report]
GROUP BY [RetailerX UK Daily sales data].Date, [RetailerX UK Daily sales data].Identifier
HAVING ((([RetailerX UK Daily sales data].Date)>=First([Latest RetailerX Report].[MaxOfSales Mth])+1+Day(DateSerial(Year(First([Latest RetailerX Report].[MaxOfSales Mth])),Month(First([Latest RetailerX Report].[MaxOfSales Mth]))+1,0))))

I have a feeling I am making more of a meal of this than is necessary and would like to streamline this down into a single query if possible.

Any advice would be much appreciated.
 
Just a basic read of your description, suggests that you only need a GROUP BY query. GroupBy clause on the Source, and Max on theSaleMonth? Or am I missing something? Could you maybe clarify with some example data and resultset you wish to see?
 
That's not quite enough I don't think, though it might be and I'm not understanding (fairly new to this).

Here's an example of what the data looks like (there are more fields in all cases, but they don't matter for this question)

Digital Sales
Code:
Sales Mth |Source   |Account     |Units
01/02/2014|RetailerX|RetailerX UK|   12
01/02/2014|RetailerX|RetailerX UK|    5
01/02/2014|RetailerX|RetailerX UK|   19
01/03/2014|RetailerY|RetailerY UK|   12
01/03/2014|RetailerY|RetailerY UK|   15
01/01/2014|RetailerY|RetailerY US|   11

Daily Sales RetailerX UK
Code:
Date      |Source   |Account     |Units
03/02/2014|RetailerX|RetailerX UK|    8
04/04/2014|RetailerX|RetailerX UK|    1
01/02/2014|RetailerX|RetailerX UK|   55

Daily Sales RetailerY UK
Code:
30/04/2014|RetailerY|RetailerY UK|   12
21/03/2014|RetailerY|RetailerY UK|   15

Daily Sales RetailerY US
Code:
15/02/2014|RetailerY|RetailerY US|   11

Output
Code:
Date      |Source   |Account     |Units
01/02/2014|RetailerX|RetailerX UK|   12
01/02/2014|RetailerX|RetailerX UK|    5
01/02/2014|RetailerX|RetailerX UK|   19
04/04/2014|RetailerX|RetailerX UK|    1
01/03/2014|RetailerY|RetailerY UK|   12
01/03/2014|RetailerY|RetailerY UK|   15
30/04/2014|RetailerY|RetailerY UK|   12
01/01/2014|RetailerY|RetailerY US|   11
15/02/2014|RetailerY|RetailerY US|   11

So for each retailer I want to pull in the data from Digital sales and then data from Daily Sales subsequent to the latest date in Digital Sales for a report for that retailer.

There are many Daily Sales tables, one per retailer.
 
Just to clarify what you want to do

You have two tables - digital sales and daily sales and you want to pull out the records from both tables where for daily sales the date is greater than or equal to the maximum date for that retailer and account in digital sales and for digital sales the date equals the maximum date in digital sales.

Is this a correct interpretation of what you want to achieve?
 
Sorry I must not be explaining myself very well. I have about 8 tables. One is called Digital Sales and has information on all retailers, the others are each called RetailerX Daily Sales. I need to pull out the records from all tables where the date for each Daily Sales table is greater than the highest date for that retailer in the digital sales table.
 
Sounds like you have a bad table design - not sure why you need to do it with multiple tables but you will have to use a union query if you keep it that way. Why not just have one table with the retailer identity as a field to differentiate between retailers?

You need to clarify more - your example data shows as a result

01/03/2014|RetailerY|RetailerY UK| 12
01/03/2014|RetailerY|RetailerY UK| 15

but your query is grouping
 
Sounds like you have a bad table design - not sure why you need to do it with multiple tables but you will have to use a union query if you keep it that way. Why not just have one table with the retailer identity as a field to differentiate between retailers?

You need to clarify more - your example data shows as a result

01/03/2014|RetailerY|RetailerY UK| 12
01/03/2014|RetailerY|RetailerY UK| 15

but your query is grouping

The table design is inherited, although I am trying to clean it up.

The reason they end up on separate lines is that they are also grouping by the Identifier field, I haven't typed that out in the example as it isn't something that will affect the question, should have realised it would make that look confusing though.

I am having a look at changing the table design so I will only be using two tables in this query and the use GROUP BY and MAX as suggested.
 
OK I now have 2 tables as follows

Monthly Sales
Code:
Month     |Account  |ISBN      |Sales
01/01/2014|RetailerX|12345ABCDE|5
01/01/2014|RetailerX|12345ABCDF|2
01/02/2014|RetailerY|12345ABCDF|6
01/02/2014|RetailerY|12345ABCDG|9

Daily Sales
Code:
Month     |Account  |ISBN      |Sales
01/01/2014|RetailerX|12345ABCDE|1
03/01/2014|RetailerX|12345ABCDE|2
08/01/2014|RetailerX|12345ABCDE|2
12/01/2014|RetailerX|12345ABCDF|2
01/02/2014|RetailerY|12345ABCDF|6
02/02/2014|RetailerY|12345ABCDG|1
05/02/2014|RetailerY|12345ABCDG|6
07/02/2014|RetailerX|12345ABCDE|2
12/02/2014|RetailerY|12345ABCDG|2
01/03/2014|RetailerX|12345ABCDE|3
02/03/2014|RetailerY|12345ABCDF|5

Desired Output
Code:
Month     |Account  |ISBN      |Sales
01/01/2014|RetailerX|12345ABCDE|5
01/01/2014|RetailerX|12345ABCDF|2
01/02/2014|RetailerY|12345ABCDF|6
01/02/2014|RetailerY|12345ABCDG|9
01/02/2014|RetailerX|12345ABCDE|2
01/03/2014|RetailerX|12345ABCDE|3
02/03/2014|RetailerY|12345ABCDF|5

So what I am trying to do is have the records for daily sales included only if they have a date later than the month of the most recent report in Monthly Sales. So if I have a Monthly Sales report for a retailer in February with a month date of 01/02/2014 I want no sales to show from the Daily Sales table for that retailer from 28/02/2014 or earlier. I do want sales from other retailers prior to that date if they, for example, have a Monthly report only from January.

Does that make more sense?
 
Last edited:
OK I've changed things around to try to make this easier, I now have two tables:

Combined Daily Sales with the fields:

Code:
Sales Month
Date of Sale
Vendor
Account
ISBN
Free/Paid
Units
Revenue
Which_Database

and Combined Monthly Sales with the fields:

Code:
Sales Month
Vendor
Account
ISBN
Unit Sales
Revenue
Which_Database

I am now looking to do one of a number of things either:

a) Sum the [Combined Monthly Sales].Unit Sales grouped by Vendor, Account and ISBN, then divide by the Count of entries for [Combined Daily Sales].Units grouped by Vendor, Account and ISBN and enter this result in a new field. do the same for Revenue.

b) Duplicate the entry for [Combined monthly Sales].Units next to each entry for daily sales grouped by Sales Month, Vendor Account and ISBN

c) Blank the Sales Month column in the new query for Daily Sales results prior to the most recent [Monthly Sales Data].Sales Mth with the same Account.

Any help (or alternate suggestions) much appreciated. My main goal is to allow access to the daily (more inaccurate, but higher granularity) information and the monthly (more accurate, but lower granularity) data without duplicating information whilst looking at Monthly trends, but still being able to use Daily data to represent Months before their Monthly reports are finalised.
 
OK raises more questions

How is the monthly sales table populated - ie. do you run a monthy routine to summarise the daily sales by month, vendor, account and ISBN? Or is it run more frequently/less frequently.

Why do you have a monthly sales table anyway when you can easily calculate this as and when required?

Is the daily sales just for the current month? perhaps you run a routine like the above and then clear down? Or does it go back to the same start as monthly sales? - would be a lot easier if this was the case.

How far back do you go, or want to go, i.e. is there an earliest date for your query?

Re your latest post, you are now referring to grouping by ISBN - please revise your data examples to include this. And the result to also include your requirements per b and c
 
OK. The monthly sales are Actuals reported by retailers and are recorded in a separate database. The Daily Sales are reports received daily from retailers showing that days sales. The two do not always (if ever) match, but they are close.

The daily sales and monthly sales start at different points in time for each retailer. Monthly goes back as far as 2008 for some retailers, Daily go back as far as 2011 for some retailers. There is no one date as each retailer is different.

I want to go back as far as possible, to investigate trends and behaviors in the past.

Sorry about ISBN, it was previously labeled Identifier as that is what one of our retailers users, I'll change the post to say ISBN.

A, B and C are alternate options for going about this. Its the reason I didn't give my tables and outputs initially is that I am so unsure of what the best approach is that I don't know exactly what they will be. My only goal is to not end up with double values by adding Monthly and Daily sales figures together, whilst still having both sets of data available.

I am not great at communicating by text, sorry.
 
May be trying to do too much with one query but we'll give it a go.

This query just works off the daily sales (so goes back to 2011) and also includes for each row the sum of the daily sales for that month plus the related monthly sales.

Code:
SELECT DailySales.SalesMonth, DailySales.SaleDate, DailySales.Vendor, DailySales.Account, DailySales.ISBN, DailySales.Units, DailySales.Revenue, (SELECT sum(Revenue) FROM DailySales as tmp WHERE Salesmonth=DailySales.SalesMonth and Vendor=Dailysales.Vendor and Account=DailySales.Account and ISBN=DailySales.ISBN) AS DailySum, (SELECT first(Revenue) FROM MonthlySales as tmp WHERE Salesmonth=DailySales.SalesMonth and Vendor=Dailysales.Vendor and Account=DailySales.Account and ISBN=DailySales.ISBN) AS Monthly
FROM DailySales

What is missing is monthly sales where there are no daily sales. i.e. sales between 2008 and 2011 but this can be easily resolved with a union query so add this to the above

Code:
UNION SELECT MonthlySales.SalesMonth, Null AS Expr1, MonthlySales.Vendor, MonthlySales.Account, MonthlySales.ISBN, 0 AS Expr2, 0 AS Expr3, 0 AS Expr4, MonthlySales.Revenue AS Monthly
FROM MonthlySales LEFT JOIN DailySales ON (MonthlySales.ISBN = DailySales.ISBN) AND (MonthlySales.Account = DailySales.Account) AND (MonthlySales.Vendor = DailySales.Vendor) AND (MonthlySales.SalesMonth = DailySales.SalesMonth)
WHERE (((DailySales.SalesMonth) Is Null))

To see if this is on the right track, have a look and see what you think
 
Thanks,

I will try this out right away!
 
The query runs, but is too slow to be useable. Would it work better if I created a field concatenating SalesMonth, ISBN, Vendor and Account for each table and used that as the Where criteria instead?
 
It might do, but I would check first that all those fields in both tables are indexed.

Leaving performance aside, is it giving you the results you want?
 
Aside from performance it appears to be, yes. I can only see the first few rows, simply because scrolling through the data or exporting it takes a long time, but they look to be right.
 
OK - so indexing? Also, roughly how may rows of data in both tables are we talking about?
 
The Digital Sales table is indexed by Sales Month, ISBN and Account as a single Index, the Daily Sales is a query pulling together data from several other tables, so not sure if I can index that.

A little over 1 million rows, so .... too many?
 
if the daily sales is a union query then the indexing is lost (you can't have a single index across multiple tables) so with a million rows (bnot too many, but clearly a lot of data (!) this will have a significant impact.

Suggest use your union query to make a temporary table and index that or probably significantly quicker to have a routine to append each table to the temporary table. Long term, your table design is inefficient and I would look at modifying your routines that populates these tables to append to a single table.
 
I will redesign the Daily Sales input method to make it into a single table and index that as well. Then use the two indexed tables to build the Monthly and Daily sales together as we have previously discussed?

Sound like a better solution?

Once I have the two tables together I will be adding on metadata, such as titles and authors for books to then pull into pivot tables.
 

Users who are viewing this thread

Back
Top Bottom