GROUP BY HAVING Count statement

fboehlandt

Registered User.
Local time
Today, 12:19
Joined
Sep 5, 2008
Messages
90
Hi everyone,
I have the following table named 'information' that contains the names and codes of investment funds in two fields:

<Name> <Code>
Name1 Code1
Name2 Code2
...
NameN CodeN

A separate normalized table 'performance' contains the performance of said funds:

<Name> <Code> <MM_DD_YYYY> <Return>
Name1 Code1 Date1 Return1
Name1 Code1 Date2 Return2
...
Name1 Code1 DateM ReturnM
Name2 Code2 Date1 Return1
Name2 Code2 Date2 Return2
...
Name2 Code2 DateM ReturnM
mutatis mutandi for all other funds until Fund N

Although all funds contain the same number of return observations (i.e. the same number of distinct dates), some of those observations contain empty values. I would like to run a query that includes funds with complete performance history only. This is what I have so far:

Code:
SELECT Information.Code
FROM Information INNER JOIN Performance
ON Information.Code = Performance.Code
WHERE Performance.MM_DD_YYYY>=#1/1/1998# And Performance.MM_DD_YYYY<=#1/1/2008#
GROUP BY Information.Code
HAVING Count(*)=121));

The dates are arbitrarily chosen. Thus for #12/1/1997# to #1/1/1998#: HAVING Count(*) = 122 etc...The actual problem is that the number of observations per fund are = 121 regardless of whether a fund reported performance in the given month or not (i.i. empty field values count). Thus, I'm not sure whether Count(*) is the appropriate arithmetic function here but don't know what else to use. Can anyone help please?
 
Is is safe to assume that when you say "Although all funds contain the same number of return observations (i.e. the same number of distinct dates), some of those observations contain empty values.", that in some of the records the date might be missing?

If that is the case then your problem may be related to the NULL dates, and you may not need to use a HAVING statement. Try changing the WHERE statement to use BETWEEN, and add a test for NULL. Something like the following might be a good start:
Code:
WHERE ((Performance.MM_DD_YYYY Is Not Null) And (Performance.MM_DD_YYYY Between #1/1/1998# And #1/1/2008#))
If that is not the case, another approach may be required.
 
@MSAccessRookie:
No, all dates are included (thus in my example 121 dates for all funds). It is a potenially justified question whether dates with no corresponding performance value should be included in a nomalized table. Ideally, these observations should be dropped and I'm considering doing so before running the second query.

@Brianwarnock:
Yes, that does help. I previously had problems checking for empty values but in this case it seems to work. Thanks :)
 
I can run the following query and call it 'dummy':

Code:
SELECT Information.Code
FROM Information INNER JOIN Performance
ON Information.Code = Performance.Code
WHERE Performance.MM_DD_YYYY>=#1/1/1998# And Performance.MM_DD_YYYY<=#1/1/2008#
GROUP BY Information.Code
HAVING Count(Performance.Return)=121;

Next, I run an additional query like so:

Code:
SELECT dummy.Code, Performance.MM_DD_YYYY, Performance.Return
FROM dummy INNER JOIN Performance ON dummy.Code = Performance.Code

Now I get all the funds and there respective time series (i.e. Dates and Returns). Ideally, however, I would like to run this query in one go. Is that possible?
 
p.s. the following was my effort so far. This does not work because table name 'Information' appears twice in FROM clause:

Code:
SELECT Information.Code, Performance.MM_DD_YYYY, Performance.Return
FROM Information INNER JOIN (Performance INNER JOIN Information
ON Performance.Code = Information.Code)
ON Information.Code = Performance.Code
WHERE Performance.MM_DD_YYYY>=#1/1/1998# And Performance.MM_DD_YYYY<=#1/1/2008#
GROUP BY Information.Code
HAVING Count(Performance.Return)=121;
 
The short answer is no, this is a classic case of needing 2 queries, 1 an aggregate query to obtain the relevent information to select the detail in the second. This is actually quite common . You of course only run the second query and the system automatically runs the first so for you it is no hardship.

Brian
 
You posted again before my reply.

It might be possible to use subqueries but to no gai, and i certainly wouldn't bother going there.

Brian
 
@Brian
I thought so. Generally I am in agreement that there would be no gain to force it all into one query. However, there is a reason why I need to run one query before the other (or potentially jointly):
I have a VBA script importing information from an Access database to an Excel pivotable via an OLE DB query. The query below is the command text of said data import. The dates of the query come from a userform specifying what time frame to include (startdate/enddate). If I change the command text to the second query (the one refering to the dummy query) I am not able to include the WHERE statement. In essence, the WHERE statement would have to move to the second query but then the first query's HAVING count clause would yield nothing. I hope this was not too confusing. Essentially, the dates need to be part of the first query (the one in the command text). Then, the dummy query would have to refer to the dates of the first query (actual) query. Note here that otherwise there must be a dummy query for every possible time span. I think changing the original tables using a macro may be the only option here. Please let me know if you have any thoughts on this. Thanks for your help so far :)
 
I have fears that I am about to enter waters that I have not swam in before, however I cannot see the problem with the queries.

The dummy query would become

Code:
SELECT Information.Code
FROM Information INNER JOIN Performance
ON Information.Code = Performance.Code
WHERE Performance.MM_DD_YYYY between forms!yourform!startdate And forms!yourform!enddate
GROUP BY Information.Code
HAVING Count(Performance.Return)=121;
When you run as your VBA script

Code:
SELECT dummy.Code, Performance.MM_DD_YYYY, Performance.Return
FROM dummy INNER JOIN Performance ON dummy.Code = Performance.Code

This will cause the other query to execute and you will get what you require.
Atleast that is what I believe but cannot test this.

Brian
 
Yes, I believe you are correct. However, the dates originate from a Userform in Excel (not an Access form), are then copied into the data import string (command statement) of the OLE DB query and must hence be the dates used to retrieve the data from Access. The second query

Code:
SELECT dummy.Code, Performance.MM_DD_YYYY, Performance.Return
FROM dummy INNER JOIN Performance ON dummy.Code = Performance.Code

will prompt the dummy query to run but the startdate/enddate would have to be specified every time depending on the userfrom entry dates. If only there was a way to 'wire' the dates to the preceding dummy query...

I now also have a table that contains non-empty entries only (i.e. there are no entries for dates that have no return associated with them). This table could be directly used in the query but I am not sure as to how to prompt it to only yield series with the full complement of observations. Perhaps this is a dead-end as well?
 
Before giving in I would be inclined to ask on the forum, not sure where, if ACCESS can access an Excel user form, unless you know that it cant.

As I said I am probably out of my depth, but is it possible to open a form containing those dates prior to running the query

Brian
 
Yip, that is worth a try. Although the backend is Access so I'm not too optimistic. I will keep you posted if anything comes up. Thanks for your help so far!
 
Assuming the following:
1. I have a table with no date observations without performance observation (which I do). Let that table be called Performance2, and
2. the performance series of all funds start and end at different times but have no reporting gaps in between (which I think is the case),
then the following appears to do the trick without consecutive queries:

Code:
SELECT Performance2.*
FROM Performance2
WHERE Performance2.Code IN
(SELECT Performance2.Code
FROM Performance2
WHERE Performance2.Code IN
(SELECT Performance2.Code
FROM Performance2
WHERE Performance2.MM_DD_YYYY=#[COLOR=blue][B]startdate[/B][/COLOR]#) 
AND Performance2.MM_DD_YYYY=#[COLOR=blue][B]enddate[/B][/COLOR]#)
AND Performance2.MM_DD_YYYY BETWEEN #[COLOR=blue][B]startdate[/B][/COLOR]# AND #[B][COLOR=blue]enddate[/COLOR][/B]#;

This can easily be combined with an inner query as well. Observe that the GROUP BY statement can be bypassed and all date references are top-level now. The above is certainly not elegant and not particularly effective since it requires a macro to be run creating a new table Performance2 first (although Performance was the result of a macro itself, so nochange there). On the other hand, it seems to work for now. Hopefully I will come across a better solution soon...
 

Users who are viewing this thread

Back
Top Bottom