Top Value Query by Group (1 Viewer)

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
Hello,

So I think I need to do a Top Value Query by Group but cannot figure it out.

I have a query that is called qry_Stop with two columns (AccountNumber:Right([LOCATION],8) and R_Date). Neither are sorted or totaled. I used the Right function because I needed to only get 8 of the numbers from the right.

I would like to find the 5 most recent entries per AccountNumber using the R_Date.

Does anyone have any tips or ideas on how I should move forward?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
you will need to use a subquery

Code:
SELECT *
FROM qry_Stop
WHERE 
    Right([LOCATION],8) & R_Date
    =(SELECT TOP 5 Right([LOCATION],8) & R_Date 
        FROM qry_Stop T 
        WHERE Right([LOCATION],8)=right(qry_Stop.[LOCATION],8) 
        ORDER BY R_Date Desc)

note the aliasing of myTable in the subquery
 

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
you will need to use a subquery

Code:
SELECT *
FROM qry_Stop
WHERE 
    Right([LOCATION],8) & R_Date
    =(SELECT TOP 5 Right([LOCATION],8) & R_Date 
        FROM qry_Stop T 
        WHERE Right([LOCATION],8)=right(qry_Stop.[LOCATION],8) 
        ORDER BY R_Date Desc)

note the aliasing of myTable in the subquery

CJ_London,

I am not too familiar with subqueries. Would I just replace the SQL in my qry_Stop?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
no - it's another query that uses qry_stop as its source - I corrected myTable after I posted it.

So all you should need to do is copy and paste to a new query and run it

It may well be that it could be incorporated into qry_stop, but without knowing its sql, not possible to say one way or the other
 

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
no - it's another query that uses qry_stop as its source - I corrected myTable after I posted it.

So all you should need to do is copy and paste to a new query and run it

It may well be that it could be incorporated into qry_stop, but without knowing its sql, not possible to say one way or the other

hmmm I think I'm missing something. After running the subquery I am asked to enter parameters for Location in the subqry and in the qry.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
my mistake - change Right([LOCATION],8) to AccountNumber
 

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
my mistake - change Right([LOCATION],8) to AccountNumber

Okay, I am getting "At most one record can be returned by this subquery"

Code:
SELECT *
FROM qry_Stop
WHERE AccountNumber & R_DATE
    =(SELECT TOP 5 AccountNumber & R_DATE 
        FROM qry_Stop  T 
        WHERE AccountNumber=(qry_Stop.AccountNumber) 
        ORDER BY R_DATE Desc);

I am looking at Allen Browne's examples of subqueries. Do I need the second WHERE?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
that's the trouble with not having the data in front of you

change the = (SELECT....

to IN (SELECT...
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
Do I need the second WHERE?
yes because you only want the top 5 for that particular accountnumber from the main table.
 

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
So I tried

Code:
SELECT *
FROM qry_DPStop
WHERE AccountNumber & ROUTE_DATE IN (SELECT TOP 5 AccountNumber & ROUTE_DATE 
        FROM qry_DPStop  T 
        WHERE AccountNumber=(qry_DPStop.AccountNumber) 
        ORDER BY ROUTE_DATE Desc);

I also tried
Code:
SELECT*
FROM qry_Stop
WHERE [R_DATE] In
(SELECT TOP 5 [R_DATE]
FROM qry_Stop I2
WHERE qry_Stop.[AccountNumber] = I2.[AccountNumber]
ORDER BY I2.[R_DATE] DESC)
ORDER BY AccountNumber, [R_DATE] DESC;

Every time I run both queries it says "Running query" and nothing ever happens. I have let it load for about 15-20 minutes and still nothing.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
using subqueries can be slow for large datasets - particularly in your case because you have a calculated value which does not have indexing. How big is your dataset?

Your second query won't work unless route dates are unique - i.e. no two accountnumbers can have the same route_date.

Suggest to prove it is working include a criteria to limit the account numbers to one or two i.e.

Code:
SELECT *
FROM qry_DPStop
WHERE AccountNumber & ROUTE_DATE IN (SELECT TOP 5 AccountNumber & ROUTE_DATE 
        FROM qry_DPStop  T 
        WHERE AccountNumber=qry_DPStop.AccountNumber 
        ORDER BY ROUTE_DATE Desc)
WHERE AccountNumber="xyz";

replace xyz with a valid accountnumber.

If it is working then we can look at ways to improve performance - which probably means moving the subquery into qry_DPstop (assuming that is possible) and ensuring you are working with indexed fields. I don't have a lot of confidence in that since AccountNumber has been derived from the Location field.

Before doing that, you can also make sure that the Route_Date field is indexed - will help the ORDER BY in the subquery.

Another thing to try is to split the Location field into its constituent parts in a table and ensure the new accountnumber field is indexed.

Also, if the route_date is in a table that has a uniquePK i.e.

PK..routeDate..AccountNmber
1...1/1/19.......A
2...1/1/19.......B
3...2/1/19.......A
etc

then we can use that instead.

Not saying this is the case here, but many times I have seen developers trying to create a 'master query' containing every record to then subsequently filter it because in reality they don't want 10,000 rows, they just want one or two or just the last months data, whatever. That is an Excel approach which does not sit well with databases. One client, their query took 18 hours to run, they would start it at 3pm ready to get the results 9am the following morning. By the time I tuned it to just produce what they wanted, it took 10 minutes.
 

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
using subqueries can be slow for large datasets - particularly in your case because you have a calculated value which does not have indexing. How big is your dataset?

Right now the qry_DPStop has 1.64 million results. If I put in criteria for the Route_Date and removing Null, I can knock it down to 500k.

Your second query won't work unless route dates are unique - i.e. no two accountnumbers can have the same route_date.

With this data, there are several AccountNumbers that share the same Route_Date.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
still too many, will take hours - prove the concept by just getting the data for 2 or 3 accountnumbers.

And in all honesty - is anyone going to be looking at 500k rows of data?
 

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
I will try to prove the concept with a few AccountNumbers.

I get a"Syntax error (missing operator) in query expression" for

Code:
 SELECT *
FROM qry_DPStop
WHERE AccountNumber & ROUTE_DATE IN (SELECT TOP 5 AccountNumber & ROUTE_DATE 
        FROM qry_DPStop  T 
        WHERE AccountNumber=qry_DPStop.AccountNumber 
        ORDER BY ROUTE_DATE Desc)
WHERE AccountNumber="00302"

It looks like it's hanging up on the
Code:
WHERE AccountNumber="00302"
.

It's not that people will be looking at 500k rows of data but not every account has as many rows of data as others- i.e one AccountNumber could return data from the past two weeks and a different AccountNumber could return data from 5 months ago.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
It's not that people will be looking at 500k rows of data but not every account has as many rows of data as others- i.e one AccountNumber could return data from the past two weeks and a different AccountNumber could return data from 5 months ago.
That won't matter - the point is it will reduce the time required.

with regards the error my bad it should be

AND AccountNumber="00302"

EDIT: I'm a bit confused - your original post says AccountNumber:Right([LOCATION],8)

So I would have expected

AND AccountNumber="00000302"
 
Last edited:

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
I was able to get it to work for a specific Accountnumber last night.


EDIT: I'm a bit confused - your original post says AccountNumber:Right([LOCATION],8)

Sorry, I changed it to Right([LOCATION,5)

If it is working then we can look at ways to improve performance - which probably means moving the subquery into qry_DPstop (assuming that is possible) and ensuring you are working with indexed fields. I don't have a lot of confidence in that since AccountNumber has been derived from the Location field.

Before doing that, you can also make sure that the Route_Date field is indexed - will help the ORDER BY in the subquery.

Another thing to try is to split the Location field into its constituent parts in a table and ensure the new accountnumber field is indexed.

Also, if the route_date is in a table that has a uniquePK i.e.

PK..routeDate..AccountNmber
1...1/1/19.......A
2...1/1/19.......B
3...2/1/19.......A
etc

then we can use that instead.

I am going to test out a few things here.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
OK - but for reporting back on 1/2 million rows it will always be very slow.

Focus on a) indexing to ensure they exist and can be used and b) limit the number of account numbers you need to report on in some way
 

Mr. Southern

Registered User.
Local time
Today, 00:02
Joined
Aug 29, 2019
Messages
90
OK - but for reporting back on 1/2 million rows it will always be very slow.

Focus on a) indexing to ensure they exist and can be used and b) limit the number of account numbers you need to report on in some way

Okay. What do you think will be a reasonable amount of data?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:02
Joined
Feb 19, 2013
Messages
16,553
you would need to test it

you've done 1

try 100, then 1000

But check indexing, I keep saying it, you keep ignoring it:)that will make the biggest difference regardless of the number
 

Users who are viewing this thread

Top Bottom