Sort by MAX Version query - Very slow

ironfelix717 · Sep 20, 2019

Hi,

I am experimenting with a query over the network for an in-development application. FE is access and BE is access.

With a test of 100,000 records, this query is very slow.

Fields of concern are: PK autonumber, OperationID, and version.

OperationID can be duplicative. Each record with a given OperationID has a specific record version.

I need to retrieve each OperationID in the recordset, and return its data
where its version is the MAX for its category.

Ex:
ID------OPID--------Version
1-------ABC1--------1
2-------ABC1--------2
3-------ZXY2--------1

Query should return:
ID------OPID-------Version
2-------ABC1--------2
3-------ZXY2--------1

I've been using this query/subquery:

Code:

sql = "SELECT opID FROM External2 WHERE External2.ID IN (SELECT TOP 1 Dupe.ID FROM External2 AS Dupe WHERE Dupe.OpID=External2.OpID ORDER BY Dupe.Version DESC) ORDER BY OPID;"
/CODE]

But this is way too slow over network. Does not complete with 2 minutes which is far too slow.

Only indexes are on primary key. 

Thanks for any help!

pbaldy · Sep 20, 2019

See if it works better as two queries:

http://www.baldyweb.com/LastValue.htm

arnelgp · Sep 20, 2019

try this:

select max(id) as maxOfID, opid, count("1") As Version from external2 group by opid;

plog · Sep 20, 2019

I would do this with a subquery:

Code:

SELECT OPID, MAX(Version) AS LastVersion
FROM External2

Name that 'sub1'. It's going to find the highest Version for each OPID. Then use it in another query to get the results you want:

Code:

SELECT External2.ID, External2.OPID, Version
FROM External2
INNER JOIN sub1 ON External2.OPID=sub1.OPID AND External2.Version=sub1.Version

3 notes:

1. If you don't need the ID field in the final results then 'sub1' will do what you need in the most effecient manner.

2. Arnelgp's solution will be the most efficient provided you never have a gap in versions. If OPID=1 has 2 records with a gap in versions (Version=1 & Version=5) it will not return accurate results.

3. An index on OPID and Version would help in all cases. Also, if Version only contains numeric data, it should be of a numeric type. A Varchar or other text field for this will cause speed issues.

ironfelix717 · Sep 22, 2019

plog said:
I would do this with a subquery:

3 notes:

1. If you don't need the ID field in the final results then 'sub1' will do what you need in the most effecient manner.

2. Arnelgp's solution will be the most efficient provided you never have a gap in versions. If OPID=1 has 2 records with a gap in versions (Version=1 & Version=5) it will not return accurate results.

3. An index on OPID and Version would help in all cases. Also, if Version only contains numeric data, it should be of a numeric type. A Varchar or other text field for this will cause speed issues.

PLOG,

Great tips and advise. Thanks a lot for your help..

In regards to item #1 of your comments, can i interpret your words as if I do not need the Primary Key (ID) in my data, there is absolutely no reason to
use the JOIN query? I do not need the primary key (ID). Just associated data with the max version. So, with other fields i need data for I was able to get everything i need with:

Code:

SELECT External2.OpID, External2.Type, External2.State, External2.Kind, External2.Number, Max(External2.version) AS LastVersion
FROM External2
GROUP BY External2.OpID, External2.Type, External2.State, External2.Kind, External2.Number;

And this is MUCH faster than the original method.

Thanks best wishes.

plog · Sep 22, 2019

In regards to item #1 of your comments, can i interpret your words as if I do not need the Primary Key (ID) in my data, there is absolutely no reason to
use the JOIN query?

You just moved the goalposts with your request. You initially mentioned 3 fields, now you dump 4 new ones into the equation. How GROUP BY queries work are hard for people to get their mind around. Often at first glance they seem to provide what people want, but upon closer inspection, they don't.

I would inspect the return data closer to make sure it truly is giving you what you want in all cases. And/Or write 2 paragraphs for us--the first is a brief overview of what it is your database does within the context of your organization. No databnase jargon, just an explanation of what it is helping you do. The second paragraph you can use a little database jargon and explain what it is you specifically want from this query.

ironfelix717 · Sep 23, 2019

Hi PLOG,

Here is the info you request:

The project's current context is a non-commerical personal project that has been in development for about 14 months now. This application is being pre-developed by myself for use in a small manufacturing business that currently does not exist. The anticipated scope of this app is 10 concurrent users (doing various tasks, running extensive reports being one that is probably very uncommon) and that is quite generous. The backend is simply an ACCESS file stored locally on the development workstation -- which will change to SQL Server in further development stages.

The query this thread discusses is not specific to any single table due to the framework that I have developed for how my database works: which is that every record is versioned. Any time an element of the system is saved/modified, a new record is created and versioned. This practice holds true for every key table in the system. The primary reason is to roll back data - which yeah SQL server can do some cool stuff with rolling back data without this architecture, but I am not a database expert and this is the method that works for my skillset.

I have considered another approach to versioning which is dumping the previous version to a history table and a new saved item goes to the master table, which makes querying current records a lot faster but in terms of development, i think this might be more work for me in the long run due to other factors (testing the history table if a specific version exists, etc).

The best assumptions I can make for you generalizing my needs are:

1.) This query is usually only need when a user has to make a selection to load an element. They need a list of the data to load. Other query criteria may exist depending on how they use the form to filter the listbox.

2.) However, the query usually only need the element's ID (opID), and a few other fields (say, 5 max) to put into a listbox for the user to select from.

3.) Primarykey (ID) is always indexed autonumber field - no surprise there.

4.) The element ID (OpID) is always short text

5.) Versions may or may not contain gaps - they are Number datatype with field size long integer.

6.) There will be a realistic limit designed into the app for versions and managing expired/unneeded versions - to keep things from growing exponentially - which i suspect will really not be a problem given the scope of this project.

Thanks for your help!

Sort by MAX Version query - Very slow

ironfelix717

Registered User.

pbaldy

Wino Moderator

arnelgp

..forever waiting... waiting for jellybean!

plog

Banishment Pending

ironfelix717

Registered User.

plog

Banishment Pending

ironfelix717

Registered User.

Similar threads

Users who are viewing this thread