using compiled queries v SQL code

CJ_London

I've been reviewing some of my VBA code with a view to improving its performance. The code normally works on around 3,000-4,000 records and has been perfectly satisfactory, but now I need to use it on a much larger recordset - circa 100,000 records.

The code builds a nested set: it selects a single record and, if one is found, updates it; if not, it resets to look for the next one.

Code:
While LftRgt < MaxLft

    'select a potential child for the current parent
    sqlStr = "SELECT TOP 1 fplTree_1.EntityPK FROM fplTree INNER JOIN fplTree AS fplTree_1 ON (fplTree.CTable = fplTree_1.PTable) AND (fplTree.ChildFK = fplTree_1.ParentFK) WHERE fplTree.TreeLevel=" & TreeLevel & " AND fplTree_1.TreeLevel=0 AND fplTree.Branch=" & Branch
    Set rst = DB.OpenRecordset(sqlStr)

    If rst.EOF Then 'there are no children, so finish with this parent
        sqlStr = "UPDATE fplTree SET Rgt = " & LftRgt & ", TreeLevel = -TreeLevel WHERE TreeLevel = " & TreeLevel
        DB.Execute sqlStr
        TreeLevel = TreeLevel + 1
    Else 'there is a child, so update it and set the child as the new parent
        TreeLevel = TreeLevel - 1
        DB.Execute "UPDATE fplTree SET Lft = " & LftRgt & ", TreeLevel = " & TreeLevel & " WHERE EntityPK = " & rst!EntityPK
    End If

    LftRgt = LftRgt + 1
    Set rst = Nothing
Wend
In summary, there are 100,000 records, each of which is ultimately selected once and updated twice.

This process takes about 1½ hours to run, and as it works through the dataset each loop takes longer and longer. For the first 20,000 records the loop runs at a rate of around 200/second or better, which is acceptable, but by the time the last records are being referenced the rate has dropped to around 15 loops/second. I believe this is due to updating the TreeLevel index, which I don't think I can do anything about - but any suggestions welcome :).

TreeLevel has a narrow range of values (typically 9 or 10, reflecting the different levels of the tree) and is initially updated with a negative number, which is changed to a positive number when the record is finished with. I've tried updating all the TreeLevel fields to Null before starting and modifying the index to ignore Nulls, but this is actually slower.

Reference is often made to the overhead of needing to compile the SQL before running, so I modified the above to use a QueryDef instead:

Code:
'create temporary parameterised querydefs once, outside the loop
Set Q1 = DB.CreateQueryDef("", "SELECT TOP 1 fplTree_1.EntityPK FROM fplTree INNER JOIN fplTree AS fplTree_1 ON (fplTree.CTable = fplTree_1.PTable) AND (fplTree.ChildFK = fplTree_1.ParentFK) WHERE fplTree.TreeLevel = TLevel AND fplTree_1.TreeLevel = 0 AND fplTree.Branch = " & Branch)
Set prm1TLevel = Q1!TLevel

Set Q2 = DB.CreateQueryDef("", "UPDATE fplTree SET Rgt = LftRgt, TreeLevel = -TreeLevel WHERE TreeLevel = TLevel")
Set prm2TLevel = Q2!TLevel
Set prm2LftRgt = Q2!LftRgt

Set Q3 = DB.CreateQueryDef("", "UPDATE fplTree SET Lft = LftRgt, TreeLevel = TLevel WHERE EntityPK = EID")
Set prm3LftRgt = Q3!LftRgt
Set prm3TLevel = Q3!TLevel
Set prm3EntityID = Q3!EID

While LftRgt < MaxLft
    'select a potential child for the current parent
    prm1TLevel = TreeLevel
    Set rst = Q1.OpenRecordset
    If rst.EOF Then 'no children, so finish with this parent
        prm2TLevel = TreeLevel
        prm2LftRgt = LftRgt
        Q2.Execute
        TreeLevel = TreeLevel + 1
    Else 'there is a child, so update it and set the child as the new parent
        TreeLevel = TreeLevel - 1
        prm3TLevel = TreeLevel
        prm3LftRgt = LftRgt
        prm3EntityID = rst!EntityPK
        Q3.Execute
    End If
    LftRgt = LftRgt + 1
    Set rst = Nothing
Wend

I was surprised to find it took exactly the same amount of time to run. Am I missing something?
 
If your backend is a SQL database I would suggest moving the code to a stored procedure.
 
In this case it isn't - it is an Access (2010) backend in a networked location. If I run it with the backend on my C drive it is a little quicker (saves about 15 minutes), but that wasn't really the point of my post, which is that running a compiled query appears to be no quicker than running the uncompiled SQL, and I was wondering if I was missing something in my code.

Of course I am interested in trying to improve the time this routine takes, but unless my client invests in SQL Server - or even terminal server - I can only really look at the efficiency of the code/queries!
 
CJ
I can't really speak to compiled queries vs uncompiled since I don't use queries in my system for processing - all processing is done in stored procs on the server.

I can say that in my experience updates tend to be more expensive than inserts. So if you are completely rebuilding a tree, you may be able to improve performance by building the complete new tree in a memory array, wiping out the old tree (either by deletes or by dropping and recreating the tree table) and finishing with a series of inserts.

This all assumes the computers that would be running the process will have enough memory to store the temp tree.
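
Something along these lines is what I have in mind - just an untested sketch in DAO, assuming the same fplTree columns; the staging table name fplTreeNew and the array layout are made up for illustration, and however you calculate the nested set would fill the array first:

Code:
Dim DB As DAO.Database
Dim rs As DAO.Recordset
Dim tree() As Variant   'one column per record, rows 0-3 = EntityPK, Lft, Rgt, TreeLevel
Dim i As Long

Set DB = CurrentDb

'...build the complete new tree in tree() in memory here...

'then wipe the old tree and write the new one back inside a single transaction
DBEngine.BeginTrans
DB.Execute "DELETE FROM fplTreeNew", dbFailOnError   'or drop and recreate the table
Set rs = DB.OpenRecordset("fplTreeNew", dbOpenDynaset)
For i = LBound(tree, 2) To UBound(tree, 2)
    rs.AddNew
    rs!EntityPK = tree(0, i)
    rs!Lft = tree(1, i)
    rs!Rgt = tree(2, i)
    rs!TreeLevel = tree(3, i)
    rs.Update
Next i
rs.Close
DBEngine.CommitTrans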

Good luck - Fran
 
Hi Fran,

That is a thought - I'm aware of the relative overheads of update vs delete/insert, and in an earlier iteration of this code I did append Lft, Rgt and TreeLevel to a different table, but the update route was in fact quicker by a significant margin.

But what I haven't tried is appending to a table in memory and then writing it back to the db - watch this space!
 
CJ

One other thought for you: if you find you must update a node on the tree in memory, you might want to look at storing the values in an XML DOM instead of a memory array. Since XML represents a tree, I think finding a particular node in the DOM is faster than looping through an entire array, especially when the array gets big.
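
As a very rough, untested sketch (late-bound MSXML 6.0; the element and attribute names here are just invented for illustration):

Code:
Dim dom As Object
Dim child As Object
Dim node As Object

Set dom = CreateObject("MSXML2.DOMDocument.6.0")
dom.async = False
dom.appendChild dom.createElement("tree")

'add one element per record, keyed on the primary key
Set child = dom.createElement("entity")
child.setAttribute "pk", "12345"
child.setAttribute "lft", "1"
dom.documentElement.appendChild child

'later, jump straight to a node by key instead of looping through an array
Set node = dom.selectSingleNode("//entity[@pk='12345']")
If Not node Is Nothing Then node.setAttribute "rgt", "2"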

Fran
 
A saved query includes an execution plan that is saved the first time the query is run. Creating that plan is the only time saved on subsequent runs; it would be barely noticeable. The execution itself is the same.

One potential advantage of not using a saved query is that the execution plan is recreated using the latest statistics. When the number of records changes from just a few to many, the best plan might also be different, so running with the saved plan can be a disadvantage.

Recreation of the execution plans is one reason to consider decompiling the database once the record numbers build up.
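
For what it's worth, the difference looks like this in DAO - a sketch only, where qryChildForParent is just an example name and the SQL is a cut-down version of the select in the first post:

Code:
Dim qdfTemp As DAO.QueryDef
Dim qdfSaved As DAO.QueryDef
Dim rs As DAO.Recordset

'temporary querydef - the plan is rebuilt each time the querydef is recreated
Set qdfTemp = CurrentDb.CreateQueryDef("", _
    "PARAMETERS TLevel Long; SELECT TOP 1 EntityPK FROM fplTree WHERE TreeLevel = TLevel")

'saved query - its plan is stored with it the first time it runs
Set qdfSaved = CurrentDb.CreateQueryDef("qryChildForParent", _
    "PARAMETERS TLevel Long; SELECT TOP 1 EntityPK FROM fplTree WHERE TreeLevel = TLevel")

'later runs reuse the saved query (and its stored plan) without recreating it
Set qdfSaved = CurrentDb.QueryDefs("qryChildForParent")
qdfSaved.Parameters("TLevel") = 3
Set rs = qdfSaved.OpenRecordset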
 
Fran - XML DOM looks very interesting. It is not an area I have explored up to now (or even had any real awareness of), but I think my current challenge could be a good fit for experimenting. Ultimately the data needs to be stored in a table because it needs to link to other tables.

Galaxiom - Thanks for the explanation. I had thought the creation of the execution plan over 100,000 iterations would produce a measurable (and hopefully worthwhile) saving - but as you say, it is barely noticeable, so I've gone back to my original code.
 
I had thought the creation of the execution plan over 100,000 iterations would produce a measurable (and hopefully worthwhile) saving - but as you say, it is barely noticeable, so I've gone back to my original code.

I think a lot of these factors were significant back when computers had clock speeds measured in MHz and RAM in MB.

(My first PC had a 1 MHz clock, 512 KB of RAM, a floppy disk and no hard drive.)
 
My first PC had a 1 MHz clock, 512 KB of RAM, a floppy disk and no hard drive
Mine was an Apple IIe with 8k of RAM and two 143k floppy drives :p - I think the clock worked on a small spring! But I was one of the first people in the UK to implement a network of Apples via a Corvus hard drive in a commercial environment - one of the first challenges to minicomputers. The system had to manage memory as well as data. But like cars, it is so much more complicated these days...
 
Is there some reason why you don't switch over to an SQL server, or MySQL, and use stored procedures?
 
Is there some reason why you don't switch over to an SQL server, or MySQL
Yes - the client!

My routine was developed for Access working on relatively small recordsets - organisation charts, product assemblies etc., typically a few thousand records - and performance has not been an issue. My client's requirement was to 'map out' the customer base - a much bigger dataset.
 
The main problem with both approaches is that it's constantly opening and closing recordsets. On top of that, with the second approach, it's constantly creating querydef objects.

Another thing I can't see in your code is if you're doing a MoveLast or MoveFirst. I'm hoping you're not needing to do that.

Here are some suggestions/comments/thoughts:

1. I'm not sure how many times these operations are being performed within the loop (because I can't see the rest of your code), but you can use Filter or the FindFirst/FindNext methods whilst ensuring your recordset stays open - see the sketch after this list.

2. In combination with (1) you can also try batch updating. Perhaps update the first 10,000 or 20,000 records. I would use ADO for this.

3. You mentioned in-memory processing; it's something I would normally do with Excel rather than Access. Put simply, an Excel worksheet is just a two-dimensional array, so it's easy to get an array, update it and send it back to Excel. I don't see it being possible in quite the same way with Access. If you do, however, want a two-dimensional representation of your recordset, you can use GetRows in DAO.

4. Fran mentioned XML DOM; I've tried that on a large dataset and wasn't impressed. You still have to traverse the tree, even though there are relevant methods for hitting child nodes. Maybe I was missing something because I didn't spend too much time on it, but at the time it just wasn't quick enough. I think the references I used at the time were MSXML 3.0, MSXML 6.0 and one other that I can't remember. On the other hand, there's the Open XML format, which I hear is very good!
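
To illustrate (1), here's a rough sketch - untested, and the select is a cut-down rearrangement of your original; whether the dynaset picks up the TreeLevel updates made by the action queries without a Requery is something you'd need to check:

Code:
Dim rst As DAO.Recordset

'open the recordset once, outside the loop, instead of OpenRecordset on every pass
Set rst = DB.OpenRecordset( _
    "SELECT fplTree_1.EntityPK, fplTree.TreeLevel, fplTree.Branch " & _
    "FROM fplTree INNER JOIN fplTree AS fplTree_1 " & _
    "ON (fplTree.CTable = fplTree_1.PTable) AND (fplTree.ChildFK = fplTree_1.ParentFK) " & _
    "WHERE fplTree_1.TreeLevel = 0", dbOpenDynaset)

While LftRgt < MaxLft
    'look for a potential child of the current parent without reopening anything
    rst.FindFirst "TreeLevel = " & TreeLevel & " AND Branch = " & Branch
    If rst.NoMatch Then
        '...no child: run the same UPDATE as in your original code...
        TreeLevel = TreeLevel + 1
    Else
        '...child found: rst!EntityPK identifies the new parent...
        TreeLevel = TreeLevel - 1
    End If
    LftRgt = LftRgt + 1
Wend

rst.Close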
 
