Recordsets and network speed

303factory

Registered User.
Joined
Oct 10, 2008
Messages
136
Hi all

We have recently moved our database onto a WAN, with the database server now being located in a different city to me.

The problem is that, apparently due to network lag, my code now runs very slowly. My client has to read several thousand entries from an XML file and write them into our database. Until now I've been doing this by opening an ADODB recordset and adding the data line by line.

This has always run very quickly (several thousand entries in a few minutes) but now takes well over half an hour for the same operation. The network team inform me that the method I use incurs the network lag for every new line added, and that I need to find a way to upload all the data in one move.

Below is a cut-down example version of my code. We're running a SQL Server 2008 back end with linked tables, communicating via the SQL Server Native Client 10.0 ODBC driver. Unfortunately I'm quite new to SQL Server and not an experienced programmer, so I'm looking for advice on a different method to use. I'm on a very limited timescale, so I can't afford to spend much time testing and researching alternatives, but if someone knows why this method is slow and could point me in the right direction it would be massively appreciated.

Code:
    ' SQL Connection
    Set gSQLdbase = New ADODB.Connection
    gSQLdbase.ConnectionString = "Provider=SQLNCLI10;Data Source=[ip],[port];Network Library=DBMSSOCN;Initial Catalog=[odbcName];uid=[username];pwd=[password];"
    gSQLdbase.Open



    Dim rstRS As ADODB.Recordset
    Dim strSQL As String
 
    strSQL = "SELECT * FROM tblMessages WHERE 1=0"
    Set rstRS = New ADODB.Recordset
    rstRS.Open strSQL, gSQLdbase, adOpenKeyset, adLockOptimistic, adCmdText

    intRegNumber = 0
    strMessages = MESSAGES & "[" & intRegNumber & "]"
    Set objNode = objXMLFile.selectSingleNode(strMessages)
      
    Do Until objNode Is Nothing
        
              
            With rstRS
                .AddNew
                .Fields("Message") = objXMLFile.selectSingleNode([nodelocation]).Text
                ' ...same as the line above for about ten other fields
            End With

            intRegNumber = intRegNumber + 1
            strMessages = MESSAGES & "[" & intRegNumber & "]"

            Set objNode = objXMLFile.selectSingleNode(strMessages)
        
	Loop
        
	If Not rstRS.EOF Then rstRS.Update
        rstRS.Close
        Set rstRS = Nothing
 
No doubt someone will be recommending a disconnected recordset at some stage. ;-)
That stands a good chance of improving things, but it won't actually satisfy the request you've made. (Though to be fair, not many things will given the requirements!)

You have thousands of rows to process.
You'll struggle to remotely execute anything like that number of records in a single batch.
Indeed to do so is leaving yourself open to some problems.
Consider the bandwidth of a single update passing even just the raw text of those thousands of rows. If this is a single process (an entire batch, succeed or fail) then any interruption in that submission to your server will cause the entire process to fail.
(Even with a local server the process took a few minutes iteratively.)

What a disconnected recordset will give you is a chance to update in batches.
You could do this by executing specific dedicated batches yourself, or by letting ADO do that for you.
What I mean by this is that, even if you "fill" your recordset up with your thousands of rows, upon your batch update to the server ADO will actually submit distinct batches of rows in distinct statements to the server. (If memory serves it's commonly about 15 statements or so).

OK, there's every chance this will substantially increase your performance.
By my reckoning 30 / 15 = 2. ;-)

Your alternative is to start relying on server processing.
You then just pass the text of the rows to a server procedure, and allow it to perform the record additions.
You could then pass as many rows as you want at once. As a string the server can parse these back out and add the individual rows.
Since this is SQL 2008 you could even build up and pass as a Table Value parameter.
(Access doesn't support this directly - but there are hacks to build a Table variable first and pass that).
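To make the server-procedure idea concrete, here is a rough VBA sketch of the "pass the whole lot in one call" approach. The procedure name (`usp_ImportMessages`) and its single parameter are my own invention, not anything from the OP's schema, so treat this as an illustration rather than a drop-in solution:

```vba
' Hypothetical sketch: pass the entire XML document to the server in ONE call
' and let an assumed stored procedure (usp_ImportMessages, taking a single
' string/xml parameter) shred it into tblMessages with a set-based INSERT.
Dim cmd As ADODB.Command
Set cmd = New ADODB.Command
With cmd
    .ActiveConnection = gSQLdbase
    .CommandType = adCmdStoredProc
    .CommandText = "usp_ImportMessages"
    ' The raw XML text goes across as one long Unicode string parameter
    .Parameters.Append .CreateParameter("@Data", adLongVarWChar, _
        adParamInput, Len(objXMLFile.xml), objXMLFile.xml)
    .Execute , , adExecuteNoRecords
End With
Set cmd = Nothing
```

Server-side, the procedure would use the xml type's `.nodes()`/`.value()` methods (available since SQL Server 2005) to turn the document back into rows, so the WAN round trip happens once rather than once per row.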

However, as I was saying, passing the entire process as a single batch is probably asking a lot.
It's a lot of data to pass as a single parameter, it's a lot to parse back out.
But it would give you the chance to do this as a single process rather than the iterative batches that even a disconnected recordset will provide.

So, to offer something tangible...
Something akin to:

Code:
    strSQL = "SELECT * FROM tblMessages WHERE 1=0"
    Set rstRS = New ADODB.Recordset
    rstRS.CursorLocation = adUseClient
    rstRS.Open strSQL, gSQLdbase, adOpenKeyset, adLockBatchOptimistic, adCmdText
    Set rstRS.ActiveConnection = Nothing
 
...
    'Add the XML node data
...
 
    Set rstRS.ActiveConnection = gSQLdbase
    rstRS.UpdateBatch
    rstRS.Close
    Set rstRS = Nothing

Cheers.
 

Fantastic, thank you! Just to clarify: the code example you have given me is a disconnected recordset, but it's worth looking at creating a server procedure to add the records, which would presumably take longer to implement but offer significantly improved processing time, though it may cause problems for very large amounts of data.
 
Well, as I was the only one to have responded so far, you probably didn't need to quote the text to which you were referring ;-)

But I'd say you've summarised the situation quite succinctly. (Something I tend not to do very often :-)

As always, theory and knowledge are your friends - but there's nothing quite like actually testing the implementations to know which is truly the one to implement.
(Though bear in mind that your single shot large update may work perfectly... until the day it doesn't. Who knows how rock solid your WAN connection is. But then an "all or nothing" insert might be just what you want/need. And, just to be clear, I wasn't literally saying that the disconnected recordset method will take 2 minutes - obviously it still requires sending and processing time - I was just making a point that there would be fewer transmissions. :-)

Cheers.
 
I might be a bit out of my depth here but......

The problem, as you describe it, is down to the volume of data traffic down a slow pipe.
Although it seems to break the rules, running the routine at the server must be the answer: the data communicated is then minimised and the chance of corruption heavily reduced.

pb
 
For some reason my post does not show up on the summary page!
Some kind of intelligence filtering ??? !!!!
 
Yeah, there will absolutely be a fundamental limit of available bandwidth.
Which is partly why I like the half-way house of intermittent updates. (They could be manually batched into groups, rather than relying on ADO to do it.)

But ultimately, the problem in this case seems to come back to the observation:
"The network team inform me that the method I use is encountering the network lag for every new line added"

That doesn't necessarily directly relate to "the volume of data traffic down a slow pipe" - it's apparently more about the time to send and retrieve each update through a recordset. (I'm unclear what rules are being broken though?)

The OP should first try to reduce the number of attempts the code makes to write individual rows to the database.
Testing on that theme will at least provide productive feedback from the network guys?

Cheers.
 
I think we are saying the same thing. The 'slow pipe' is adding an undesirable overhead to the processing time.
My poor choice of words; 'not normal practice' would be better than 'broken rules'. I have been told many times that back ends are for tables, though this seems to be an exception.

Having re-read his post, though, it will still depend on where the XML file is. If it is with the back end then it would be quick to move whatever includes the routine into the back end and run it there. As long as there are not lots of screen updates he should see an immediate improvement.
Someone cleverer than me might be able to say how that could run using the front end.

Cheers
 
I'm afraid you're losing me a bit. (I'm not always sure where we're saying the same thing ;-)

This is my understanding of the current situation the OP has:

-- Custom Application Running on Client onsite. Remote SQL Server BE.
-- Client Application Receives XML file.
-- Client Application procedure iteratively walks through the XML, adding individual rows.
-- Each single write request incurs overhead (server update, plus time to respond to the calling code).
-- Process takes > 30 mins instead of "few" mins.

My initial suggestions for 303factory are either:
-- Perform updates to the database in batches, either using default ADO functionality or manually create these batched statements.
-- Push the whole raw data to be appended down the wire and let the server process it.

The current problem seems to stem not from a bottleneck but from the relative time it takes for each recordset update request to be fully returned from the remote server and ready for the next. (Yes, this is ultimately due to a slower "pipe" than the local network - but that's near inevitable.)

A bottleneck could definitely become a problem by trying to push the whole shebang over the ether at once.

I agree that one question is how this XML file of data is received.
It's, presumably, at the client site - but could it be received on the server without the need for the app to then resubmit it to the server?
That's the real separation issue. One way or another that data needs to go down the (now longer, thinner) wire.

If this were all running on the server (for example, a TS instance with the client app running on the server, so everything occurred there) then we'd not really encounter this issue at all. As always, it would just be the UI updates of the TS session coming across - which works very well for many applications (and instances of Access apps).

But to address your mentioning:
>> "'broken rules', not normal practice would be better. I have ben told many times backends are for Tables, though this seems to be an exception."

This really doesn't apply to server databases.
If there is processing (and business logic) which it makes sense to do on the server - then we do it on the server!
With a Jet/ACE (MDB/ACCDB) back-end file you have no choice, and it's perhaps that which you're thinking of. Your processing is all done locally - that includes running any VBA code in your client app and any database engine work.
A server database runs entirely on the server and is, of course, very much capable of running procedures (though not VBA client-type procedures).

The inherent XML abilities of SQL Server could come into play if you can get the raw XML file to the server. (Looking at the example code, it doesn't look like you're having to perform any particular transformation to it.)
The issue comes down to that separation and discontinuity between client and server.
Whether you get the data there in iterative batches or en masse, it needs to go over that wire - and it would be nice if it didn't.
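For illustration, if the file could be dropped somewhere the server itself can read, the whole import can run server-side in one statement. This is a hypothetical sketch: the path, element names and permissions are all assumptions (OPENROWSET(BULK ...) needs bulk-import rights on the server, and the path is as seen by the server, not the client):

```vba
' Hypothetical sketch: the XML file sits on a path the SERVER can read, so
' SQL Server loads and shreds it itself - no per-row traffic over the WAN.
' OPENROWSET(BULK ... SINGLE_BLOB) requires bulk-import permissions.
Dim strSQL As String
strSQL = _
    "INSERT INTO tblMessages (Message) " & _
    "SELECT n.value('(Message)[1]', 'nvarchar(max)') " & _
    "FROM (SELECT CAST(BulkColumn AS xml) AS x " & _
    "      FROM OPENROWSET(BULK 'C:\Imports\messages.xml', SINGLE_BLOB) AS b) AS d " & _
    "CROSS APPLY d.x.nodes('/MESSAGES/MESSAGE') AS t(n)"
gSQLdbase.Execute strSQL, , adExecuteNoRecords
```

The single ADO call here carries only the statement text; the actual data never crosses the client's wire at all, which is the separation point being made above.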

Cheers.
 
I wonder about changing the recordset's options to use adForwardOnly. Since you are only appending records and don't need to walk the recordset, there's no sense in needing a richer cursor, and a forward-only cursor is more efficient than the other cursor types. Also, if you don't need results in return, have it execute asynchronously by also specifying adAsyncExecute. It won't speed things up, but it should at least enable the UI to be more responsive. If you need a means of monitoring the progress, you could bind the recordset with the WithEvents keyword (it will have to be in a class module to do that) and listen to RecordChangeComplete, perhaps using a counter to keep track of how many records are in 'limbo'.
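A minimal sketch of that monitoring idea (the class name and variables are my own invention): a class module holds the recordset WithEvents and counts each completed record change.

```vba
' Class module "clsRSMonitor" (hypothetical name) - sketch of listening to
' ADO recordset events to track progress of asynchronous updates.
Private WithEvents mrst As ADODB.Recordset
Private mlngDone As Long

Public Sub Attach(rst As ADODB.Recordset)
    Set mrst = rst
    mlngDone = 0
End Sub

Private Sub mrst_RecordChangeComplete(ByVal adReason As ADODB.EventReasonEnum, _
        ByVal cRecords As Long, ByVal pError As ADODB.Error, _
        adStatus As ADODB.EventStatusEnum, ByVal pRecordset As ADODB.Recordset)
    ' Fires as each pending change completes; keep a running total.
    mlngDone = mlngDone + cRecords
    Debug.Print mlngDone & " records committed"
End Sub
```

The caller would create an instance of the class and pass it the open recordset before starting the inserts.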

Hope this helps.
 
What about finding a way of transferring the xml file (probably via an ftp server) over to your server, and performing the recordset operations locally? Then deleting the file afterwards.
 
Re: adForwardOnly.
Well - that's something worth mentioning.
The disconnected code as suggested is, by definition, using a Client Side cursor.
That means that ADO is responsible for the cursor service which is running locally on the client.
This could well prove more advantageous than any (including adForwardOnly) cursor type running on the server.

I'd imagine monitoring the events of the async executions won't exactly help performance, as you say. And since the updates are currently occurring one at a time, that's a definite way to monitor the progress :-p

But good points as usual!

Cheers.
 
>> "What about finding a way of transferring the xml file (probably via an ftp server) over to your server, and performing the recordset operations locally? Then deleting the file afterwards."

I said Kiddly, Diddle I? :-s
 
similar thing - how many tables are you actually updating?

can you not extract the xml info into a local table(s) in your database, then append the whole table to the true remote backend?

hardly any change of code required at all for this - just initially append to a different table (set of tables) -

and then have a new append query to do the real data load as a single operation - that could easily be encapsulated into a rollbackable transaction
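a rough sketch of that (the table names are assumptions): load the XML into a local staging table as before, then push everything to the linked SQL Server table with one append query inside a transaction -

```vba
' Hypothetical sketch: tblMessagesLocal is a local (Jet/ACE) staging table,
' tblMessages a linked SQL Server table. A single append query moves the
' lot, wrapped in a workspace transaction so a failure rolls it all back.
Public Sub AppendStagedMessages()
    Dim wrk As DAO.Workspace
    Set wrk = DBEngine.Workspaces(0)
    wrk.BeginTrans
    On Error GoTo RollbackLoad
    ' One append query pushes every staged row in a single operation
    wrk.Databases(0).Execute _
        "INSERT INTO tblMessages SELECT * FROM tblMessagesLocal", dbFailOnError
    wrk.CommitTrans
    Exit Sub
RollbackLoad:
    wrk.Rollback
End Sub
```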
 
right lots to look into here, thanks peeps

for now I have sped things up by using disconnected recordsets, and making a couple of lookup tables our client uses local instead of remote... speeds are just about acceptable for now.

I'll probably bump this thread to the top again when I've have enough time to try out these suggestions and have some more questions :P

as always thanks for your advice I don't know how I'd ever make any progress without this site
 
Afternoon.

I've almost said all I have to say on this subject as far as suggestions go - so I'll shut up after this. (303factory can try them or not, and we'll see in time.)

However, just to mention something on the local import and append concept...

Extracting the XML data into a local table - so that's performed locally yes?
(It'll inevitably eat up some time itself, but suppose it's nice and fast as the table's local - faster than the "few" minutes to a local LAN SQL Server).

As this table is local, and the server is, inevitably not, then we're talking about a linked SQL Server table yes? (Or, at least, a query which connects to the server without a linked table - but still using ODBC).

That will, essentially, be prone to the exact same kind of problems as the OP's initial request.
The inserts were being performed row by row by iterative ADO code.
The local table append will be performed in exactly the same way. Jet parsing the data to ODBC row by row for insert on the server (and likely a very unnecessary request for an Identity return value - which will have been an influence on the ADO inserts too).

If there's a difference in execution, it'll be due to the difference in technologies implemented (ADO Vs linked ODBC tables). That can have a bearing - but very likely not on the scale that we're looking for improvement here.

I just thought I'd mention this, rather than it be considered a neat little "single shot" gig to conveniently get the data to the server.

Cheers!


P.S. I see that 303 has posted since I started writing this. I'm not only late due to my own distractions for once! (I couldn't get in to AWF to submit this post. Man, this forum at times... I dunno...)
Well - glad there's improvement (the disconnected implementation sounds almost exactly as I'd expected - better but not ideal).
See how you go in time with the various suggestions as you feel the need.

Cheers again!
 
