Does Jet do joins efficiently?

jal
Registered User · Joined Mar 30, 2007 · 1,709 messages
I have a Jet database packed with 10 million records (account numbers), and I'm very impressed by how fast a user can look up an account number.
On the other hand, I was surprised by the following scenario. I have two tables (let's call them Customers and Orders to make this discussion easier). In this case, Orders has the ten million records.

Basically I wanted to do a regular join because Customers and Orders have a field in common (CustomerID). In other words:

SELECT Customers.Type, Orders.* FROM Customers
INNER JOIN Orders on Orders.CustomerID = Customers.CustomerID

Due to our IT dept's group policies (long story), I have to keep my queries short and quick, so I pull one record at a time by adding a WHERE clause to the above query, like this:

WHERE OrderID = @OrderID.

So I use a loop to pull multiple records, one at a time. For example, yesterday I had to look up a thousand records, and the loop was taking several minutes. So I said to myself, "Hey, I'll try something silly. Instead of doing a real join, I'll first pull the record from the Orders table (using a recordset) and then pull the corresponding record from the Customers table (using another recordset). Once again I'll pull just one record at a time." And I told myself, "This new version of the loop should take twice as long, because there is no way it can be as fast as letting Jet do the join. After all, database engines are designed for efficient joins."

Well, I was wrong. The dual-recordset solution turned out to be three times faster than the join! (Actually I was using the VB.Net equivalent of a recordset, that is, a DataTable with a DataAdapter.)

I was so much in shock and disbelief that I erased my VB code, restored the old code, and ran it again - again the loop took several minutes. Then I put the new code back in, and the loop completed in about one minute.
All the columns in both tables are indexed (using single-column indexes).
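The two access patterns being compared can be sketched side by side. Below is a minimal illustration using Python's sqlite3 module as a stand-in engine (Jet itself isn't scriptable this way); the schema, row counts, and names are made up to mirror the thread's example, not taken from the actual app.

```python
import sqlite3

# Hypothetical stand-in schema mirroring the thread's Customers/Orders.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    CREATE INDEX ix_orders_orderid ON Orders (OrderID);
    CREATE INDEX ix_orders_custid  ON Orders (CustomerID);
""")
con.execute("INSERT INTO Customers VALUES (1, 'Retail')")
con.executemany("INSERT INTO Orders VALUES (?, ?)",
                [(100, 1)] * 10)  # ten order rows sharing one OrderID

def via_join(order_id):
    """Approach 1: let the engine do the join, one OrderID at a time."""
    return con.execute(
        "SELECT Customers.Type, Orders.* FROM Customers "
        "INNER JOIN Orders ON Orders.CustomerID = Customers.CustomerID "
        "WHERE OrderID = ?", (order_id,)).fetchall()

def via_two_lookups(order_id):
    """Approach 2: two indexed single-table seeks, joined in client code."""
    orders = con.execute(
        "SELECT * FROM Orders WHERE OrderID = ?", (order_id,)).fetchall()
    out = []
    for (oid, cust_id) in orders:
        (ctype,) = con.execute(
            "SELECT Type FROM Customers WHERE CustomerID = ?",
            (cust_id,)).fetchone()
        out.append((ctype, oid, cust_id))
    return out

assert via_join(100) == via_two_lookups(100)  # same ten rows either way
```

Both functions return identical rows; the question in the thread is only about which one the engine executes faster.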
 
So I can find no explanation for this anomaly other than to ask myself, "Does Jet do joins inefficiently?"
 
 
first thought -

is the customerid indexed in the orders table? - if not, then that will not be fast at all.



second thought -
if you have 10 million rows, then the join itself must be a big thing. I think there is a setting that restricts the number of rows Access retrieves (1,000 by default, I think - otherwise it would retrieve millions of rows). This is the same reason that large combo boxes etc. are not prefilled with all values when you have large datasets.


I think process efficiency becomes a real consideration when dealing with large data sets. You do have to consider ways of minimising the amount of data being extracted.


maybe this

query1 = select * from customer where customerid = whatever
query2 = select * from orders where customerid = whatever

and then join query1 and query2, would be quicker.

It's worth trying different approaches, I think, to see what sort of difference there is.
 
gemma-the-husky said:
first thought -

is the customerid indexed in the orders table? - if not, then that will not be fast at all.
All columns are indexed, in both tables. That's precisely why single-table lookups are running so fast.

second thought -
if you have 10 million rows, then the join itself must be a big thing. I think there is a setting that restricts the number of rows Access retrieves (1,000 by default, I think - otherwise it would retrieve millions of rows). This is the same reason that large combo boxes etc. are not prefilled with all values when you have large datasets.
I'm no engineer, but I would have expected Jet to somehow optimize this join, given that the WHERE clause is only pulling basically one record from each table (well, one OrderID, at any rate). The result set usually has ten "orders" (for that one OrderID) corresponding to one row of the Customers table. So we're only pulling 11 rows on average - ten from the Orders table and one from the Customers table.


I think process efficiency becomes a real consideration when dealing with large data sets. You do have to consider ways of minimising the amount of data being extracted, maybe this

query1 = select * from customer where customerid = whatever
query2 = select * from orders where customerid = whatever

and then join query1 and query2, would be quicker.
Brilliant! Thank you!!! Out of sheer curiosity, I might go back and try that approach. I thought I knew how to write basic queries, but you've just proven my ignorance. Perhaps the reason this idea never occurred to me is that I tend to assume that Jet - at least for simple joins - will do this kind of optimization automatically.

And I don't mean to sound disappointed in Jet. I had written a 2nd version of the app using an SQLite database, which everyone said was "way faster" than Jet. Not in my tests. In my experiments over the last week, Jet beat the pants off SQLite, both in INSERTs and SELECTs. Had I known, when I started this project, that the data store had ten million records, I would have been afraid to even try using Jet. But now I'm sure glad that I gave Jet a chance. Often it looks up an account number in less than one second!

Not only that, but when I first open the app and run the first query, in some cases SQLite delayed making that first connection - sometimes I had to wait three minutes. SQLite does have some desirable features, for instance unlimited size (no 2-gig limit), and is reputed to scale better than Jet (although now I won't believe this until I see it for myself). But it also lacks some desirable features. For instance it isn't strongly typed (you can mix all datatypes in the same column). If you choose to include an autonumber column, that column MUST be chosen as the primary key (this can be a little annoying). Most importantly it doesn't support (as far as I know) JOINs on action queries such as UPDATEs and DELETEs. (This can be virtually devastating to application logic and performance when you're trying to set up DELETEs or UPDATEs for a table with ten million rows).
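For what it's worth, the usual SQLite workaround for the missing UPDATE/DELETE joins is a subquery in the WHERE clause. A minimal sketch, with a made-up schema, using Python's sqlite3 module:

```python
import sqlite3

# Hypothetical tables for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
""")
con.execute("INSERT INTO Customers VALUES (1, 'Closed')")
con.execute("INSERT INTO Customers VALUES (2, 'Open')")
con.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 2)])

# Jet-style "DELETE ... INNER JOIN" is not valid SQLite syntax, but the
# same effect is available with WHERE ... IN (subquery):
con.execute("""
    DELETE FROM Orders
    WHERE CustomerID IN (SELECT CustomerID FROM Customers
                         WHERE Type = 'Closed')
""")
remaining = con.execute("SELECT OrderID FROM Orders").fetchall()
print(remaining)  # [(11,)]
```

Whether this performs as well as a true joined DELETE on a ten-million-row table is a separate question; it only shows the logic is expressible.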

And as far as I can see, a Jet 4.0 database (.MDB file) is just as cost-free as using SQLite (since I'm not using Access to manipulate it). I'm not aware of any licensing costs associated with Jet 4.0 (correct me if I'm wrong). I believe the new Jet engine (ACE) is free too (again, correct me if I'm wrong).

Kudos to the Jet team!
 
I don't think it's to do with the efficiency of joins per se.

I am sure it's more that processing anything with very large tables means a different working paradigm.

Searching a single table is different, as you can use the B-tree structure (which is what I imagine DBMS indexes use) to search a single table very efficiently, however big it is. A B-tree of millions of records will still only go 4 or 5 levels deep - so it is very efficient to search. Have a look at Wikipedia on B-trees.
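That depth claim checks out with back-of-the-envelope arithmetic: a B-tree with fan-out b holds roughly b^d keys at depth d, so the depth is about log base b of n. The fan-outs below are illustrative guesses, not Jet's actual page layout:

```python
import math

# Approximate depth of a B-tree holding n keys with a given fan-out.
n = 10_000_000
for fanout in (50, 200, 500):
    depth = math.ceil(math.log(n) / math.log(fanout))
    print(f"fan-out {fanout:>3}: about {depth} levels")
# Even a modest fan-out of 50 keeps 10 million keys within 5 levels.
```

So a single indexed seek touches only a handful of pages regardless of table size, which matches how fast the single-table lookups are.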

But joining a table with millions of rows to another table is not the same thing at all, I suspect.

------------------
e.g. in most businesses you print out a list of your employees, say, when you do a payroll - or print out a list of your customers to do debt-chasing.

With very large organisations, these approaches just aren't practical. Instead, you have to come up with a series of procedures to examine certain categories of exceptions, or particular cases.

So both the management processes, and the software have to do things differently.
 
What if you do this instead:

SELECT Customers.Type, Orders.* FROM Orders
INNER JOIN Customers on Orders.CustomerID = Customers.CustomerID
Where OrderID = SomeOrderID

Given OrderID is also indexed.
Technically it should not matter, but I have seen Access do some things one would not think would matter.
 
What if you do this instead:

SELECT Customers.Type, Orders.* FROM Orders
INNER JOIN Customers on Orders.CustomerID = Customers.CustomerID
Where OrderID = SomeOrderID

Given OrderID is also indexed.
Technically it should not matter, but I have seen Access do some things one would not think would matter.

If I understand correctly, you are changing which table is in the FROM clause. I had already tried that, even before I started this thread. Good suggestion, though.
 
have you got anywhere with this?

another way would be to save the selected account in a temp table, and join that to the orders table

that will probably work quicker
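That temp-table idea can be sketched as follows, again using Python's sqlite3 as a stand-in engine with made-up table contents:

```python
import sqlite3

# Hypothetical big Orders table with an index on the join column.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    CREATE INDEX ix_orders_custid ON Orders (CustomerID);
""")
con.executemany("INSERT INTO Orders VALUES (?, ?)",
                [(100, 1), (101, 1), (102, 2)])

# Save the selected account(s) into a tiny temp table first...
con.execute("CREATE TEMP TABLE Picked (CustomerID INTEGER PRIMARY KEY)")
con.execute("INSERT INTO Picked VALUES (1)")

# ...then join the big table against the tiny one, so the engine only
# has to probe the index once per picked account.
rows = sorted(con.execute("""
    SELECT Orders.* FROM Orders
    INNER JOIN Picked ON Orders.CustomerID = Picked.CustomerID
""").fetchall())
print(rows)  # [(100, 1), (101, 1)]
```

This also extends naturally to a batch: insert all thousand account numbers into the temp table and retrieve everything in one query instead of a thousand.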


I am sure it's all because you have 10m records - so you need to try a few different things.
 
