Access 2003 - 60 million rows Inner Join?

Jcase

Registered User.
Local time
Today, 14:54
Joined
Sep 23, 2008
Messages
40
Office 2003 SP3

Hey guys. I've been working at this problem for quite some time, but it's been giving me some issues.

Essentially I'm running a few queries that perform an inner join and then update one of the two tables:

Code:
UPDATE TableA AS A INNER JOIN TableB AS B ON A.Dog = B.Dog
SET A.OWNER = B.OWNER
(Variables Simplified)

There are essentially 3 or 4 of these queries, all run one after the other from a macro. Anyway, I began receiving 'Not enough memory on Temporary Drive' errors. After browsing the internet I tried the 'MaxLocksPerFile' and 'MaxBufferSize' registry tweaks, to no avail.
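(For reference, on a typical Office 2003 install those Jet 4.0 settings live in the registry under the key below; the exact path can vary with your setup, so treat this as a pointer rather than gospel.)

Code:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Jet 4.0
    MaxLocksPerFile  (DWORD)
    MaxBufferSize    (DWORD)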

The two tables are pulled via ODBC into a large database used for backup (1.5 GB), then exported to my local machine where I work the magic. I've compacted and repaired the local database down to 27 MB, and the two tables only contain roughly 15k-30k rows at most.

When running these queries separately I noticed they're trying to update 60+ MILLION rows, and then I get the memory error. I've tried everything [clean temp files, up the paging file, copy and paste all components to a separate DB, delete all the rows in the tables and paste them back in from the source] and nothing has worked.

Any help is definitely appreciated. Thanks for your time guys.
 
Try enclosing the update query in a transaction.


If you leave the tables on the server (not sure why you are updating Jet tables in this case), you can use a pass-through query which will run without issue. The problem that arises in the case of local or linked tables when updated with a Jet query is that Access wants to be able to give you a chance to back out the update so it does the whole thing in memory, asks if it is OK, and then commits the changes or rolls them back. A pass-through query will not give you the option to roll back. It will just run on the server.
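For example, the SQL stored in the pass-through query could look something like this (Oracle syntax, written with the simplified names from your query; the real table names would differ, and since Dog may not be unique in TableB the subquery uses MAX to guarantee a single value per row):

Code:
-- Runs entirely on the Oracle server; Jet never buffers the joined rows
UPDATE TableA A
SET A.OWNER = (SELECT MAX(B.OWNER) FROM TableB B WHERE B.Dog = A.Dog)
WHERE EXISTS (SELECT 1 FROM TableB B WHERE B.Dog = A.Dog);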

Another thing to try is to break down the update into segments. If Dog is an alpha field, make sure there is an index on it and run several queries selecting a-e as the first letter then f-j, etc. If it is numeric, use number ranges.
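For a numeric Dog, each segment might look something like this (the range is only illustrative; pick boundaries that split the data fairly evenly):

Code:
UPDATE TableA AS A INNER JOIN TableB AS B ON A.Dog = B.Dog
SET A.OWNER = B.OWNER
WHERE A.Dog BETWEEN 1 AND 5000;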

I'm afraid I don't understand what you mean by enclosing it in a 'transaction'. [EDIT: I now know what you mean; I was trying a few things on the Oracle side. I'll probably end up going this route for now. 9/24/2008]

This process was set in motion by the previous guy who had my position. The way the procedure was set up was to upload data to an Oracle database, create tables, run a few PL/SQL scripts, import those into Access and run the macro. Very cumbersome, annoying, and I don't really understand why he designed it this way.

That segmented idea is great; I will try that first.

Two questions: Can you run a pass-through query from Access?
What could cause Access to report 60 million rows? Some sort of bug, or a circular reference that keeps building up the temp file?

Thank you very much for your help! It is definitely appreciated and is giving me a shiny beam of hope to look towards.
 
Last edited:
It looks like even with it segmented it's still pulling around 1,000 rows (the 'Dog' value is numeric and can have multiple rows with the same value). Just running one of the queries has taken roughly 15 minutes and is only about 20% complete on the progress bar. It seems there is a bigger problem here.
 
From the numbers you gave us,

What could be the cause for Access to say 60 million lines

I'd say somehow you have a "permutation" join (a.k.a. "Cartesian" join.) This is generally caused by failing to set up, or incorrectly setting up, a relation between the two tables.

1. Is there an established relationship between TableA and TableB across field Dog?

2. Is the relationship either one-to-many or one-to-one? (Can't be many-to-many because Access doesn't do that....)

3. What is the approximate cardinality of the relationship between the tables (whether formally declared or not...)?

Cardinality is "if you do an average query for a value appearing the table, how many rows do you get back?" Lower cardinality is good. Best possible is 1 (meaning the field is unique across the whole table), worst possible is to get back 50% of the table.

Let's see, SQRT( 60 million) is just less than 8,000. A permutation join of two tables of exactly 8,000 records each would have given you 64 million updates. So if one of the tables has 15K rows and the other table has 4K possibly matching rows, there's your exact number.
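A quick way to measure that, using the simplified name Dog, is a totals query run against each table; any values that come back with big counts are the ones multiplying your update:

Code:
SELECT Dog, COUNT(*) AS RowsPerValue
FROM TableA
GROUP BY Dog
ORDER BY COUNT(*) DESC;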
 
Excellent information The_Doc_Man!

1. I set up a one-to-one relationship on the Dog field in the relationship editor and tried it again. It still takes several (10+) minutes before returning the error message (Not enough space on temporary disk. (Error 3183)). I'm just trying to run a single update query; I'm leaving the macro alone for now.

2. The cardinality is fairly low: a single value will return at most 5 rows, and the average is 2. [Thank you for that terminology!] There are approximately 10,800 unique values for the Dog field.

I really appreciate the help guys!

EDIT: I think I'm going to try resolving this server-side before importing it to Access for now. This whole problem has been bugging me because I was able to run this operation without any problems last month. When I saw the 60 million lines I became curious. It has definitely benefited my Access knowledge, though. I really appreciate it :)
 
Last edited:
Unless Dog is unique in both tables, it shouldn't be used for a one-to-one relationship.

If Dog is unique in one table and used several times in the other, it should be a one-to-many relationship.

If Dog is not unique in either table, you will have problems and will get lots and lots of records.
 
Hrm, I misunderstood then. Dog is not unique in either table (usually each table will have 2 rows with the same value). Thinking about how Access would treat the inner join on a field with multiple rows containing the same value, it would make sense that the number of rows it ends up working with would be dramatically larger!

EDIT: And I just remembered that I did have to tweak MaxBufferSize to get it to work last month. Apparently this month the inner joins are producing even more rows. Suddenly the 60 million rows make much more sense.
 
As The_Doc_Man said, if you have 3 records in TableA with Dog=3 and 4 records in TableB with Dog=3, you will get 12 records in your result.

In TableB, if Dog is the same, does that mean that Owner is the same? If not, your update will be indeterminate. If it is, then you need an intermediate query that uses DISTINCT to trim the duplicate values of Dog out of TableB, and use that query to update TableA.Owner.

That query would be something like

Code:
SELECT DISTINCT Dog, Owner FROM TableB;

Store that as QueryC, then you can update TableA as follows:

Code:
UPDATE TableA AS A INNER JOIN QueryC AS B ON A.Dog = B.Dog
SET A.OWNER = B.OWNER
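And to check whether Owner really is the same for every repeated value of Dog in TableB (i.e. whether the update would otherwise be indeterminate), something like this should flag any conflicts:

Code:
SELECT B.Dog, MIN(B.Owner) AS FirstOwner, MAX(B.Owner) AS LastOwner
FROM TableB AS B
GROUP BY B.Dog
HAVING MIN(B.Owner) <> MAX(B.Owner);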
 
[Side note: the average cardinality is actually 1.]

Since we're heading more into the logic of the program, I'm going to change the variable Dog to Order#.

Every Order# in A should appear in B. A and B track slightly different information (I'm working with dispenses from a machine (A) and the receiver of the dispense (B)). The Order# will appear multiple times if there is a problem with the transaction [something was withdrawn and then needed to be returned, etc.] in either A (dispense problem) or B (receiving problem). This Order# ties the dispense to the receiver. The field I'm updating is the Station that the order was pulled from; it's updated so that both tables can be attributed to the same station in a report that is generated.
 
Last edited:
Is this logical? Are the transactions in the two tables really all related on the Order#? Or is there a subsidiary relationship that means there is a one-to-one within the groups of transactions? For example, product is issued and then returned, creating a relationship between the issue record in A and the return in B. This is clearly not a classic many-to-many relationship. I don't know how to describe it, not being a proper programmer.
 
The Order# is generated from several criteria like the User, Date, Item and other fields. These updates are just to fill in the blanks (e.g. Order# 1 was withdrawn from Station A; we want to tie the receiver's information to that station, so any rows with Order# 1 in table A would set table B's Station field to A).

Sorry if this is confusing guys.

I'm going to have to read up on doing those inner join updates on the Oracle DB we have. It seems SQL_Tools doesn't support it.
 
While I work out the Oracle side of things, I did some more thinking about this 'Out of Memory' issue.

Could copying and pasting these tables/queries out of the main storage database cause any memory issues? Even with all this great information, I feel like it's not directly related to the problem. I have this nagging feeling that it's just a bug or some sort of remnant link that's causing it to take literally 15-20 minutes to run these inner join updates before bombing out, or (if I up my MaxBufferSize in the registry) processing 2 or 3 of these queries before giving me the error.
 
I am an idiot. Sorry guys. This month, during the course of the procedure, I didn't remove all the non-matches. So the SQL that was already there didn't account for them, and I basically had roughly 10k Order#s of 0 in Table A and 6k in Table B.

6k x 10k = 60 million.
All I did was add 'WHERE Dog <> 0' and life was back to normal.
Sorry guys. Thanks for the help. It looks like the cardinality was half the table in this case and I wasn't aware of it. Just needed a nap or two.
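For reference, with the simplified names from earlier, the corrected query ends up looking roughly like this:

Code:
UPDATE TableA AS A INNER JOIN TableB AS B ON A.Dog = B.Dog
SET A.OWNER = B.OWNER
WHERE A.Dog <> 0;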
 
Last edited:
Glad to hear you have got it fixed.
 
