Speeding up a LEFT JOIN query

Hello1

Registered User.
Local time
Today, 12:02
Joined
May 17, 2015
Messages
271
In the attachment I have put the query design. The qryNaplataKarticaSaIdPotDom alone runs just fine even tho the table has over 1m records. However, when I do a LEFT JOIN like in the attachment with this query and the table ImportNap, then it takes more than 3min for the query to load. When I remove the Is Null from the field BrojacNap the query loads quite fast, few seconds. But I need that Is Null criteria.
Brojac, Dokument and BrojacNap I have put indexes on, however no much help, if any.
Any ideas what could I change?

Thanks! :)
 

Attachments

  • 1.PNG
    1.PNG
    34.9 KB · Views: 346
Difficult to tell from just a screenshot.
It would help to see the query SQL & some sample data

Also try indexing the Datum field as well
 
Alright, here is the SQL just this time with INSERT INTO because I need it.
Code:
INSERT INTO ImportNap ( BrojacNap, DatumVriImp )

SELECT [NAPLATA - Kartica - Finansijske promene].Brojac, Now() AS Expr1

FROM [NAPLATA - Kartica - Finansijske promene] LEFT JOIN ImportNap 
ON [NAPLATA - Kartica - Finansijske promene].Brojac = ImportNap.BrojacNap

WHERE ((([NAPLATA - Kartica - Finansijske promene].Brojac)>3850185) 
AND (([NAPLATA - Kartica - Finansijske promene].[Sifra Kupca]) Is Not Null) 
AND (([NAPLATA - Kartica - Finansijske promene].Datum)<=[Do_datuma]) 
AND (([NAPLATA - Kartica - Finansijske promene].Knjizenje)=True) 
AND (([NAPLATA - Kartica - Finansijske promene].Dokument)=17 
Or ([NAPLATA - Kartica - Finansijske promene].Dokument)=18 
Or ([NAPLATA - Kartica - Finansijske promene].Dokument)=35 
Or ([NAPLATA - Kartica - Finansijske promene].Dokument)=37) 
AND ((ImportNap.BrojacNap) Is Null))

ORDER BY [NAPLATA - Kartica - Finansijske promene].[Sifra Kupca];

Now what I do here is transfer the data from table NAPLATA... to table ImportNap, but only the data which matches the criteria. I tried to remove the ImportNap table and then transfer everything what matches the criteria (without the ImportNap.BrojacNap Is Null, because its no longer there when I remove the table of course) from the table NAPLATA and it works because of the key conflict so it doesnt transfare duplicates, only the data I need, howerver, it takes too long. Probably because the query is first coping all the around 900k records which match the criteria and then imports into the table about 22k only.

Another thing, in real scenario I actually cant change Indexes in the table NAPLATA.. because I dont have access to it, so only in table ImportNap but the primary key is already indexed and probably the only index I need from that table.

I tried at my local PC with the Datum indexed which is date too but the performance didnt improve much, mostly I guess because the real pain is the Is Null from table ImportNap

Some info about data...
Brojac is AutoNumber, Sifra Kupca is Number, Datum is date, Knjizenje a Yes/No, Dokument is Number (something like which kind of document) and BrojacNap is Number (the only indexed one)
 
your first screenshot utilises a query, which is not apparent from the sql you have provided. Have you done away with the query? It was also a select query. Whilst the sql provided is an append query

Also, should be no need for the order by clause - data will be inserted randomly so you are just adding an unnecessary operation.

if your ImportNap.BrojacNap is 'indexed, no duplicates' which must be the requirement because of the way you have the append query written, then you don't need the is null criteria - all records which already exist will not be added a second time. Depending how you are running the query you might get a message '900k records could not be appended' but that is OK
 
Hey CJ, sorry for the confusion. The SQL i posted is the correct one, not the screenshoot. Thanks for cleaning up the Order by, I will get rid of it. Yes if I remove the Is Null everything will be added correctly but the query will be even slower in some cases. For example it will most probably always take 900k+ records from the NAPLATA table what makes the copying slow. If I put the Is Null the query might have some low final results and will run faster than without the Is Null, however if it has 100k+ results it will run slower than without Is Null. Not best explanation but I have you will understand.
 
Colin (Isladogs) is the speed comparison guru round here so he may well have better suggestions. The access query optimiser is pretty effective, but you might find if you split the query into two - one with the left join, the other with everything else you might see an improvement

you can also replace the multiple 'or's with an in.

Queries might be like this

Q1
Code:
SELECT [NAPLATA - Kartica - Finansijske promene].*
FROM [NAPLATA - Kartica - Finansijske promene] LEFT JOIN ImportNap 
ON [NAPLATA - Kartica - Finansijske promene].Brojac = ImportNap.BrojacNap
WHERE ImportNap.BrojacNap Is Null
Q2
Code:
INSERT INTO ImportNap ( BrojacNap, DatumVriImp )
SELECT [Q1].Brojac, Now() AS Expr1
FROM [Q1] 
WHERE [Q1].Brojac>3850185 
AND [Q1].[Sifra Kupca] Is Not Null 
AND [Q1].Datum<=[Do_datuma] 
AND [Q1].Knjizenje=True 
AND [Q1].Dokument IN (17,18,35,37)

Or even into 3 queries, or try moving one or more criteria from the second query to the 1st. Or swap them round so you create the second query first and move the insert to the first query (hope that makes sense)

On indexing there is little point in indexing booleans (Knjizenje) and if a field is predominantly null, set the index to exclude nulls (Sifra Kupca?)
 
Thanks, I will try all that after some sleep :)
I cant do anything with the indexes on the table NAPLATA - Kartica - Finansijske promene because I have no access to change it, unfortunately.
 
If BrojacNap is a number, you'd normally be looking for 0, not Is Null. Have you tried that instead?
 
Code:
INSERT INTO ImportNap ( BrojacNap, DatumVriImp )
SELECT [NAPLATA - Kartica - Finansijske promene].Brojac, Now() AS Expr1
FROM [NAPLATA - Kartica - Finansijske promene] 
WHERE ([NAPLATA - Kartica - Finansijske promene].Brojac NOT IN (SELECT BrojacNap FROM ImportNap)) 
AND ((([NAPLATA - Kartica - Finansijske promene].Brojac)>3850185) 
AND (([NAPLATA - Kartica - Finansijske promene].[Sifra Kupca]) Is Not Null) 
AND (([NAPLATA - Kartica - Finansijske promene].Datum)<=[Do_datuma]) 
AND (([NAPLATA - Kartica - Finansijske promene].Knjizenje)=True) 
AND (([NAPLATA - Kartica - Finansijske promene].Dokument) IN (17, 18, 35, 37)) 
ORDER BY [NAPLATA - Kartica - Finansijske promene].[Sifra Kupca];
 
I tried a little but it didnt seem like much of an improvement but I could be wrong, because I didnt test it in detail, had other things to finish :/

I have another question regarding SQL also but not truly for this thread, maybe to start a ne one?
Anyway, when I go into a big table, lets say over a milion records, and I want to search for a particular data. I have 2 options there, to use the find method (Control + F) or the filtering method which seems to be much faster. Now how do I know which one am I using on my SQL? If I write = "SomeString" in the criteria, is SQL using filtering or the find method? I assume its the filtering because its more logical. So what would be best to use in the SQL criteria?
Another thing, whats the difference between the earlier mentioned:
IN (17, 18, 35, 37)
17 Or 18 Or 35 Or 37

Thanks!
 
find simply navigates to the records which contain a match. It does not reduce the recordset.

As you have noted filtering is like the query criteria - but unlike the query criteria can be 'undone' to return the full dataset.

both will make use of indexes if they exist so performance should be much the same. Note that finding, filtering or criteria will be slow when you look for something within a value because indexing cannot be used so the data as to be 'inspected' sequentially.

So what would be best to use in the SQL criteria?
you have no choice - it is the equivalent of filter. However as a matter of good practice, queries should always be designed to return the minimum amount of data required in terms of both width (fields) and depth (records). Failure to do so will result in slow performance when the BE is on the server because of the volume of data that needs to be brought across the network. This applies not just to form/report recordsources but also combo and listboxes rowsources and is one of the reasons you should not use lookups/multivalue fields in table design. To give you an idea of how long, when you open the query, time how long it takes until the record number is populated at the bottom. It won't be significant with small data volumes, but certainly will with 1m records.

whats the difference between the earlier mentioned:
IN (17, 18, 35, 37)
17 Or 18 Or 35 Or 37
in terms of performance in this case if Dokument is indexed, probably not a lot, it terms of readability, quite a lot. See this link https://stackoverflow.com/questions/3074713/in-vs-or-in-the-sql-where-clause
 
Thanks
The field isn't indexed so I guess it won't matter much. When I get some extra try I will do some more of the suggested methods with databases larger than 5m, guess I should feel the performance difference there for sure.

What about pass through queries? I keep most of the databases on the SQL server. The access program I use is 32bit but the SQL Server is 64bit. Now when I do a query does the bit version even matter or all that calculating happens on the SQL Server if the database is there?
 
The field isn't indexed so I guess it won't matter much.
if the field isn't indexed, the IN method will be faster. based on the link I provided, about 25% faster, at least in respect of that part of the criteria.

Talking of IN - have you tried the IN method suggested by ArnelGP? That avoids a left join.

What about pass through queries? I keep most of the databases on the SQL server. The access program I use is 32bit but the SQL Server is 64bit. Now when I do a query does the bit version even matter or all that calculating happens on the SQL Server if the database is there?
bit version won't matter as far as sql code is concerned.

Running the query on the server will be faster in theory, but depends where your current BE is. If on your local drive, probably not much if any improvement in performance. Could even be slower if other routines are being run in SQL server by other users at the same time. SQL server does have additional functionality which may mean it can handle left joins in a different way which is faster.
 
@Hello1
Just a hunch that using the NOT IN (SELECT ... subquery approach suggested by arnel will be no faster & could actually be slower than using a left join ...but I may be wrong :D

I would be interesting to know your results comparing both approaches when you have time
 
Ah, I misunderstood it, thought speed will be same if no indexes.

I did but I didn't feel any improvement, however I need to do some further testing I don't think I tested everything suggested and correctly tho.
 
We posted at the same time, if I find time over the weekend I will try to do the testing and let you know :)
 
isladogs said:
Just a hunch that using the NOT IN (SELECT ... subquery approach suggested by arnel will be no faster & could actually be slower than using a left join ...but I may be wrong

Speed comparison in this case would depend on the number of records in the sub-query because of the order in which the query would have to be processed. Since the field in question isn't indexed, that is a huge speedbump either way.

In general, when doing a cascaded "IN" filtration (whether by in-line sub-query or by layered query), you want to select from the shortest list first (i.e. innermost) because it more rapidly limits the work done by outer filtration. Actually, that rule works for other than "IN" cases as well.
 

Users who are viewing this thread

Back
Top Bottom