Where to find a 'standard guide to performance' for db operations?

amorosik · Aug 14, 2023

In the forum, indications often appear on how to use one method rather than another to perform a certain operation on the data connected to the db
Suppose we are in the most generic case, an Access interface that connects to data via odbc driver
Where to find a guide that allows you to optimize, from the point of view of performance, the operations performed on the data?
(if it doesn't exist then we can create it now)

ebs17 · Aug 14, 2023

My guide is the following quote from Michael Zimmermann:
Performance tuning is usually not based on "secret" high-speed commands, but on mathematics: How many bytes are moved, how many steps are required, what additional costs arise for RAM management, processor load, disk access.

To make work tangible: Think of each byte of data as a bottle of water that you have to carry from the basement to the fifth floor.
A lot of work takes a lot of time, little work is done quickly. So recognize what work is really necessary and which steps are better left out. To do this, you should be familiar with the development environments you use and know exactly how VBA, SQL (in various dialects) and others work.

Pat Hartman · Aug 15, 2023

The two recommendations you see here all the time involve reducing the data that gets moved about. But, I like the quote because that is the essence of the recommendations.

1. Select only the columns and rows you need for the task at hand. Every query needs a where clause unless you really want all the data in a table.
2. Queries are almost always superior to code loops.

The_Doc_Man · Aug 15, 2023

I'm with ebs17 on this one. Your question falls squarely in the lap of a system analyst. It is all about load management. You have to ask yourself questions such as: How much data has to be moved? How many discrete steps are involved in the overall operation? What are the "choke points" in the process? Of the choke-points, which one is the rate-limiting step?

You expressed concern over an ODBC driver. This means you have network questions. Know the speed of your network link in bytes per second and try to predict (or measure) bytes transferred. If you aren't sure, here are some tools and ideas to consider.

1. In your process of interest, do some logging of timestamps so that you can get a before and after time to determine the elapsed time for some event you wanted to analyze.

2. Open the Windows command prompt (Start >> Windows System >> Command Prompt) and use netstat -e to determine how many bytes have been transferred. Again, do that before and after the event of interest so you can subtract the numbers to get the difference. It is less directly comparable, but netstat -s also gives you protocol level statistics that would give you a measure of the system level of effort.

Repeat 1 & 2 a few times with a few different reports or whatever you think will give you a good mix of activities over the network. From the statistics you get out of that, you can compute approximate network data rates.

3. Open Windows Task Manager and switch to the performance graphs for your network. You can see the pattern of what you are doing. When you are "beating the hell out of your network" the pattern will be a flat line well above the 0 line. That will tell you that whatever you are doing, it is the most you can possibly do. And if that is what you see, that becomes your benchmark.

IF you have a full-duplex Ethernet, you can get a lot of data through that "pipe" pretty easily. If you have a half-duplex connection, however, you should expect to see that your network maxes out at what you THINK is about 1/3 of the maximum data rate. Therefore, it is important to know your network statistics. Since Ethernet is a CS/C

The more systems that are on your network, the more frequently you will have Ethernet collisions. (NETSTAT -e can report on that statistics.) On busy networks, the collision rate goes up quickly and acts to "throttle" the network at about 33% capacity. On relatively idle networks, it can do better.

4. If you have multiple ways to do things based on query structure, the above set of measurements has to be repeated for each way so that you can compare.

It is always about "knowing your load" and knowing how to manage that load.

ebs17 · Aug 16, 2023

Some typical guidelines:

- Excavator instead of sand shovel, mass processing beats serial individual processing. A SQL query will often be more efficient than a loop construct.

- (Query) Index usage, means having an index and also keeping it usable via the query formulation.

- Reduce the amount of data as early as possible with simple actions, only then apply complex actions to the reduced amount of data.

- Keep the number of object accesses low, load data appropriately into the main memory and process it there. In Excel, for example, it makes a huge difference whether you start all the cells in a range individually or whether you load the range into an array, calculate in the array and then copy the array back into the range.

- Clean and processing-friendly data structures (data schema). SQL works line-oriented, so the tables have to be structured accordingly, otherwise it becomes cumbersome, slow and unsolvable.

Where to find a 'standard guide to performance' for db operations?

amorosik

Active member

ebs17

Well-known member

Pat Hartman

Super Moderator

The_Doc_Man

Immoderate Moderator

ebs17

Well-known member

Similar threads

Users who are viewing this thread