Performance — the jumping problem — too many database (or other external) calls

Algorithmic complexity is the foundation of performance, but there are plenty of other problems that deserve discussion.

One of them I personally call “the jumping problem”, and you, as a software developer, probably already know it and have likely experienced it.

The jumping problem happens when your business logic calls something outside your core application (a database, files, APIs, libraries, or even your own “heavy” functions) too many times.

There is no recipe for performance; when you face this kind of problem, many things should be studied. But in many cases, calling something “outside” your application too often is a bad idea, and you probably know that. But do you know how bad it is?

I couldn’t find a post that matched exactly what I picture about this kind of problem, so here I try to show its basic nature along with some numbers.

The “global” rule is: jump to another place as few times as needed. This sounds kind of obvious, but the lack of this concern in daily work shows that sometimes even the homework is not being done.

The most common/basic scenario:

var result = GetResultsFromDatabase();
foreach (var row in result)
{
    var otherResult = GetOtherResultFromDatabase(row);
    // Do something with otherResult
}

You can replace the “database scenario” with anything else that you know can take some time to complete. Another point of view is the health of “the other side”: calling a database too often can degrade both your system’s performance and the database server’s performance, and the same can happen with an API. Of course, there are many strategies (maybe in future posts) to cache requests in many technologies. But remember: the homework must be done.
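Sticking with the snippet above, the fix is to make a single jump that brings the related data along. A minimal sketch, assuming a hypothetical GetResultsWithOtherResultsFromDatabase method that wraps one joined query and returns each row paired with its related record:

```csharp
// One jump: the JOIN happens on the database side, not in the loop.
var joined = GetResultsWithOtherResultsFromDatabase();
foreach (var (row, otherResult) in joined)
{
    // Do something; no extra database call per row here
}
```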

Tools:

Scenarios:

  1. Retrieve the first 100 sales and, for each sale, retrieve its respective vendor;
  2. Retrieve the first 100 sales together with their vendors in a single query.
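Since the repository uses Entity Framework (see the profiler details below), the two scenarios could look roughly like this. The entity and property names (Sales, Vendors, Vendor, VendorId) are assumptions about the Adventure Works mapping, not the repository’s actual code:

```csharp
// Scenario 1: 101 jumps. One query for the sales, then one more per sale.
var sales = context.Sales.Take(100).ToList();
foreach (var sale in sales)
{
    // Each iteration triggers a separate round trip to the database.
    var vendor = context.Vendors.Single(v => v.Id == sale.VendorId);
}

// Scenario 2: 1 jump. Include is translated into a SQL JOIN,
// so the sales and their vendors come back together in a single query.
var salesWithVendors = context.Sales
    .Include(s => s.Vendor)
    .Take(100)
    .ToList();
```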

Scenario 1 — Numbers:

  • Jumps to the database: 101
    1 jump to retrieve the first 100 sales, plus 1 for each vendor.
  • Time spent on the database: 187583 ms

Scenario 2 — Numbers:

  • Jumps to the database: 1
    1 jump to retrieve the joined query.
  • Time spent on the database: 45094 ms

Here we can see that a simple forgotten concern cost us roughly four times the execution time (187583 ms vs. 45094 ms, over 300% more).

Many things should be noted and, as I said before, there isn’t a golden rule, but here we go:

  1. These times are from my machine;
  2. The database was fully available to my application;
  3. The number of columns selected can also affect this time;
  4. There can be cases where it is slower to retrieve all the information joined; it depends on the scenario, indexes, technologies and so on;
  5. There can be cases where business logic prevents the use of such an approach.

Reproduce it yourself:

Copy the repository code here and stack here.
I did not connect the Docker setup to the Visual Studio solution because I felt comfortable running the application locally and the database in a container.

Create the db and adminer containers:
After downloading a copy of the stack, execute the commands below. They will automatically create and import the Adventure Works database.

cd /path/to/copy/of/repository
mkdir postgres_data
docker-compose build
docker-compose up -d

Setup the Visual Studio Profiler

For this post, I left just “Database” checked

Run the Profiler

Look at the diagnostic report

Details:

  1. The profiler displays 101 record reads on the first line and 2 on the others because, I guess, each row counts as one or more and its metadata counts as one more;
  2. It is mandatory to use Entity Framework (or ADO.NET) for the query to be displayed in the profiler. Source here;

Software Performance Engineer wannabe