Performance — when complexity O(n) is not enough — Threads
It’s still a monolith, but it’s getting better.
There are plenty of recipes for improving the performance of any system. The cloud is no longer new, and distributed systems have become part of everyday tasks in systems of every size. But until we get there, there are some tips for reducing the time spent processing any kind of data.
The strategies I try to explain in this post can be applied to many scenarios, but the explanations are sometimes biased by my day-to-day work, which usually involves a question like: well, we improved the algorithms as best we could, but can we distribute the work in order to process the entire data set faster?
Another bias is that the code will usually be written in C#, but the ideas transcend the chosen language.
So, let’s get to work.
Background:
You have made a lot of improvements, including the basic ones, but processing those million sets of data still takes a long time. In this scenario, one more possible improvement is the use of threads.
I consider the easiest (and the first) approach to be the one I usually call the “task list”. It is exactly what it sounds like: create a list that contains all the tasks to be performed, then send them all to a kind of “task splitter”, which splits the list into blocks of n tasks and creates a thread to process each block.
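As a minimal sketch of the splitting step, assuming .NET 6 or later (for `Enumerable.Chunk`), the helper below cuts a list into consecutive blocks of n items. The names `TaskListSketch` and `SplitIntoBlocks` are illustrative only and are not taken from the example repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TaskListSketch
{
    // Splits the full task list into consecutive blocks of at most blockSize items.
    // The last block may be smaller when the list size is not a multiple of blockSize.
    public static List<T[]> SplitIntoBlocks<T>(IEnumerable<T> tasks, int blockSize)
    {
        if (blockSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(blockSize));

        return tasks.Chunk(blockSize).ToList();
    }
}
```

With ten tasks and a block size of 4, this gives three blocks holding 4, 4, and 2 tasks; each block would then be handed to its own thread.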
This strategy has its cons, like:
- The time spent to build the entire list;
- The memory (or something else) used to store the list.
But since the idea of this post is to show how threads can help the overall performance of some kinds of processes, the example should be enough.
Scenario:
The task is: summarize the total amount and quantity purchased from every single vendor in the system.
Approaches:
- Given a list of vendors, for each one, query the database for the total amount and quantity purchased;
- Given the same list of vendors, create a task list in which each task computes those sums for one vendor, then send it to the thread pool.
Times:
- Using a single thread: 11530029 ticks.
- Using a thread for each vendor (in the thread pool): 5482439 ticks.
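The single-thread approach took about 2.1 times as many ticks as the multi-thread one (11530029 / 5482439 ≈ 2.10), that is, roughly 110% more time. Put the other way, the thread pool cut the elapsed ticks by about 52%.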
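Tick counts like these can be collected with `System.Diagnostics.Stopwatch`. The sketch below is only an assumption about how such a measurement might look, not necessarily how the numbers in this post were taken; `TimingSketch`, `MeasureTicks`, and `SlowdownFactor` are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs the action once and returns the elapsed Stopwatch ticks.
    public static long MeasureTicks(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedTicks;
    }

    // How many times longer the slow run took compared with the fast one.
    public static double SlowdownFactor(long slowTicks, long fastTicks)
        => (double)slowTicks / fastTicks;
}
```

Feeding it the two measurements from this post, `TimingSketch.SlowdownFactor(11530029, 5482439)` returns roughly 2.10. Keep in mind that `Stopwatch` ticks depend on `Stopwatch.Frequency` and are not necessarily the 100-nanosecond ticks used by `TimeSpan`.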
Reproduce it yourself:
- Get a copy of the code repository here and of the stack here.
- Create the db and adminer containers:
After downloading a copy of the stack, execute the commands below. They will automatically create and import the Adventure Works database.
cd /path/to/copy/of/repository
mkdir postgres_data
docker-compose build
docker-compose up -d
The example is composed of three “special” classes:
- The interface, which generalizes the tasks:
- The “portion”, which executes a block of tasks:
- The splitter and thread creator, which creates the portions and sends them to the thread pool:
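As a rough sketch of how these three pieces might fit together (the repository's actual classes may differ), assuming .NET 6 or later: the names `IWorkItem`, `ActionWorkItem`, `Portion`, and `TaskSplitter` are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical interface: generalizes one unit of work.
public interface IWorkItem
{
    void Execute();
}

// Hypothetical adapter so a lambda can be used as a task.
public sealed class ActionWorkItem : IWorkItem
{
    private readonly Action _action;
    public ActionWorkItem(Action action) => _action = action;
    public void Execute() => _action();
}

// Hypothetical "portion": runs a block of tasks one after another.
public sealed class Portion
{
    private readonly IReadOnlyList<IWorkItem> _items;
    public Portion(IReadOnlyList<IWorkItem> items) => _items = items;

    public void Run()
    {
        foreach (var item in _items)
            item.Execute();
    }
}

// Hypothetical splitter: builds portions of blockSize tasks
// and queues each portion on the thread pool.
public static class TaskSplitter
{
    public static void RunAll(IEnumerable<IWorkItem> tasks, int blockSize)
    {
        var portions = tasks.Chunk(blockSize)   // Enumerable.Chunk needs .NET 6+
                            .Select(block => new Portion(block))
                            .ToList();
        if (portions.Count == 0)
            return;

        // CountdownEvent lets the caller block until every portion has finished.
        using var done = new CountdownEvent(portions.Count);
        foreach (var portion in portions)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { portion.Run(); }
                finally { done.Signal(); }
            });
        }
        done.Wait();
    }
}
```

A call such as `TaskSplitter.RunAll(tasks, 3)` with ten tasks would queue four portions and return once all of them have run. Exceptions thrown inside a task are not handled in this sketch, so a real implementation would need to catch and report them.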