Update 'How fast are really the analytical DBMS?'

Guillem Borrell Nogueras 2023-02-12 21:56:20 +01:00
parent 26103e3573
commit 0b6fa7831d

@ -1,3 +1,27 @@
You've probably heard that columnar analytical databases can query data much faster than traditional row-based databases like PostgreSQL, Oracle or MySQL. But how much faster? 10 times? 100 times?
First of all, it makes complete sense that there's a noticeable difference in performance between OLAP and OLTP databases.
The goal of PostgreSQL is to work reliably under the pressure of hundreds of clients executing thousands of transactions per second. You can be sure that, once a transaction commits, its data will be ready for the next query.
Analytical DBMSs are designed for speed, and one shouldn't expect the same guarantees as with PostgreSQL. You should expect... speed.
I ran a small experiment with a synthetic dataset I created for the PyCon Spain '22 workshop. It simulates records of roulette bets in an online casino. The dataset has 277M rows, and the CSV file containing the data takes up 28 GB of disk space.
![Table](https://git.guillemborrell.es/guillem/blog/raw/branch/main/images/analyticdbms/Screenshot_20230212_215503_Chrome.jpg)
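To give an idea of what such a dataset could look like, here is a minimal sketch of a roulette bet generator. It is not the script used for the workshop: the column names, bet types and stake values are assumptions made up for illustration, and only the general idea (simulated bets on a European wheel, written out as CSV) comes from the description above.

```python
import csv
import io
import random

# Hypothetical schema: the real column names are not given in the text.
COLUMNS = ["bet_id", "player_id", "bet_type", "stake", "pocket", "payout"]


def generate_bets(n_rows, n_players=1000, seed=42):
    """Yield n_rows simulated bets on a European wheel (pockets 0 to 36)."""
    rng = random.Random(seed)
    for bet_id in range(n_rows):
        bet_type = rng.choice(["even", "odd", "straight"])
        stake = rng.choice([1, 5, 10, 50])
        pocket = rng.randint(0, 36)  # result of the spin
        if bet_type == "straight":
            chosen = rng.randint(0, 36)
            # A straight-up bet pays 35 to 1, plus the stake back.
            payout = stake * 36 if pocket == chosen else 0
        else:
            # Even/odd bets pay 1 to 1; the zero pocket always loses.
            wins = pocket != 0 and (pocket % 2 == 0) == (bet_type == "even")
            payout = stake * 2 if wins else 0
        yield [bet_id, rng.randrange(n_players), bet_type, stake, pocket, payout]


def write_csv(rows, handle):
    """Write a header line followed by one CSV line per bet."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Small demonstration: five rows written to an in-memory buffer.
buffer = io.StringIO()
write_csv(generate_bets(5), buffer)
print(buffer.getvalue())
```

Because `generate_bets` is a generator, the same function could stream hundreds of millions of rows to a file on disk without holding them in memory.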
Here are the results for PostgreSQL
![Postgresql timings](https://git.guillemborrell.es/guillem/blog/raw/branch/main/images/analyticdbms/Screenshot_20230212_213038_Chrome.jpg)
Here are the results for DuckDB
![Duckdb timings](https://git.guillemborrell.es/guillem/blog/raw/branch/main/images/analyticdbms/duckdb.jpg/Screenshot_20230212_175900_Chrome.jpg)
And here are the results for ClickHouse
![Clickhouse timings](https://git.guillemborrell.es/guillem/blog/raw/branch/main/images/analyticdbms/Screenshot_20230212_180415_Chrome.jpg)
Additionally, ClickHouse used roughly 3.5 GB of memory to execute the query, while DuckDB ran with a 4 GB memory limit. DuckDB respected that limit, since the container used for these tests would have crashed otherwise.
ClickHouse was 88 times faster than PostgreSQL.
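
A speedup figure like this one is simply the ratio between two elapsed times. The text does not say how the timings were taken, so the following is only a sketch of one way to measure such a ratio: `median_seconds` and `speedup` are hypothetical helpers, and the two workloads are placeholders standing in for the same query sent to two different databases.

```python
import statistics
import time


def median_seconds(run_query, repeats=5):
    """Return the median wall-clock time of calling run_query() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is compared to the baseline."""
    return baseline_seconds / candidate_seconds


# Placeholder workloads (assumptions): replace them with functions that
# run the same query against each database.
def slow_query():
    sum(range(2_000_000))


def fast_query():
    sum(range(20_000))


ratio = speedup(median_seconds(slow_query), median_seconds(fast_query))
print(f"The second workload was {ratio:.0f} times faster")
```

Taking the median of several runs makes the ratio less sensitive to a single slow run caused by a cold cache or background activity.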