Update 'Why analytic projects need at least two databases'
parent
3796ae06e6
commit
bcaa7a74e0
14
Why-analytic-projects-need-at-least-two-databases.md
Normal file
14
Why-analytic-projects-need-at-least-two-databases.md
Normal file
|
@ -0,0 +1,14 @@
|
||||||
|
I started a quite long thread at my work's slack when I posted benchmarks showing how fast DuckDB and Polars are compared to Pandas and PostgreSQL. One of the first comments mentioned that, while the comparison with Pandas made sense, the comparison with PostgreSQL didn't. It's pretty obvious that embedded columnar query engines are faster than OLTP row-based DBMD. This made me reflect on why I thought that the comparison made sense: many times, and it's something I've done myself, projects use a single DBMS because **it's way more important to have a single source of truth than maximizing performance, and transactional information takes preference from analytical queries**.
|
||||||
|
|
||||||
|
And let's be honest, PostgreSQL's performance in OLAP workloads is sometimes good enough.
|
||||||
|
|
||||||
|
I'm struggling to find the best possible way to keep two separate databases for OLAP and OLTP respectively. One stores the transactional information, while the second is read-only replica. It's obvious that you can't only have these databases, you need a way to keep them in sync, and suggesting Airflow as part of the stack outright closes the discussion about the two databases for minimalistic projects.
|
||||||
|
|
||||||
|
I'm thinking about two alternatives today:
|
||||||
|
|
||||||
|
* PostgreSQL + DuckDB
|
||||||
|
* PostgreSQL + Clickhouse
|
||||||
|
|
||||||
|
It's shocking how similar these two options are in terms of features. Both Clickhouse and Duckdb have [connectors for Postgres](https://duckdb.org/2022/09/30/postgres-scanner.html#other-use-cases) and mirror tables can be updated by executing a query with a cronjob or similar.
|
||||||
|
|
||||||
|
Maybe that's the way to go, and It's an ide I'm willing to test in a real environment in some of my future projects. Has anyone tried something similar?
|
|
@ -1 +0,0 @@
|
||||||
two databases
|
|
Loading…
Reference in a new issue