Borrell.Guillem@bcg.com 237cf81fe6 Deploy documentation too
2023-09-18 10:45:12 +02:00


The Data Warehouse

What is a data warehouse?

A Data Warehouse (DW) is a database or a set of databases that:

  1. Serves as a single source of truth for the entire corporation, or as close to one as possible.
  2. Provides relevant aggregated metrics and Key Performance Indicators (KPIs).
  3. Stores historical records of relevant metrics to assess the performance of the corporation.
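The third point is what separates a DW from a transactional system. A transactional system usually keeps only the current value of a record, while a DW appends a dated snapshot on every batch run and never updates old rows. The following is a minimal sketch of that idea. It assumes a hypothetical `KpiSnapshot` record and uses an in-memory list in place of a real history table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KpiSnapshot:
    """One row of a hypothetical KPI history table."""
    snapshot_date: date
    kpi: str
    value: float


# A transactional system typically keeps only the current state...
current_state = {"active_customers": 1200}

# ...while the DW appends a dated snapshot on every batch run.
kpi_history: list[KpiSnapshot] = []


def record_snapshot(history, snapshot_date, state):
    """Append one snapshot per KPI; existing rows are never modified."""
    for kpi, value in state.items():
        history.append(KpiSnapshot(snapshot_date, kpi, float(value)))


record_snapshot(kpi_history, date(2023, 9, 1), current_state)
current_state["active_customers"] = 1250  # the source system overwrites its value
record_snapshot(kpi_history, date(2023, 9, 2), current_state)

# The history keeps both values, so change over time can still be measured.
values = [s.value for s in kpi_history if s.kpi == "active_customers"]
print(values)  # [1200.0, 1250.0]
growth = values[-1] - values[0]
```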

DWs are data hubs. On the input side, a set of batch processes periodically fetches data from all the transactional systems. These processes don't just copy the data verbatim. They apply transformations and aggregations that make the final result easier to work with, and they compute the KPIs that are relevant to high-level analysis.
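One such batch process can be sketched in a few lines. This is a minimal illustration, not a production pipeline. It assumes two hypothetical transactional sources, an orders system and a customers system, held as in-memory lists, and a hypothetical "revenue per region" KPI. A real job would read from the source databases and write the result to a DW table.

```python
from collections import defaultdict

# Extract: hypothetical rows fetched from two transactional systems.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount_cents": 2500, "status": "paid"},
    {"order_id": 2, "customer_id": "c2", "amount_cents": 1000, "status": "cancelled"},
    {"order_id": 3, "customer_id": "c1", "amount_cents": 4000, "status": "paid"},
    {"order_id": 4, "customer_id": "c3", "amount_cents": 1500, "status": "paid"},
]
customers = [
    {"customer_id": "c1", "region": "north"},
    {"customer_id": "c2", "region": "south"},
    {"customer_id": "c3", "region": "south"},
]


def transform(orders, customers):
    """Keep paid orders only, convert cents to units, and attach the region."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    return [
        {"region": region_of[o["customer_id"]], "amount": o["amount_cents"] / 100}
        for o in orders
        if o["status"] == "paid"
    ]


def aggregate(rows):
    """Compute the revenue-per-region KPI from the cleaned rows."""
    revenue = defaultdict(float)
    for row in rows:
        revenue[row["region"]] += row["amount"]
    return dict(revenue)


# Load: here the result is just kept in memory instead of written to a DW table.
revenue_by_region = aggregate(transform(orders, customers))
print(revenue_by_region)  # {'north': 65.0, 'south': 15.0}
```

Splitting the job into separate transform and aggregate steps keeps each step easy to test on its own. The cancelled order from customer `c2` never reaches the KPI.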

On the output side, a DW provides a single, aggregated view of the corporation that is used across the board. DWs are not critical to keeping day-to-day operations running, but they are key to assessing and improving performance across the entire corporation. DWs are also leveraged to implement a wealth of use cases that generate more value for customers and stakeholders, such as:

  • Supply chain management and control.
  • Campaign performance analytics.
  • Executive dashboards.
  • Churn and upsell scoring.
  • Demand forecasts.

DWs tend to be implemented with analytical databases, because data is loaded in batches and queries are mostly aggregations over sets of historical records. Depending on the size of the corporation and the number of data sources, DWs can be large and expensive to build and maintain. A data warehouse may contain thousands of tables, fed by thousands of batch processes that fetch data and transform it into its final shape.
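Many analytical databases store data column by column rather than row by row. The toy sketch below shows why that layout suits aggregation queries. The table contents are hypothetical, and the code does not reflect how any specific database works internally. Summing one metric in the columnar layout reads a single column, while the row layout has to visit every full record.

```python
# Row-oriented layout: each record is stored together (typical of transactional databases).
rows = [
    ("2023-09-01", "north", 120.0, 3),
    ("2023-09-01", "south", 80.0, 2),
    ("2023-09-02", "north", 95.0, 4),
]

# Column-oriented layout: each field is stored contiguously (common in analytical databases).
columns = {
    "day": ["2023-09-01", "2023-09-01", "2023-09-02"],
    "region": ["north", "south", "north"],
    "revenue": [120.0, 80.0, 95.0],
    "orders": [3, 2, 4],
}


def total_revenue_rows(rows):
    """Every whole record is visited just to read one field."""
    return sum(record[2] for record in rows)


def total_revenue_columns(columns):
    """Only the 'revenue' column is read; the other fields are untouched."""
    return sum(columns["revenue"])


# Both layouts give the same answer; they differ in how much data is scanned.
assert total_revenue_rows(rows) == total_revenue_columns(columns) == 295.0
```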

Extract, Transform, and Load (ETL)

Data Governance: Lineage

It's common to use the data warehouse as a scratch space for users doing analytics.