Deleted old documentation

This commit is contained in:
Guillem Borrell 2024-05-08 21:50:12 +00:00
parent be5bcbdba3
commit 745f852637
18 changed files with 5 additions and 606 deletions

View file

@ -1,3 +0,0 @@
# API
API stands for Application Programming Interface.

View file

@ -1,7 +0,0 @@
# Applications and value
!!! example
During a pitch, one of the clients said that she didn't want the call center agents to look at multiple screens. If we developed anything, we had to integrate it into their current tool. That meant that we couldn't develop exactly what we thought could bring more value; we had to study what could be integrated with the current tool instead. I really wanted to challenge that. Her hypothesis was that more complexity made call center agents less productive, but I've seen examples of the opposite. While tuning a similar application for a previous client, I posed the same challenge to the application developers: maybe we were showing too many things on the screen. But once we put the application in front of call center agents, they asked to see even more information!
Assumptions have to be tested, and it's surprising to me how consultants challenge clients' assumptions about how they run their business, yet decide not to contest important choices about usability and design. That client was more than willing to sacrifice tons of functionality, and maybe millions, over a hypothesis that could perfectly well be wrong.

View file

@ -1,24 +0,0 @@
# How automation is implemented.
Automation is a key topic in enterprise data systems. When some conditions are met, you want things to happen. There are automations that remove checked-out items from the stock, send a message to the stockers' terminals when an item changes price, make a stock request to a warehouse when a store is about to run out of a particular item, or apply discounts at check-out after the price tag on the shelves has been changed...
Automation is at the heart of enterprise systems. Any issue related to a critical automation will impact the business, and there may be hundreds or thousands of those automations. This is the reason why corporations and governments spend large sums of money to build platforms that are as robust as possible. This is why some migrations to a cloud platform take years to complete with costs that sometimes triple the initial budget.
!!! example
There are tons of examples of unsuccessful transformations caused by the enormous complexity of business automation. Birmingham City Council wanted to [migrate](https://www.datacenterdynamics.com/en/news/uks-birmingham-city-to-spend-465m-fixing-oracle-cloud-issue/) from on-premises systems to the cloud. The budget went from £19M at the beginning of the project to £38M two years later, and £100M four years after the start. At the time of writing this example, the systems were not fully functional and there was no estimate of the final delivery date. The officials responded that this kind of delay is *not unusual* for migrations of such magnitude, and they're absolutely right.
For each automation you should decide when it executes, with many possible choices:
* Immediately when the condition occurs
* Periodically, each hour or every day at midnight
* On demand after a required authorization
# Triggers to act on a given condition
Let's open the `stock` terminal and run a little experiment.
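As a rough illustration of the first option above, acting immediately when a condition occurs, the sketch below installs a hypothetical trigger that notifies stocker terminals when a price changes. It is not the digital twin's actual implementation: the `prices` table, its columns, and the notification channel are made up, and the digital twin records its own triggers via `retailtwin sync`.
```python
# Illustrative only: a trigger that notifies stocker terminals the moment a
# price changes. The `prices` table, its columns, and the channel name are
# hypothetical; the digital twin installs its own triggers via `retailtwin sync`.
from sqlalchemy import create_engine, text

DDL = """
CREATE OR REPLACE FUNCTION notify_price_change() RETURNS trigger AS $$
BEGIN
    -- Publish the affected SKU so listening terminals can refresh their data
    PERFORM pg_notify('price_changes', NEW.sku::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER price_change
AFTER UPDATE OF price ON prices
FOR EACH ROW EXECUTE FUNCTION notify_price_change();
"""

engine = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")
with engine.begin() as conn:
    conn.execute(text(DDL))
```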
# If it works, don't touch it
Creating robust automations is key for the operations of any corporation. Automations tend to become systemic, and if something works as expected, there's a strong motivation to touch it as little as possible. Even nowadays it's common to find out that the automations dealing with payments are still implemented in COBOL, and in that case, the process to migrate from COBOL to another "enterprise" language like Java takes half a decade to complete.

View file

@ -1,30 +0,0 @@
# Buy vs Build
This section is about the modularization of enterprise software components, and how it makes running a business more sustainable.
## Build vs buy an off-the-shelf solution.
Building enterprise data systems is hard and expensive. This is why many companies decide to purchase an ERP like SAP, which is able to handle financials, purchasing, stock, customer relationships, reporting... This seems to be the safest choice by far:
* Popular applications tend to be less buggy, more secure and feature-rich.
* Third-party applications by popular vendors tend to add synergies, like easier integrations and a wider range of vendor choices for support and maintenance.
* The total cost of ownership tends to be lower for the buyer, even considering that companies like Oracle and Salesforce are immensely profitable.
!!! info
Some software vendors are so profitable that many wonder if they're charging excessive margins for their products. Larry Ellison, co-founder and chairman of Oracle, owns [the sixth largest island in Hawaii](https://en.wikipedia.org/wiki/Lanai), and the winning team of the 2022 Formula One Championship was "Oracle Red Bull Racing". The partnership between Oracle and Red Bull is somewhat ironic, since most people are convinced that companies selling soda make huge margins on a bad product mostly thanks to marketing.
But appearances can be deceiving.
* If the problem is complex, the solution will be complex as well. It may take multiple years of effort and tens of consultants to fully deploy one of these products.
* Proprietary software is always bundled with some additional vendor lock-in, like supporting only a small subset of proprietary storage systems.
* Buying software implies outsourcing knowledge to the vendor. If there's a critical issue with one of these components and the vendor is not able to provide support, or it goes bankrupt and disappears, the engineers within the corporation won't be able to fix the issue no matter how smart they are.
The most common scenario is to run a mix of applications: mission-critical operations run on custom, in-house-developed applications maintained by the IT department or functional areas, while less critical operations like Marketing use third-party tools. I've encountered many corporations that run most of their operations on custom-built software, but decided to buy a proprietary CRM like Salesforce to support their Marketing and Sales departments. Most projects executed by IT departments are related to data integrations between two existing tools, or between an existing tool and some new tool the leadership decided to purchase.
## Bespoke software and the false buy vs build dichotomy.
There's a third option, which is somewhat in between buy and build: hiring a third party with a solid track record to develop a custom application, with the goal of getting the best of both worlds. Bespoke software by a solid vendor may provide:
1. Lower risk, since the vendor has built similar applications for previous clients.
2. The application is fully customized, and knowledge stays in-house, since the development can be closely followed by the staff engineers.

View file

@ -1,211 +0,0 @@
# How data is stored
## How data is stored physically
Enterprise data systems are heavily distributed because large corporations are distributed in nature: a chain of grocery stores consists of a set of locations that are relatively independent from each other. Distributed systems, when designed properly, are very robust. If every location can operate independently, items can be sold even under a major system failure, like losing the connection between a site and the central database that stores the available stock for each item. But at the same time, disconnecting each site from a central information repository is a challenging data engineering problem.
!!! info
A key property for data stored in an enterprise context is consistency, which is very difficult to guarantee when data is spread across multiple nodes in a network that can be faulty at times.
The CAP theorem, sometimes called Brewer's theorem, is a great tool to provide some theoretical insight into distributed systems design. The CAP acronym stands for Consistency, Availability and Partition tolerance. The theorem proves that no distributed storage system can offer these three guarantees **at the same time**:
1. Consistency, all nodes in the network see the exact same data at the same time.
2. Availability, all nodes can fetch any piece of information whenever they need it.
3. Partition tolerance, you can lose connection to any node without affecting the integrity of the system.
Relational databases need to be consistent and available all the time; this is why there aren't distributed versions of PostgreSQL where data is spread (sharded) across multiple servers in a network. If there are multiple nodes involved, they are just secondary copies to speed up queries or increase availability.
If data is spread across multiple nodes, and consistency can't be traded off, availability is the only variable engineers can play with. Enterprise data systems run a set of *batch* processes that synchronize data across the network when business operation is offline. While these batch processes are running, data may be in a temporarily inconsistent state, and the database cannot guarantee that adding new data records is a [transaction](https://en.wikipedia.org/wiki/Database_transaction). The safest thing to do in that case is to block parts of the database, sacrificing availability.
This is why the terms *transactional* and *batch* are used so often in enterprise data systems. When a database records information from some actual event as it happens, like someone checking out a can of soup at a counter, that piece of information is usually called a *transaction*, and the database recording the event a *transactional system*, because its primary role is to record events. The word *transactional* also denotes that integrity is a key requirement: we don't want a hiccup in the database to mistakenly check out the can of soup twice because the first transaction was temporarily lost for any reason. Any activity that may disrupt its operations has to be executed while the system is (at least partially) offline.
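As an illustration of what *transactional* means in practice, a check-out can be wrapped in a single database transaction: either every statement commits, or none does. The sketch below is purely illustrative; the table and column names are hypothetical, not the digital twin's actual schema.
```python
# Hypothetical sketch: a check-out wrapped in one transaction, so a lost
# connection or a failing statement never leaves the data half-updated.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")

def checkout(cart_id: int) -> None:
    # engine.begin() opens a transaction, commits on success, and rolls
    # everything back if any statement raises an exception.
    with engine.begin() as conn:
        conn.execute(
            text(
                "UPDATE itemsonshelf SET quantity = quantity - 1 "
                "WHERE id IN (SELECT item FROM cart_items WHERE cart = :cart)"
            ),
            {"cart": cart_id},
        )
        conn.execute(
            text("UPDATE carts SET checked_out = now() WHERE id = :cart"),
            {"cart": cart_id},
        )
```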
## How data is modelled
Data models are frequently [normalized](https://en.wikipedia.org/wiki/Database_normalization) to minimize ambiguity. There are probably three different kinds of 500g bags of white bread, but each one will have a different product id, a different [Universal Product Code](https://en.wikipedia.org/wiki/Universal_Product_Code), a different supplier... As was mentioned in the section about relational databases, relations are as important as data: each item can be related to the corresponding supplier, an order, and a particular batch. Stores and warehouses have to be modelled as well to keep stock always at optimal levels. Discounts have to be modelled too, and they're particularly tricky because at some point the discount logic has to be applied at check-out.
Data models should be able to track every single item with the least possible ambiguity. If there's an issue with a product batch we should be able to locate with precision every single item and remove it from the shelves, and know exactly how many of those items were purchased. Any source of ambiguity requires manual intervention. For instance, it's possible that a store receives multiple batches with different expiry dates at the same time. In that case stockers have to make sure that the oldest batch is more visible on the shelves, put the newest batch at the bottom of the stack, and record when all items of each batch are sold or returned.
The digital twin has fewer than [twenty data models](https://github.com/bcgx-gp-atlasCommunity/data-engineering-fundamentals/blob/main/src/dengfun/retail/models.py), which is an extreme simplification of an actual retailer. This is the entity-relationship diagram, which may get outdated.
![relational_diagram.png](img/relational_diagram.png)
!!! tip
If you have to handle databases with dozens of normalized data models you should use a proper database management tool like [dbeaver](https://dbeaver.io/) or [datagrip](https://www.jetbrains.com/datagrip/).
The entity-relationship diagram above was generated by dbeaver.
Note how each model is related to a physical entity like an item, a batch, a cart, a location... Each table has a primary key that identifies a single record and is used to build relationships. For instance, an item in a cart ready to check out is related to a cart and a location.
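Those keys are what allows the normalized pieces to be joined back together. As a small sketch, the query below joins items with their providers; the `items` and `providers` table names appear in the model definitions later in this chapter, while the `providers.name` column is assumed here for the sake of the example.
```python
# Sketch: primary and foreign keys let normalized entities be joined back
# together. `items.provider` points to `providers.id`; `providers.name` is
# assumed for this illustration.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")

with engine.connect() as conn:
    rows = conn.execute(text(
        "SELECT items.name, providers.name AS provider "
        "FROM items JOIN providers ON items.provider = providers.id "
        "LIMIT 10"
    ))
    for item_name, provider in rows:
        print(item_name, provider)
```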
## Users, authorizations, and permissions
Probably the hardest feature to implement in enterprise contexts is access control; in other words, making sure that anyone who's supposed to see, add, remove, or delete some information can do it in practice, while everyone else is blocked. Some changes may require managerial approval, like changing the available stock of a particular item in a warehouse; in that case the system should issue the approval request to the correct person.
There are two main strategies to achieve this:
1. Access Control Lists (ACL), a set of rules in a database that implement the logic of who's allowed to see, add, remove, and delete what. In this strategy *multiple users have access to the same application* with different *roles*. Each object has a set of rules attached that are applied on every single operation. This is common in financial institutions where all customer-facing employees have access to the same terminal, but each operation on each object requires a different level of authority. For instance, any employee may be able to see the balance of a customer, but not to approve the conditions of a mortgage. ACL tends to be so hard to implement that corporations seldom build their own solution, and buy one from a popular vendor instead.
2. Instead of attaching rules to each data object, one can create a *different application for each role*. This is common in businesses where each role works in a different location. Cashiers in a grocery store have access only to points of sale, stockers have a handheld device with stocking information, managers have access to a web application with privileges to return items, modify stocking information... In this case access control is implemented at a system level. Each one of those applications has its own authentication profile, and is authorized to access a subset of APIs and other data resources.
None of these strategies is infallible and universal. The final implementation will probably be a mixture of the two. Workers in stores and warehouses may have role-specific terminals, while members of the HR department may have access to an ERP (Enterprise Resource Planning) system that implements ACL underneath. Other common data operations like transfers, migrations, backups and audit logs may add more complexity to the final design. IT departments typically have full access to all data stored within the company, or parts of it, and while they can jump in to fix issues, they can break stuff too.
!!! example
Customer communications like emails are data too, and may be required by the regulator to investigate any suspicious behaviour. [JPMorgan had to pay a $4M fine](https://www.reuters.com/legal/jpmorgan-chase-is-fined-by-sec-over-mistaken-deletion-emails-2023-06-22/) to the regulator after a vendor deleted 47M emails from their retail banking branch. The vendor was trying to clean up very old emails from the '70s and the '80s that are no longer required by the regulator, but ended up deleting emails from 2018 instead. In enterprise data systems some data models and resources have an *audit lock* property that prevents deletion at a system level.
The digital twin doesn't implement ACL, and each role will have a separate terminal instead.
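For illustration only, the core idea of an ACL can be reduced to a set of (role, object, operation) rules consulted before every operation. Real implementations attach these rules to each object in the database; the sketch below is hypothetical and not part of the digital twin.
```python
# Hypothetical ACL check: a set of (role, object, operation) rules consulted
# before every data operation. Real systems store the rules in the database
# and attach them per object; this only conveys the idea.
from dataclasses import dataclass

ACL = {
    ("teller", "balance", "read"),
    ("manager", "balance", "read"),
    ("manager", "mortgage", "approve"),
}

@dataclass
class User:
    name: str
    role: str

def is_allowed(user: User, obj: str, operation: str) -> bool:
    """Return True if the user's role may run `operation` on `obj`."""
    return (user.role, obj, operation) in ACL

teller = User("Alice", "teller")
assert is_allowed(teller, "balance", "read")
assert not is_allowed(teller, "mortgage", "approve")  # needs managerial authority
```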
!!! warning
Data access issues may be considered security threats if they allow a user to escalate privileges. It's common for large corporations and software vendors to deploy red teams to find these kinds of vulnerabilities. There are also bounty programs intended to motivate independent security researchers to report these issues instead of selling them on a "black security market".
## Governance: metadata management.
Data Governance is the discipline of providing the definitions, policies, procedures, and tools to manage data *as a resource* in the most efficient way possible. It's a very wide topic that involves managing both data and people. The goal is to create sustainable data environments that thrive as an integral part of the organization.
!!! danger
Beware of experts on data governance. There are orders of magnitude more experts on data governance on LinkedIn than successful implementations. I've witnessed talks about best practices in data governance from "experts" who were unsuccessful at implementing them in their own company. Of all the experts, the most dangerous are the ones selling you the X or Y application that will get Data Governance sorted out for you.
The topic of data governance will be split across multiple sections of this case study, because Data Governance has to be implemented across the entire lifetime of the data, from its initial design to the dashboard that shows a handful of dials to the CFO. There's no silver-bullet technology that just implements Data Governance. It's the other way round: governance defines and enforces:
1. How schemas are defined and documented, including standard patterns to name columns.
2. Who owns what, and which are the protocols to access, modify and delete each piece of data.
3. How data transformations are instrumented, executed, maintained and monitored.
4. In case something goes wrong, who can sort out what is going on, and who can fix it.
Let's talk a bit about point 1, which is related to metadata. Metadata is all the additional information that is relevant to understand the complete context of a piece of information. If a table has a field with the name `volume_unpacked`, there should be one and only one definition of volume across all databases, and a single definition of what an unpacked item is. The same database that stores the data can store this additional information too: for instance, the field *volume_unpacked* in the *item* entity can be annotated with its units. This is how the model `Item` is defined as a SQLAlchemy model:
```python
# Imports assumed for this excerpt (SQLAlchemy 2.0 style); `Base` is the
# project's declarative base, defined elsewhere.
from sqlalchemy import BigInteger, ForeignKey
from sqlalchemy.orm import Mapped, mapped_column


class Item(Base):
    __tablename__ = "items"

    sku: Mapped[int] = mapped_column(primary_key=True)
    upc: Mapped[int] = mapped_column(BigInteger, nullable=False)
    provider: Mapped[int] = mapped_column(ForeignKey("providers.id"))
    name: Mapped[str] = mapped_column(nullable=False)
    package: Mapped[str] = mapped_column(unique=False)
    current: Mapped[bool] = mapped_column(
        comment="True if the item can be still requested to a provider, "
        "False if it has been discontinued"
    )
    volume_unpacked: Mapped[int] = mapped_column(
        comment="Volume of the item unpacked in cubic decimeters"
    )
    volume_packed: Mapped[int] = mapped_column(
        comment="Volume of each unit item when packaged in cubic decimeters"
    )
```
And this is how it's reflected in the ER diagram centered on the `Items` table:
![comments.png](img/comments.png)
Metadata management can be implemented as a data governance policy:
1. All fields that could be ambiguous have to be annotated with a clear definition.
2. These schemas can be published in a tool that allows anyone to search those definitions, and where those data are stored.
Point number 2 is more important than it seems, and its implementation is usually called "Data Discovery". Tools like [Amundsen](https://www.amundsen.io/) (open source) or [Collibra](https://www.collibra.com/us/en) (proprietary) implement data catalogs: you connect them to your data sources, they extract all the metadata those sources contain, and they archive it to create a searchable index, similarly to what Internet search engines do. Some organizations implement a simplified form of metadata management where only the fields in the data warehouse (more on this later) are annotated. In that case they tend to use tools that are specific to the database technology, like [Oracle's data catalog](https://www.oracle.com/big-data/data-catalog/)
This allows you to make sure that every time the term `sku` is used it actually refers to a Stock Keeping Unit and the storage resource is using it correctly.
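The comments written in the SQLAlchemy model above can also be read back programmatically, which is the raw material of any data catalog. A minimal sketch using SQLAlchemy's inspector, run against the database bootstrapped in the next section:
```python
# Sketch: fetch the column comments stored in the database, the seed of a
# searchable data catalog.
from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")
inspector = inspect(engine)

for column in inspector.get_columns("items"):
    # The 'comment' entry carries the annotation written in the model above
    print(f"{column['name']}: {column.get('comment')}")
```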
## Bootstrapping the database
The first step is to create a new database in an existing PostgreSQL server with:
```bash
createdb -h host.docker.internal -U postgres retail
```
If you're using an ATLAS Core instance you may want to use a different database name.
The package includes a set of convenience scripts to create the tables that support the digital twin; they can be accessed with the `retailtwin` command once the package has been installed. The `init` command will persist the schemas on the database.
```bash
retailtwin init postgresql://postgres:postgres@host.docker.internal/retail
```
And the `bootstrap` command will fill the database with some dummy data
```bash
retailtwin bootstrap postgresql://postgres:postgres@host.docker.internal/retail
```
After running these two commands a stocked chain of grocery stores will be available:
```bash
psql -h host.docker.internal -U postgres -c "select * from customers limit 10" retail
```
!!! success "Output"
```
id | document | info
----+----------+-------------------------------------
1 | 59502033 | {"name": "Nicholas William James"}
2 | 32024229 | {"name": "Edward Jeffrey Roth"}
3 | 40812760 | {"name": "Teresa Jason Mcgee"}
4 | 52305886 | {"name": "Emily Jennifer Lopez"}
5 | 92176879 | {"name": "Joseph Leslie Torres"}
6 | 60956977 | {"name": "Brandon Carmen Leonard"}
7 | 04707863 | {"name": "Richard Kathleen Torres"}
8 | 74587935 | {"name": "Emily Anne Pugh"}
9 | 78857405 | {"name": "James Rachel Rodriguez"}
10 | 80980264 | {"name": "Paige Kiara Chavez"}
```
## Normalized data, functions and procedures
If data models are normalized, many tables will include many references to other models. These are some of the contents of the model `Itemsonshelf`, which contains the items that are available in one particular location, and their quantity.
!!! success "Output (truncated)"
```
id |batch|discount|quantity|location|
---+-----+--------+--------+--------+
1| 1| | 31| 1|
2| 2| | 31| 1|
3| 3| | 31| 1|
4| 4| | 31| 1|
5| 5| | 31| 1|
6| 6| | 31| 1|
7| 7| 4| 31| 1|
8| 8| | 31| 1|
9| 9| 7| 31| 1|
```
This table only contains foreign keys, and quantities are provided on a per-batch basis. Obtaining very simple metrics, like a given location's current stock of a particular item, requires joining multiple tables. This is why databases tend to bundle data and logic. Modern Database Management Systems (DBMS) are programmable, and users can define functions and procedures to simplify queries and automate data operations. The query that gets the stock of an item in a given location can be expressed as a function in SQL as:
```sql
{!../src/retailtwin/sql/stock_on_location.sql!}
```
Functions can be called both as values and as tables, since functions may return one or multiple records:
```sql
select * from stock_on_location(1,1)
```
is equivalent to
```sql
select stock_on_location(1,1)
```
and both calls return the same result
!!! success "Output"
```
stock_on_location|
-----------------+
31|
```
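Since the logic lives in the database, any client, including the terminals described later, can reuse it without repeating the underlying joins. A minimal sketch of calling the same function from Python; the bind parameter names are only illustrative.
```python
# Sketch: call the database function from application code instead of
# re-implementing the joins it encapsulates.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")

with engine.connect() as conn:
    stock = conn.execute(
        text("SELECT stock_on_location(:item, :location)"),
        {"item": 1, "location": 1},
    ).scalar_one()
    print(stock)  # 31 with the dummy data bootstrapped earlier
```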
This package contains a set of functions, procedures, triggers, and other helpers that can be recorded into the database with
```bash
retailtwin sync postgresql://postgres:postgres@host.docker.internal/retail
```

View file

@ -1,4 +0,0 @@
# The Data Lake
So far, this case study has only covered how corporations deal with structured data, but data comes in many shapes and forms. An invoice in a PDF file is data too, and

View file

@ -1,29 +0,0 @@
# The Data Warehouse
## What is a data warehouse?
A Data Warehouse (DW) is a database or a set of databases that:
1. Implement a resource as close as possible to a *single source of truth* for the entire corporation.
2. Provide relevant aggregated metrics and Key Performance Indicators (KPI).
3. Store historical records of relevant metrics to assess the performance of the corporation.
DW are hubs of data. On the input side, data is periodically fetched from all transactional systems by a set of batch processes. These processes don't just copy the data from transactional systems verbatim; they execute a set of transformations and aggregations to make the final outcome easier to work with, and generate the KPIs that are relevant for high-level analysis.
On the output side, DW provide a unique and aggregated vision of the corporation that is used across the board. DW are not critical to keep the operations up and running, but they are key to assess and improve performance across the entire corporation. DW are also leveraged to implement a wealth of use cases that generate more value for customers and stakeholders, like:
* Supply chain management and control.
* Campaign performance analytics.
* Executive dashboards.
* Churn and upsell scoring.
* Demand forecasts.
DW tend to be implemented with analytical databases because data is recorded in batches, and queries are mostly aggregations of sets of historical records. Depending on the size of the corporation and the number of data sources, DW can be pretty large and expensive to build and maintain. Data warehouses may contain thousands of tables, with thousands of batch processes fetching and transforming data to their final shape.
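As a minimal sketch of the kind of batch process that feeds a warehouse: extract yesterday's activity from the transactional database, aggregate it, and load the result into a historical table. All table and column names below are hypothetical, and the warehouse connection string is made up.
```python
# Illustrative nightly batch: extract, aggregate, and load into the warehouse.
# Table and column names are hypothetical.
import datetime as dt

from sqlalchemy import create_engine, text

transactional = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")
warehouse = create_engine("postgresql://postgres:postgres@host.docker.internal/dw")

def load_daily_sales(day: dt.date) -> None:
    # Extract and transform: aggregate the day's check-outs per location
    with transactional.connect() as conn:
        rows = conn.execute(
            text(
                "SELECT location, count(*) AS n_items "
                "FROM checkouts WHERE checked_out::date = :day "
                "GROUP BY location"
            ),
            {"day": day},
        ).all()
    # Load: append the aggregates to a historical table in the warehouse
    with warehouse.begin() as conn:
        for location, n_items in rows:
            conn.execute(
                text(
                    "INSERT INTO daily_sales (day, location, n_items) "
                    "VALUES (:day, :location, :n_items)"
                ),
                {"day": day, "location": location, "n_items": n_items},
            )

load_daily_sales(dt.date.today() - dt.timedelta(days=1))
```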
## Extract, Transform, and Load (ETL)
## Data Governance: Lineage
It's common to use the data warehouse as a scratch space for users doing analytics.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 139 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 223 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 165 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 74 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 276 KiB

View file

@ -1,3 +1,3 @@
# Navigating IT
# Transactional databases and core applications
This document is an essay that supports a course

View file

@ -1,117 +0,0 @@
# People in the data ecosystem
The human dimension plays an important role in any system. Any individual has motivations and fears that have to be taken into account when planning an implementation.
In some organizations there's some degree of antagonism between the functional areas and IT. While the former mostly try to move the business forward, implementing new or improved processes and generating value, the latter is more conservative because it has a wider set of goals to achieve apart from supporting business objectives:
1. Enhancing Security and Compliance. That service may take a couple more weeks to get online because there's a penetration test to run.
2. Managing their own budget. That Spark cluster the data team requires is way over the year's budget so there will be some more waiting ahead for the project's kick-off.
3. Providing End-User Support. Let's restore a backup for the junior developer that accidentally deleted all the tables of the development database.
4. Managing IT Operations. This Saturday one of the senior engineers will truncate some tables and rebuild the indices in the central Oracle database because the average measured transaction latency went past the threshold last week.
5. Developing IT Talent. It seems GenAI is a thing now, let's give the Architecture team a couple of weeks to understand how we can run LLMs in our production systems without bankrupting the whole company.
6. Building IT Partnerships. We're at 100% capacity just keeping the lights on, maybe we need to outsource network management for business users, so let's call zscaler.
7. Ensuring Business Continuity. Let's make sure that every single server, laptop, cell phone, and database within the company is backed up at least once a week.
In some corporations the IT team is also responsible not only for the operation of in-house applications and automations, but for their design and implementation too. When that is the case, the IT department is the most relevant transversal team within the company.
The top layers of the IT organization will probably be business focused, and their incentives will be aligned with other functional areas. However, the staff below the IT directors will not be motivated by the financial success of the company. You may have buy-in from the CTO while facing major setbacks with the director of IT operations, and this requires some additional thinking.
## Value Vs Risk
In essence, *functional areas try to maximize value, while the IT department tries to minimize risks*. IT managers know that they will be able to capitalize on a limited share of each success, but they will be held responsible for incidents. Any plan that may incur additional risks will be scrutinized sooner or later, because there's no way a new digital product or initiative gets implemented without the involvement of IT.
This is the framework I use to bring some structure to this tension:
1. Ideas are about results, not about implementations
2. There is no value without results
3. Risks are inherent to implementation
4. There are no results without implementation
5. There is no value without risks
Project execution improves when you think of the engineers and managers in the IT department as people who can *reality check your ideas*. If you're a consultant who mostly works on the "ideas" side of the equation, they will force you to think about an implementation, and then they will challenge it. Their goal is to make the final result better and to be helpful by providing a new point of view. If you think of them as a blocker on the way from idea to results and ultimately value, you're following the wrong approach. Your goal is to have the foundational knowledge to understand their language, digest their ideas, improve the final result, and maximize the value.
This text provides the tools you need to engage in discussions with members of the IT team. When a data owner says that the KPI you need is not in the data warehouse but in a transactional system, that there's no ETL currently in place for it, and that you have to create a ticket for the data engineering team and join the discussion at the next sprint planning... I can assure you that person is more than willing to help you, laying out exactly what you need and your necessary next steps. The answer could have been that the transactional system is legacy, that it can't sustain more load, and that they're not planning to extract any more data from it. Again, that person is not trying to sabotage your project; they're trying to prevent a major meltdown within the company.
!!! note
Are only results valuable, or is the implementation valuable too? This is almost a philosophical question. My take is that you can perfectly argue that implementations have inherent value, but it's very hard to measure. Once you follow the "implementations have value" path, you can't be picky about which bits of the implementation are more valuable than others unless you explicitly measure that. I once had a discussion with a member of the Lighthouse team (an internal BCG product that provides a self-service interface for third-party data) during an internal meeting. I was representing the Atlas team, which builds platform components and development ecosystems. He argued that platform has no value, since it's just an enabler that's far from the results. I immediately responded that data is an enabler too: if platform has no value, neither has data. Data are closer to the results, but they're not results either.
## Implementation, ownership, and budget.
Risk management involves
* Responsibility
* Alerts
* Minimizing impact
It's very common to forget about IT costs when estimating the value of a case.
It's very common to ignore information security, and the constraints around personal information, when pushing for value.
IT may come back with a multi-million euro request to run your project. This is why many corporations are reluctant to build capabilities that run on razor-thin margins. Plans that get implemented need an early estimation of costs.
The most pressing risk is the one of oversimplification. Business leaders tend to focus on what's possible, and underestimate the difficulties hidden in the implementation details. I call this the Elon Musk Syndrome.
!!! note
Elon Musk has systematically made claims to customers and investors that he could not meet afterwards, from failing to predict the number of Tesla cars built each year to the development of [Optimus](https://en.wikipedia.org/wiki/Optimus_(robot)). Fortunately for Musk, he's not running a traditional corporation, which would systematically take a hit on its sales for these unmet claims, but a religion.
## Teaming with IT
Fully functional BCG case teams will staff engineers from X and architects from PLA who will deal with all the IT-related details, but sometimes teams are understaffed, or the budget is insufficient to allocate the necessary profiles. Sometimes a core senior associate has to lead the relationship with IT.
* Involve them as soon as possible
* Let them contribute
* Give them details about the goals, and how you plan to achieve them
* Help them capitalize the win
The most common reason why IT may not be willing to help is that they don't have the capacity to run more things. If they're just delaying things, or not helping you figure out the best way to go from idea to implementation, it's probably because they're not able to fit your needs into their planning. This is why working with them from the beginning of the project is relevant. You may team with them to:
1. Get more ownership from functional areas.
2. Help them plan the budget.
3. Echo their staffing needs. In the end the goal of every department is to grow in size and importance.
4. Praise their work
## Antipatterns
One consequence of the tension between value and ownership is the existence of [Palantir](https://www.palantir.com/). The value proposition of Palantir is that it's impossible to capture value from data analytics and applications when the infrastructure is run by IT departments. According to their pitch, it just takes too long to get anything done, and IT can't be trusted to build a development ecosystem suitable for advanced analytics. The only way around that issue is to create a set of data integrations to push data to the cloud, and pay a monthly subscription for their hosted analytics solution.
This just trades a short-term issue for a long-term risk: vendor lock-in.
It also neglects how interwoven IT is with the rest of the organization. Assume that the CFO needs to execute aggressive cost cuts, takes a look at the different providers, and sees €1.5M/year for Palantir in the Marketing division's budget. But Palantir is a platform for data! How come the director of IT never signed off on that spending, or at least was never asked to provide an alternative? And while other departments are laying off personnel and shutting down non-critical applications, a significant portion of the Marketing funds is spent on the most aggressively vendor-locked asset there is.
Thinking about the entire value chain, and the complete timeline, including short- and long-term consequences, is something that management consultants (who mostly deal with strategy) should know how to do.
## With cloud, IT may be split.
In some organizations the IT assets may be split. Historically, IT assets used to be hosted as part of the organization's own infrastructure, and some of the budget was spent purchasing servers and storage cabinets.
Cloud has raised the level to access.
## Dysfunctional IT departments
There are many reasons why an IT department becomes dysfunctional, but the main one comes from the tendency to turn IT into a kitchen sink of ownership. Healthy organizations try to align responsibilities with incentives, but that's relatively uncommon. Assume a B2B business that is growing fast, and the Marketing division needs a new CRM to be able to handle all the new customers. In a healthy organization, IT will be involved in the planning stage as part of a joint team with all the stakeholders. Involving IT may have unintended consequences, like discarding the Marketing team's preferred software because its technological stack is incompatible with the current and future IT strategy. In an unhealthy organization, the CMO will tell the CTO that software X must be available, and the request will be waterfalled down to the IT department, which will need to figure out how much it costs, and find a way to integrate it, operate it, evolve it...
If this situation extends for a long time, the IT department isolates itself as a mechanism of self-defense, and gravitates towards team members with a strong tribal culture. Since IT becomes a team that only deals with ownership, risks, and responsibilities, they will make their work more opaque to others and decide on their own what to do and the best way to do it. In the long term, isolation turns into some sort of hostility.
!!! tip
There's a rule of thumb to estimate the friendliness of an IT department. Friendliness tends to be proportional to the presence of women.
If none of that works, call the experts. There are many seasoned engineers at BCG who can help you execute in the harshest environments, even with hostile IT departments. Sometimes the final decision is to go nuclear and let BCG host the solution while requesting minimal help from IT. This makes the project significantly more expensive, but it's often the only way to reach the goals that were agreed with the client.
## People
Each organization implements its IT capabilities in a slightly different way:
* CTO. (Chief Transformation Officer, Chief Transformation and Product Officer...) Value based, process centered, not really into the technical details. May be in charge of IT (some companies don't have a CIO).
* CIO. Oversees everything related with IT, including systems, procurement, strategy... May report to CTO.
* CISO. Also takes care of information security (devices, network, policies, access)... May report to CIO or CTO.
* Director of Systems. Owns the IT's budget.
* Director of Architecture / IT. Owns new assets.
* Director of Innovation. Owns new applications. May own part of budget.
* Director of Data / Engineering. Owns data health and availability.
* Director of Data Science / Analytics. Owns data insights.
* Director of *division*.
* Cloud architects

View file

@ -1,30 +0,0 @@
# Pragmatism
It's the norm in enterprise IT divisions to disregard "modern" engineering tools and techniques:
1. The use of open-source DBMS tends to be marginal, and Oracle still dominates the market.
2. Automation is managed by old-school enterprise management tools like Control-M.
3. No-code solutions are preferred, and DevOps and code-first tools sound much like a startup-led fad.
4. Tools that are good enough are often abused. If you can do X with SAP, you will do it.
5. Business automation is still mostly developed in SQL, PL/SQL, and Java.
6. There's a real resistance to change, and some core systems written in COBOL in the nineties are still alive and kicking.
7. There are strong barriers between "critical" systems (transactional systems, core automation) and "non-critical" components (data warehouse, CRM).
8. Things that work are not touched unless strictly necessary, and new features are discussed at length, planned and executed carefully, and thoroughly tested.
9. *Benign neglect* is the norm. If a new reality forces a change, sometimes it's preferable to adapt reality with wrappers, translation layers, interfaces...
10. Complexity coming from technical debt is seen as a trade off between risks. Complexity is a manageable risk, while reimplementation is a very hard-to-manage risk.
It's common to see software not as an asset, but as a liability. It's something you have to do to run your company and create profits, and you only remember its existence when it breaks or when a team of expensive consultants suggests a feature that can't be implemented. Companies that follow this line of thought end up considering software mostly as a source of risk, and it's very hard to put a value tag on a risk. You seldom see teams of risk experts estimating that, unless a major rewrite of a particular component takes place, the probability of a total meltdown will be around 50% five years down the road.
In addition, the effect of aging and unfit systems and software is commonly too slow to notice or too fast to react to. Sometimes the current systems are capable of keeping the operations alive with no major incidents, but implementing new features is too hard. Competitors may not face these limitations, and market share will be lost one customer at a time. Or maybe these aging automations just melt down when an event never seen before, like COVID, the emergence of generative AI, or a huge snowstorm caused by climate change, can't be handled, and a relevant portion of the customer base leaves and never comes back.
!!! example
Southwest's scheduling meltdown in 2022 was so severe that it has its own [entry on Wikipedia](https://en.wikipedia.org/wiki/2022_Southwest_Airlines_scheduling_crisis). The outdated software running on outdated hardware caused all kinds of disasters, like not being able to track where crews were and whether they were available. The consequence was more than fifteen thousand flights cancelled in ten days. Razor-thin operating margins were continuously blamed, but Southwest has historically paid [significant dividends](https://www.nasdaq.com/market-activity/stocks/luv/dividend-history) to shareholders, and EBITDA was higher than $4B/year from 2015 to 2019. Southwest announced its intention to allocate $1.3B to upgrade its systems which, considering that the investment will probably be spread across a decade, is not a particularly ambitious plan. Southwest had strong reasons to update its software and its systems, but never considered it a priority until it was too late.
Pragmatism dominates decision making in enterprise IT, and touching things as little as possible, in the simplest possible way, tends to be the norm. You've probably heard that *when all you have is a hammer, everything looks like a nail*; but many things work like a nail if you hit them with a hammer hard enough. There's sometimes an enormous distance between the ideal solution and something that *Just Works™*, and pragmatic engineers tend to choose the latter. The only thing you need is a hammer, and knowing what can work as a nail. But if you keep hitting things with a hammer as hard as you can, you start cracks that may lead to a major structural failure under fatigue.
This is why startups tend to disrupt markets more than well-established corporations. It's easier to put a price tag on features, predict the impact of disruptions, and simulate meltdown scenarios when you're starting with a blank sheet of paper.
You may love or hate the ideas of N. N. Taleb, but I think it's interesting to bring the concepts of fragility and antifragility into play. You create fragility by constantly taking suboptimal but pragmatic choices, which create scar tissue all across the company. With a cramped body you can't try new things out; you just make it to the next day. Antifragile systems can have major components redesigned and rearchitected, because each piece is robust enough to withstand the additional stress of constant change.
The following chapters describe the implementation of a digital twin of a retail corporation that operates a chain of grocery stores. The implementation is of course limited, but comprehensive enough to get some key insights into why corporate IT looks the way it does. If anyone intends to create antifragile data systems, it's important to study how fragile systems come to be in the first place.

View file

@ -1,134 +0,0 @@
# Terminals to access data
Terminals are dedicated applications or devices to interact with data. This is a very wide definition: a terminal can be an actual device like a point of sale, or a web application that shows the current stock at each location to a store manager.
## Command-line interface terminals
This case study includes some simple terminals with a command-line interface (CLI) that are installed when installing the package:
1. `pos`: a point of sale.
2. `tasks`: a stocker terminal to assist operations at the store.
3. `stock`: a stock management terminal.
Let's start a session as store manager with the last of the listed terminals with
```bash
stock postgresql://postgres:postgres@host.docker.internal/retail 1
```
The first argument to the `stock` command is the connection to the database, and the second is the ID of the location. The terminal greets us with the following message and a prompt:
```
Fetching data...
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Retail twin stock management CLI ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
This is a simple terminal to manage stock. Enter a single-letter command followed by . The available
commands are:
• l: Lists all the current items stocked in any location
• s: Enters search mode. Search an item by name
• q: Store query mode. Queries the stock of an item by UPC in the current location
• w: Warehouse query mode. Queries the stock of an item by UPC in all warehouses. Requires connection to the database
• c: Cancel mode. Retires a batch giving a UPC. Requires connection to the database
• b: Batch mode. Requests a given quantity from an item to the warehouse. Requires connection to the database
• r: Refresh data from the stock database
• h: Print this help message
• x: Exit this terminal
#>
```
This terminal, like any other terminal, provides a set of commands that interact with data in a limited set of ways. The command `s` allows us to enter a keyword, and the terminal will return a set of related items:
```
#> s
s> coffee
Items
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ upc ┃ name ┃ package ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ 566807566244 │ Instant coffee │ 200g │
│ 212350582030 │ Coffee Beans │ 1lb bag │
│ 415996582616 │ Island Blend Coffee │ 1lb bag │
│ 167369617163 │ Ground Coffee │ 1 lb │
│ 982157811808 │ Coffee Beans │ 250g │
│ 86856869931 │ Ground coffee │ 12 oz bag │
│ 520101823089 │ French Roast Coffee │ 250 gram pack │
│ 240563892573 │ Fresh coffee beans │ 500g package │
│ 389837389865 │ Instant coffee │ 200g jar │
│ 940827785911 │ Pumpkin Spice Coffee │ 12 oz Bag │
│ 375920191429 │ Premium black coffee beans │ 500 grams │
│ 926526200297 │ Dark Roast Coffee │ 500 grams │
└──────────────┴────────────────────────────┴───────────────┘
```
The query mode searches the stock of an item in particular in the current location:
```
#> q
q> 566807566244
Item 566807566244 on location 1
┏━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ upc ┃ batch ┃ name ┃ package ┃ received ┃ best_until ┃ quantity ┃
┡━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ 566807566244 │ 20 │ Instant coffee │ 200g │ 2023-08-16 17:23:08.224154 │ 2023-09-15 17:23:08.224154 │ 193 │
└──────────────┴───────┴────────────────┴─────────┴────────────────────────────┴────────────────────────────┴──────────┘
```
It's frequent to assume that the most usual way to interact with enterprise data nowadays is through modern web-based interfaces. But there are many old-school terminals still around. Point of Sale terminals tend to be very basic as well, with displays only capable of showing a handful of characters, and a button for each command. This being a digital twin, with complete freedom to implement anything we want, building a command-line terminal is also a way of making a point.
## If it's smart it's vulnerable.
The most important constraint when designing enterprise data systems is information security, and the dumber the terminal, the more secure it is. PoS terminals tend to be dumb because there's money inside. One key concept in information security is the *attack surface* of a system. A console with no graphical interface and a handful of commands connected to a database is inherently more secure than a web interface that needs a browser, an HTTP connection, a web server, and a database. I can't recommend enough the book [If it's smart it's vulnerable](https://www.ifitssmartitsvulnerable.com/) by the veteran information security researcher Mikko Hypponen. Maybe that $200 cloud-connected PoS with a fancy screen from Alibaba is the door someone exploits to start a ransomware attack, or that simple web terminal that the cheapest bidder implemented is vulnerable to SQL injection.
![https://xkcd.com/327/](img/injection.png)
[From XKCD](https://xkcd.com/327/)
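Little Bobby Tables is kept out with parameterized queries: user input travels to the database as data, never spliced into the SQL string. A minimal sketch against the digital twin's `items` table:
```python
# Sketch: parameterized queries keep user input as data, never as SQL.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")

def search_items(keyword: str):
    with engine.connect() as conn:
        # UNSAFE would be: "... WHERE name ILIKE '%" + keyword + "%'"
        # Safe: the driver sends `keyword` separately from the statement
        return conn.execute(
            text("SELECT upc, name FROM items WHERE name ILIKE :pattern"),
            {"pattern": f"%{keyword}%"},
        ).all()

print(search_items("coffee"))
```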
## If it's complex it's expensive.
Mankind has been spoiled by intuitive and ergonomic user interfaces since the iPhone appeared, but mankind also landed on the Moon using a computer with the most spartan user interface ever.
!!! example "Control panel of the Apollo Guidance Computer"
![Apollo.jpg](https://upload.wikimedia.org/wikipedia/commons/b/bd/Apollo_display_and_keyboard_unit_%28DSKY%29_used_on_F-8_DFBW_DVIDS683588.jpg)
Here's a [working simulator](https://svtsim.com/moonjs/agc.html) where you can follow the full launch sequence that the crew of the Apollo spacecraft had to enter into the computer.
CLI terminals are robust, run everywhere, and require almost no support from the operating system. Here's the Windows command prompt running the `stock` terminal application.
![terminal.png](img/terminal.png)
There's a 99% chance that a future Windows version released in 2033 will still be able to run this application. That may not be true for a web-based application developed with today's technologies. The most popular browser technology among corporate clients ten years ago was still Internet Explorer, and web applications had to implement support for it.
!!! example
The Airbus A320 civil aircraft was developed in the eighties. The Multipurpose Control and Display Unit (MCDU) is a panel that the flight crew uses to interact with the onboard computer. Together with the autopilot, it's one of the components of the aircraft that the crew spends the most time interacting with.
![mcdu.png](img/A320cockpit.png)
It took more than 20 years to move from a simple keyboard and a 5-inch screen to a trackball and keyboard when the A380 was developed. The most modern aircraft by Airbus, the A350, features the Keyboard and Cursor Control Unit (KCCU) with a QWERTY keyboard and a pointer that they can move around the panels in the cockpit. It's more modern, intuitive, enjoyable, and less error-prone.
New Airbus A320s still get an MCDU. There's very little motivation to upgrade a design that works: the Airbus A320 is the [highest-selling airliner](https://en.wikipedia.org/wiki/Airbus_A320_family#:~:text=As%20of%20August%202023%2C%20a,since%20its%20entry%20into%20service.), and there are tens of thousands of crews that already know how to use the MCDU. In addition, retrofitting a KCCU into the A320 design may cost Airbus almost as much as designing a new plane from scratch.
## API-based web applications
Some terminals, like PoS, run on specific hardware with a dedicated display and user interface. CLI terminals' display is the operating system's console. Web applications' display is a browser, which is today almost as capable as an operating system. The entire Microsoft Office suite can now run in a browser.
Web applications require the following components.
* A database. Data has to be stored somewhere.
* A web server that runs the business logic and interfaces data with presentation.
* An application, or presentation logic, that runs on the browser.
The code running on the database and the web server is frequently called the *backend*, while the code running in the browser is frequently called the *frontend*. It's obvious that implementing a terminal as a web application will require significantly more effort to develop, deploy, and secure than a CLI terminal.
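As a minimal sketch of a backend, the endpoint below exposes part of the `items` table as JSON for a frontend to render. FastAPI is used here only as an illustration; it is not necessarily what the web terminal shown next is built with.
```python
# Hypothetical backend endpoint: the browser (frontend) calls this API and
# renders the JSON it returns. FastAPI is just one common choice.
from fastapi import FastAPI
from sqlalchemy import create_engine, text

app = FastAPI()
engine = create_engine("postgresql://postgres:postgres@host.docker.internal/retail")

@app.get("/items/{upc}")
def get_item(upc: int) -> dict:
    """Return basic item information, analogous to the CLI's search mode."""
    with engine.connect() as conn:
        # .one() raises if the UPC is unknown; a real backend would map that to a 404
        row = conn.execute(
            text("SELECT upc, name, package FROM items WHERE upc = :upc"),
            {"upc": upc},
        ).one()
    return {"upc": row.upc, "name": row.name, "package": row.package}
```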
![webterminal.png](img/webterminal.png)
The previous image shows a web application that implements an analogous set of functionalities to the CLI terminal above. It can list the items that are available in each location, retire batches, list the available stock in terminals, search products... The implementation took roughly *ten times longer* than the CLI terminal, but it's clearly more powerful, feature-rich, and easier to use.
Web development is a field in constant change, and this makes technological choices harder, and more relevant. New frameworks and libraries are published every year, and the ecosystem is so fragmented that a software engineer will be fluent in a handful of those technologies in a landscape of hundreds of competing technologies.
The design and implementation of effective user interfaces is as important as it is hard and time consuming. Don't assume users can be trained to use any user interface that *makes sense*. There should be constant usability tests to gather feedback from users and tune the user experience. In the end, the web frontend is the only component of a very large ecosystem that the final user sees. Web applications face the risk of not being successful because of a bad user experience. User interfaces are also important to prevent users from entering wrong input and causing operational issues.

View file

@ -1,22 +1,12 @@
site_name: Navigating IT
site_name: Transactional databases and core applications
site_author: Guillem Borrell PhD
site_url: https://github.com/bcgx-gp-atlasCommunity/retailtwin
copyright: © Guillem Borrell Nogueras, The Boston Consulting Group. All rights reserved.
repo_url: https://github.com/bcgx-gp-atlasCommunity/retailtwin
site_url: https://git.guillemborrell.es/guillem/PyConES24
copyright: © Guillem Borrell Nogueras
repo_url: https://git.guillemborrell.es/guillem/PyConES24
edit_uri: edit/main/docs/
nav:
- "Introduction": "index.md"
- "People in the data ecosystem": "organization.md"
- "Pragmatism in Enterprise Software": "pragmatism.md"
- "How data is stored": "data.md"
- "Terminals to access data": "terminals.md"
- "How automation is implemented": "automation.md"
- "The Data Warehouse": "dw.md"
- "The Data Lake": "dl.md"
- "API": "api.md"
- "Applications and value": "applications.md"
- "Buy vs Build": "buyvsbuild.md"
theme:
name: "material"
palette:
@ -51,8 +41,6 @@ theme:
plugins:
- search
- with-pdf:
cover_subtitle: A course for core consultants and data scientists.
markdown_extensions:
- md_in_html