Planning towards the course.

Guillem Borrell 2023-11-18 09:28:18 +01:00
parent 763141fccc
commit 125cb67adf
8 changed files with 127 additions and 32 deletions

3
docs/api.md Normal file

@ -0,0 +1,3 @@
# API
API stands for Application Programming Interface.

7
docs/applications.md Normal file

@ -0,0 +1,7 @@
# Applications and value
!!! example
During a pitch, one of the clients said that she didn't want the call center agents to look at multiple screens. If we developed anything, we had to integrate it into their current tool. That meant that we couldn't develop exactly what we thought could bring the most value; we had to study what could be integrated with the current tool instead. I really wanted to challenge that. Her hypothesis was that more complexity made call center agents less productive, but I've seen examples of the opposite. While tuning a similar application for a previous client, I posed the same challenge to the application developers. Maybe we were showing too many things on the screen. But once we put the application in front of call center agents, they asked to see even more information!
Assumptions have to be tested, and it's surprising to me that consultants challenge clients' assumptions about how they run their business, yet we decide not to contest important choices about usability and design. That client was more than willing to sacrifice tons of functionality, and maybe millions, because of a hypothesis that could perfectly well be wrong.


@ -1,5 +1,7 @@
# Buy vs Build
Modularization of enterprise software components. This makes running businesses more sustainable.
## Build vs buy an off-the-shelf solution.
Building enterprise data systems is hard and expensive. This is why many companies decide to purchase an ERP like SAP, which is able to handle financials, purchasing, stock, customer relationships, reporting... This seems to be the safest choice by far.
@ -18,7 +20,7 @@ But appearances can be deceiving.
* Proprietary software is always bundled with some additional vendor lock-in, like supporting only a small subset of proprietary storage systems.
* Buying software implies vendoring knowledge. If there's a critical issue with one of these components and the vendor is not able to provide support, or goes bankrupt and disappears, the engineers within the corporation won't be able to fix the issue no matter how smart they are.
The most common scenario is to run a mix of applications: mission-critical operations are run by custom, in-house-developed applications maintained by the IT department, while other less critical operations like Marketing use third-party tools. I've encountered many corporations that run most of their operations on custom-built software, but decided to buy a proprietary CRM like Salesforce to support their Marketing and Sales departments. Most projects executed by IT departments are related to data integrations between two existing tools, or between an existing tool and some new tool the leadership decided to purchase.
The most common scenario is to run a mix of applications: mission-critical operations are run by custom, in-house-developed applications maintained by the IT department or functional areas, while other less critical operations like Marketing use third-party tools. I've encountered many corporations that run most of their operations on custom-built software, but decided to buy a proprietary CRM like Salesforce to support their Marketing and Sales departments. Most projects executed by IT departments are related to data integrations between two existing tools, or between an existing tool and some new tool the leadership decided to purchase.
## Bespoke software and the false buy vs build dichotomy.


@ -1,30 +1,3 @@
# Introduction
# Navigating IT
This project explores why enterprise environments tend to disregard "modern" engineering tools and techniques:
1. The use of open-source DBMS tends to be marginal, and Oracle still dominates the market.
2. Automation is managed by old-school enterprise management tools like Control-M, or
3. No-code solutions are preferred, and DevOps and code-first tools sound much like a startup-led fad.
4. Tools that are good enough are often abused. If you can do X with SAP, you will do it.
5. Business automation is still mostly developed in SQL, PL/SQL, and Java.
6. There's a real resistance to change, and some core systems written in COBOL in the nineties are still alive and kicking.
7. There are strong barriers between "critical" systems (transactional systems, core automation) and "non-critical" components (Data warehouse, CRM).
8. Things that work are not touched unless strictly necessary, and new features are discussed at length, planned and executed carefully, and thoroughly tested.
9. *Benign neglect* is the norm. If a new reality forces a change, sometimes it's preferable to adapt reality with wrappers, translation layers, interfaces...
10. Complexity coming from technical debt is seen as a trade-off between risks. Complexity is a manageable risk, while reimplementation is a very hard-to-manage risk.
It's common to see software not as an asset, but as a liability. It's something you have to do to run your company and create profits, and you only remember its existence when it breaks or when a team of expensive consultants suggests a feature that can't be implemented. Companies that follow this line of thought end up considering software mostly as a source of risk, and it's very hard to put a price tag on a risk. You seldom see teams of risk experts estimating that, unless a major rewrite of a particular component takes place, the probability of total meltdown will be around 50% five years down the road.
In addition, the effect of aging and unfit systems and software is commonly too slow to notice or too fast to react to. Sometimes current systems are capable of keeping the operations alive with no major incidents, but implementing new features is too hard. Competitors may not face these limitations, and market share will be lost one customer at a time. Or maybe these aging automations simply melt down when they can't handle an event never seen before, like COVID, the emergence of generative AI, or a huge snowstorm caused by climate change, and a relevant portion of the customer base leaves and never comes back.
!!! example
Southwest's scheduling meltdown in 2022 was so severe that it has its own [entry on Wikipedia](https://en.wikipedia.org/wiki/2022_Southwest_Airlines_scheduling_crisis). Outdated software running on outdated hardware caused all kinds of disasters, like not being able to track where crews were or whether they were available. The consequence was more than fifteen thousand flights cancelled in ten days. Razor-thin operating margins were continuously blamed, but Southwest has historically paid [significant dividends](https://www.nasdaq.com/market-activity/stocks/luv/dividend-history) to shareholders, and EBITDA was higher than $4B/year from 2015 to 2019. Southwest announced their intention to allocate $1.3B to upgrade their systems which, considering that the investment will probably be spread across a decade, is not a particularly ambitious plan. Southwest had strong reasons to update their software and their systems, but they never considered it a priority until it was too late.
Pragmatism dominates decision making in enterprise IT, and touching things as little as possible, in the simplest way, tends to be the norm. You've probably heard that *when all you have is a hammer, everything looks like a nail*; but many things work like a nail if you hit them with a hammer hard enough. There's sometimes an enormous distance between the ideal solution and something that *Just Works™*, and pragmatic engineers tend to choose the latter. The only thing you need is a hammer, and knowing what can work as a nail. But if you keep hitting things with a hammer as hard as you can, you start cracks that may induce a major structural failure through fatigue.
This is why startups tend to disrupt markets more than well-established corporations. It's easier to put a price tag on features, predict the impact of disruptions, and simulate meltdown scenarios when you're starting with a blank sheet of paper.
You may love or hate the ideas of N. N. Taleb, but I think it's interesting to bring the concepts of fragility and antifragility into play. You create fragility by constantly making suboptimal but pragmatic choices, which create scar tissue all across the company. With a cramped body you can't try new things out; you just make it to the next day. Antifragile systems can have major components redesigned and rearchitected because each piece is robust enough to withstand the additional stress of constant change.
The following chapters describe the implementation of a digital twin of a retail corporation that operates a chain of grocery stores. The implementation is of course limited, but comprehensive enough to give some key insights into why corporate IT looks the way it looks. If anyone intends to create antifragile data systems, it's important to study how fragile systems come to be in the first place.
This document is an essay that supports a course


@ -1,2 +1,79 @@
# People in the data ecosystem
The human dimension plays an important role in any system. Any individual has motivations and fears that have to be taken into account when planning an implementation.
In some organizations there's some degree of antagonism between the functional areas and IT. While the former mostly try to move the business forward implementing new or improved processes, and generating value, the latter is more conservative because there's a wider set of goals to achieve apart from supporting business objectives:
1. Enhancing Security and Compliance. That service may take a couple more weeks to get online because there's a penetration test to run.
2. Managing their own budget. That Spark cluster the data team requires is way over the year's budget so there will be some more waiting ahead for the project's kick-off.
3. Providing End-User Support. Let's restore a backup for the junior developer that accidentally deleted all the tables of the development database.
4. Managing IT Operations. This Saturday one of the senior engineers will truncate some tables and rebuild the indices in the central Oracle database because the average measured transaction latency went past the threshold last week.
5. Developing IT Talent. It seems GenAI is a thing now, let's give the Architecture team a couple of weeks to understand how we can run LLMs in our production systems without bankrupting the whole company.
6. Building IT Partnerships. We're at 100% capacity just keeping the lights on, maybe we need to outsource network management for business users, so let's call Zscaler.
7. Ensuring Business Continuity. Let's make sure that every single server, laptop, cell phone, and database within the company is backed up at least once a week.
In some corporations the IT team is responsible not only for the operation of in-house applications and automations, but for their design and implementation too. When that is the case, the IT department is the most relevant transversal team within the company.
The top layers of the IT organization will probably be business focused, and their incentives will be aligned with other functional areas. However, staff below the IT directors are unlikely to be motivated by the financial success of the company. You may have the buy-in from the CTO while facing major setbacks with the director of IT operations, and this requires some additional thinking.
## Value Vs Risk
In essence, *functional areas try to maximize value, while the IT department tries to minimize risks*. The IT managers know that they will be able to capitalize on a limited share of each success, but they will be held responsible for incidents. Any plan that may incur additional risks will be scrutinized sooner or later, because there's no way a new digital product or initiative gets implemented without the involvement of IT.
This is the framework I use to bring some structure to this tension:
1. Ideas are about results, not about implementations
2. Value lies in results
3. Risks are inherent to implementation
4. There are no results without implementation
5. There is no value without risks
Project execution improves when you think of engineers and managers in the IT department as people who can *reality check your ideas*. If you're a consultant who mostly works on the "ideas" side of the equation, they will force you to think about an implementation, and then they will challenge it. Their goal is to make the final result better and to be helpful by providing a new point of view. If you think of them as a blocker on the way from idea to results and ultimately value, you're following the wrong approach. Your goal is to have the foundational knowledge to understand their language, digest their ideas, improve the final result, and maximize the value.
This text provides the tools you need to engage in discussions with members of the IT team. When a data owner says that the KPI you need is not in the data warehouse, but on a transactional system, and there's no ETL currently in place for that, so you have to create a ticket for the data engineering team and join the discussion at the next sprint planning... I can assure you that person is more than willing to help you, laying out exactly what you need and your necessary next steps. The answer could have been that the transactional system is legacy, it can't sustain more load, and they're not planning to extract any more data from it. Again, that person is not trying to sabotage your project; they're trying to prevent a major meltdown within the company.
!!! note
Are only results valuable, or is the implementation valuable too? This is almost a philosophical question. My take is that you can perfectly argue that implementations have inherent value, but it's very hard to measure. Once you follow the "implementations have value" way you can't be picky about which bits of the implementation are more valuable than others unless you explicitly measure that. I once had a discussion with a member of the Lighthouse (an internal BCG product that provides a self-service interface for third party data) team during an internal meeting. I was representing the Atlas team, which builds platform components and development ecosystems. He argued that platform has no value, since it's just an enabler that's far from the results. I immediately responded that data is an enabler too: if platform has no value, neither has data. Data are closer to the results, but they're not results either.
## Implementation, ownership, and budget.
Risk management involves
* Responsibility
* Alerts
* Minimizing impact
It's very common to forget about IT costs when estimating the value of a case.
IT may come with a multi-million euro request to run your project. This is why many corporations are reluctant to build capabilities that run on razor-thin margins. Plans that get implemented need an early estimation of
## Teaming with IT
Fully functional BCG case teams will staff engineers from X or PLA that will deal with all the IT-related details, but sometimes teams are understaffed, or the budget is insufficient to allocate the necessary profiles. Sometimes a core senior associate has to lead the relationship with IT.
* Involve them as soon as possible
* Let them contribute
* Give them details about the goals, and how you plan to achieve them
* Help them capitalize the win
The most common reason why IT may not be willing to help is that they don't have the capacity to run more things. If they're just delaying things, or not helping you figure out the best way to go from idea to implementation, it's probably because they're not able to fit your needs into their planning. This is why working with them from the beginning of the project is important. You may team with them to:
1. Get more ownership from functional areas.
2. Help them plan the budget.
3. Echo their staffing needs. In the end the goal of every department is to grow in size and importance.
4. Praise their work.
## Dysfunctional IT departments
There are many reasons why an IT department becomes dysfunctional, but the main one comes from the tendency to turn IT into a kitchen sink of ownership. Healthy organizations try to align responsibilities with incentives, but this is relatively uncommon. Assume a B2B business that is growing fast, and the Marketing division needs a new CRM to be able to handle all the new customers. In a healthy organization IT will be involved in the planning stage as part of a joint team with all the stakeholders. Involving IT may have unintended consequences, like discarding the Marketing team's preferred software because the technological stack is incompatible with the current and future IT strategy. In an unhealthy organization, the CMO will tell the CTO that software X must be available, and the request will be waterfalled to the IT department, which will need to figure out how much it costs, and find a way to integrate it, operate it, evolve it...
If this situation extends for a long time, the IT department isolates itself as a mechanism of self-defense, and gravitates towards team members with a strong tribal culture. Since IT is a team that only deals with ownership, risks, and responsibilities, they will make their work more opaque to others and decide on their own what to do, and the best way to do it. In the long term isolation turns into some sort of hostility.
!!! tip
There's a rule of thumb to estimate the friendliness of an IT department. Friendliness tends to be proportional to the presence of women.
If none of that works, call the experts. There are many seasoned engineers at BCG who can help you execute in the harshest environments, even with hostile IT departments. Sometimes the final decision is to go nuclear and let BCG host the solution while requesting minimal help from IT. This makes the project significantly more expensive, but it's often the only way to reach the goals that were agreed with the client.

30
docs/pragmatism.md Normal file

@ -0,0 +1,30 @@
# Pragmatism
It's the norm in enterprise IT divisions to disregard "modern" engineering tools and techniques:
1. The use of open-source DBMS tends to be marginal, and Oracle still dominates the market.
2. Automation is managed by old-school enterprise management tools like Control-M, or
3. No-code solutions are preferred, and DevOps and code-first tools sound much like a startup-led fad.
4. Tools that are good enough are often abused. If you can do X with SAP, you will do it.
5. Business automation is still mostly developed in SQL, PL/SQL, and Java.
6. There's a real resistance to change, and some core systems written in COBOL in the nineties are still alive and kicking.
7. There are strong barriers between "critical" systems (transactional systems, core automation) and "non-critical" components (Data warehouse, CRM).
8. Things that work are not touched unless strictly necessary, and new features are discussed at length, planned and executed carefully, and thoroughly tested.
9. *Benign neglect* is the norm. If a new reality forces a change, sometimes it's preferable to adapt reality with wrappers, translation layers, interfaces...
10. Complexity coming from technical debt is seen as a trade-off between risks. Complexity is a manageable risk, while reimplementation is a very hard-to-manage risk.
It's common to see software not as an asset, but as a liability. It's something you have to do to run your company and create profits, and you only remember its existence when it breaks or when a team of expensive consultants suggests a feature that can't be implemented. Companies that follow this line of thought end up considering software mostly as a source of risk, and it's very hard to put a price tag on a risk. You seldom see teams of risk experts estimating that, unless a major rewrite of a particular component takes place, the probability of total meltdown will be around 50% five years down the road.
In addition, the effect of aging and unfit systems and software is commonly too slow to notice or too fast to react to. Sometimes current systems are capable of keeping the operations alive with no major incidents, but implementing new features is too hard. Competitors may not face these limitations, and market share will be lost one customer at a time. Or maybe these aging automations simply melt down when they can't handle an event never seen before, like COVID, the emergence of generative AI, or a huge snowstorm caused by climate change, and a relevant portion of the customer base leaves and never comes back.
!!! example
Southwest's scheduling meltdown in 2022 was so severe that it has its own [entry on Wikipedia](https://en.wikipedia.org/wiki/2022_Southwest_Airlines_scheduling_crisis). Outdated software running on outdated hardware caused all kinds of disasters, like not being able to track where crews were or whether they were available. The consequence was more than fifteen thousand flights cancelled in ten days. Razor-thin operating margins were continuously blamed, but Southwest has historically paid [significant dividends](https://www.nasdaq.com/market-activity/stocks/luv/dividend-history) to shareholders, and EBITDA was higher than $4B/year from 2015 to 2019. Southwest announced their intention to allocate $1.3B to upgrade their systems which, considering that the investment will probably be spread across a decade, is not a particularly ambitious plan. Southwest had strong reasons to update their software and their systems, but they never considered it a priority until it was too late.
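The "not particularly ambitious" judgement in the Southwest example can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below only uses the figures quoted in the example. The ten-year horizon is the text's own guess, not an announced schedule, and comparing future spend against 2015 to 2019 EBITDA is a rough simplification.

```python
# Figures quoted in the Southwest example; nothing here is new data.
upgrade_budget_usd = 1.3e9   # announced systems upgrade budget
assumed_years = 10           # assumption: investment spread across a decade
ebitda_floor_usd = 4.0e9     # "higher than 4B/year", so this is a lower bound

# Average yearly spend if the budget is spread evenly over the horizon.
annual_spend_usd = upgrade_budget_usd / assumed_years

# Because EBITDA was at least this high, the share is an upper bound.
share_of_ebitda = annual_spend_usd / ebitda_floor_usd

print(f"Average annual spend: ${annual_spend_usd / 1e6:.0f}M")
print(f"Share of yearly EBITDA: at most {share_of_ebitda:.2%}")
```

Under these assumptions the upgrade amounts to roughly $130M a year, a low single-digit percentage of the EBITDA the company was generating before the crisis.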
Pragmatism dominates decision making in enterprise IT, and touching things as little as possible, in the simplest way, tends to be the norm. You've probably heard that *when all you have is a hammer, everything looks like a nail*; but many things work like a nail if you hit them with a hammer hard enough. There's sometimes an enormous distance between the ideal solution and something that *Just Works™*, and pragmatic engineers tend to choose the latter. The only thing you need is a hammer, and knowing what can work as a nail. But if you keep hitting things with a hammer as hard as you can, you start cracks that may induce a major structural failure through fatigue.
This is why startups tend to disrupt markets more than well-established corporations. It's easier to put a price tag on features, predict the impact of disruptions, and simulate meltdown scenarios when you're starting with a blank sheet of paper.
You may love or hate the ideas of N. N. Taleb, but I think it's interesting to bring the concepts of fragility and antifragility into play. You create fragility by constantly making suboptimal but pragmatic choices, which create scar tissue all across the company. With a cramped body you can't try new things out; you just make it to the next day. Antifragile systems can have major components redesigned and rearchitected because each piece is robust enough to withstand the additional stress of constant change.
The following chapters describe the implementation of a digital twin of a retail corporation that operates a chain of grocery stores. The implementation is of course limited, but comprehensive enough to give some key insights into why corporate IT looks the way it looks. If anyone intends to create antifragile data systems, it's important to study how fragile systems come to be in the first place.


@ -5,12 +5,15 @@ edit_uri: edit/main/docs/
nav:
- "Introduction": "index.md"
- "People in the data ecosystem": "organization.md"
- "Pragmatism in Enterprise Software": "pragmatism.md"
- "How data is stored": "data.md"
- "Terminals to access data": "terminals.md"
- "How automation is implemented": "automation.md"
- "Buy vs Build": "buyvsbuild.md"
- "The Data Warehouse": "dw.md"
- "The Data Lake": "dl.md"
- "API": "api.md"
- "Applications and value": "applications.md"
- "Buy vs Build": "buyvsbuild.md"
theme:
name: "material"
palette:


@ -27,4 +27,4 @@ caddy run -c Caddyfile
You should be able to browse the application at http://127.0.0.1, and reach the api docs at http://127.0.0.1/api/v1/docs
Caddy can be installed in Windows with Chocolatey
Caddy can be installed with Chocolatey on Windows, and with Homebrew on macOS.