People who buy umbrellas also buy raincoats. AI is very good at finding this pattern. Train it on millions of purchase records and it can recommend raincoats to umbrella buyers with high accuracy.

The problem is that the model doesn't know why it rains.

It rains, so people buy umbrellas. It rains, so people buy raincoats. The umbrella doesn't cause the raincoat purchase. Rain does. But most AI doesn't make this distinction. It can't tell the difference between "things that happen together" and "one thing causing another." Correlation versus causation.
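The confounding at work here can be made concrete with a tiny simulation. All the numbers below are invented for illustration: rain drives both purchases, and neither purchase depends on the other. The marginal correlation between umbrellas and raincoats is strong, yet it vanishes once you condition on rain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Rain is the common cause (the confounder)
rain = rng.random(n) < 0.3

# Each purchase depends only on rain, not on the other purchase
umbrella = np.where(rain, rng.random(n) < 0.60, rng.random(n) < 0.05)
raincoat = np.where(rain, rng.random(n) < 0.50, rng.random(n) < 0.05)

# Marginally, the two purchases move together
print(np.corrcoef(umbrella, raincoat)[0, 1])  # strongly positive

# Holding rain fixed, the association disappears
print(np.corrcoef(umbrella[rain], raincoat[rain])[0, 1])    # near zero
print(np.corrcoef(umbrella[~rain], raincoat[~rain])[0, 1])  # near zero
```

A recommender trained on the raw purchase logs sees only the first number; the causal story lives in the last two.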

In an online store, this is a minor missed-revenue problem. But what if this is a hospital? A corporate decision? A government policy?


Here's an example.

A hospital has a high outpatient no-show rate. The data shows that the majority of no-show patients waited more than two weeks after booking. The correlation is clear: longer wait times, higher no-show rates.

Most predictive models will tell you: "Patients with wait times over 14 days have a 38% no-show probability." That's accurate. But what can you actually do with this information?

You'd want to conclude that "reducing wait times will reduce no-shows." But that's a causal claim. Whether the wait time itself causes no-shows — or whether patients with long wait times simply have characteristics that make them more likely to no-show (mild conditions, transportation barriers, cost concerns) — this data alone can't tell you.

If it's the latter, adding staff to reduce wait times won't meaningfully change the no-show rate. It just increases costs. A decision worth tens of thousands of dollars goes wrong, starting from a single line of analysis that confused correlation with causation.
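A short simulation makes the trap visible. Everything here is hypothetical: condition severity stands in for the patient characteristics mentioned above, driving both wait times and no-shows, while wait time itself has no effect at all. The naive comparison still reproduces the familiar "long waits predict no-shows" finding.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Invented confounder: mild conditions get booked further out AND
# get skipped more often. Wait time has no direct effect here.
mild = rng.random(n) < 0.5
wait_days = np.where(mild, rng.normal(20, 5, n), rng.normal(8, 3, n))
noshow = rng.random(n) < np.where(mild, 0.35, 0.10)

long_wait = wait_days > 14

# Naive comparison: long waits "predict" far more no-shows
naive_gap = noshow[long_wait].mean() - noshow[~long_wait].mean()
print(f"naive gap: {naive_gap:+.3f}")  # large and positive

# Adjust for the confounder: within each severity stratum,
# wait time tells you nothing
for name, g in [("mild", mild), ("severe", ~mild)]:
    gap = noshow[long_wait & g].mean() - noshow[~long_wait & g].mean()
    print(f"{name:>6} gap: {gap:+.3f}")  # both near zero
```

In this toy world, hiring staff to cut wait times moves the no-show rate not at all, exactly as the paragraph above warns.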

This is not an edge case. Nearly every organization's data-driven decision-making has the same structural problem.


Prediction vs. causal inference.

What predictive models tell you is "what is likely to happen." This is useful information, but it's different from what decision-makers actually need. What decision-makers need is: "if I take this action, what changes?"

The first is prediction. The second is causal inference. They are entirely different questions requiring entirely different methodologies.

Prediction finds patterns. When these conditions existed in the past, this outcome followed, so similar conditions should produce similar outcomes. Causal inference finds mechanisms. If I change A, does B change? If so, by how much, and under what conditions?

What has exploded in AI over the last decade is almost entirely on the prediction side. Large language models, image recognition, recommendation systems — all pattern matching based on correlations. Remarkably sophisticated, but still unable to answer "why."


"We already do A/B tests."

A common objection: "We already do A/B tests." True — A/B testing is the gold standard for establishing causation. Random assignment lets you verify whether a change actually works.

The problem is that in reality, you can't A/B test most of the decisions that matter.

Can you experiment with changing a hospital's staffing model? Can you randomly assign patient safety decisions? Can a government implement a policy halfway to compare results? Can a company run two versions of a core operation simultaneously?

In most cases, the answer is no. The cost is too high, it's ethically impossible, or it's physically impractical. The most important decisions are made without experiments.

This is exactly what causal inference methods are designed to address. Statistical methods that can estimate causal effects from observational data — without running an experiment — already exist. They've been developed over decades in economics and epidemiology. Difference-in-differences, regression discontinuity, instrumental variables, synthetic control — these names may be unfamiliar, but the work behind them has earned multiple Nobel Prizes in Economics.
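To give a feel for the simplest of these, here is a difference-in-differences sketch on made-up data: compare the before/after change in a unit that received an intervention against the same change in a comparable unit that didn't. The clinics, the reminder-call intervention, and every number below are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
months = np.arange(24)

# Hypothetical scenario: clinic A starts reminder calls in month 13;
# clinic B never does. Both share a gentle downward trend, and A has
# a higher baseline no-show rate to begin with.
true_effect = -0.04
trend = -0.002 * months
clinic_a = 0.30 + trend + rng.normal(0, 0.005, 24) \
         + np.where(months >= 12, true_effect, 0.0)
clinic_b = 0.22 + trend + rng.normal(0, 0.005, 24)

# Naive post-period comparison mixes the effect with the baseline gap
naive = clinic_a[12:].mean() - clinic_b[12:].mean()

# Difference-in-differences: each clinic's post-minus-pre change, then
# the difference between those changes. The baseline gap and the shared
# trend cancel, leaving an estimate of the effect of the calls.
did = (clinic_a[12:].mean() - clinic_a[:12].mean()) \
    - (clinic_b[12:].mean() - clinic_b[:12].mean())

print(f"naive: {naive:+.3f}  DiD: {did:+.3f}")
```

The naive comparison comes out positive (clinic A looks worse), while the DiD estimate recovers the negative effect we built in. The key assumption, here and in real applications, is that the two units would have moved in parallel absent the intervention.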

The problem is that these methods remain trapped in academic papers and PhD-level consulting. Hospital operations teams don't run difference-in-differences analyses. CEOs don't set up instrumental variables. Not because the tools don't exist, but because they haven't reached the front lines.


Why this matters now.

Why does this matter now? Because as AI spreads, the scale of decisions resting on correlation-based analysis keeps growing.

Ten years ago, flawed data analysis led to misallocated marketing budgets. Today, AI recommends staffing levels, determines loan approvals, and allocates medical resources. The downstream impact of flawed analysis has grown exponentially.

And in most cases, no one distinguishes whether the analysis is correlational or causal. A dashboard shows a pattern, someone makes a decision based on that pattern, and when the outcome doesn't match expectations, the conclusion is "we needed more data" or "we need to train the model more." The real problem isn't the volume of data — it's that the wrong type of question was asked.

This is exactly what I observed repeatedly during my research at Harvard. Using Medicare data to analyze how hospital market structure affects quality of care, I found cases where simple correlations and causal effects pointed in opposite directions. Vertically integrated hospitals appeared to perform better, but when causal methods were applied, it wasn't that integration improved performance — it was that already high-performing hospitals chose to integrate. The policy implications are completely different.

Looking at correlation alone: "encourage integration." Looking at causation: "integration may not be the answer." Same data, opposite conclusions.


The first step.

The solution isn't simple. But the first step is clear: recognizing that prediction and causal inference are different questions.

When you look at a dashboard and ask "why," check whether the answer actually addresses "why." Two metrics moving together doesn't mean one causes the other. A predictive model identifying risk factors doesn't mean removing those factors will change the outcome.

The more an organization invests in data, the more this distinction matters. Data is a tool. But the tool for finding patterns and the tool for finding causes are different. You need both, but you need to know which one you're using.

There are moments when prediction is what you need, and moments when causal inference is what you need. If you don't know the difference, more data doesn't lead to better decisions. It leads to more confident decisions that are still wrong.