Twelve upsets per year, on average. That is what the NCAA Tournament first round delivers, give or take, and every analyst with a KenPom subscription and a Kaggle account will confidently tell you they saw it coming. What none of them will tell you is which twelve. That gap—between knowing chaos is coming and knowing where it lands—is not a modeling failure. It is the most honest thing analytics has ever admitted about itself.

Every March, a new wave of data scientists publishes their bracket predictions with enough confidence to make you feel foolish for picking differently. But all of us are dealing in probabilities, and no machine learning model, no matter how advanced, can fully capture the randomness inherent in March Madness. That is not a caveat buried in a footnote. That is the entire story. And yet, here is what the analytics community still gets wrong: confusing a structural problem with a data problem. You cannot collect your way out of single-elimination variance. You can only understand it better.

The Possession Problem Nobody Wants to Solve

Here is the number nobody is talking about: a college basketball team gets roughly 60 to 80 possessions per game. That is it. That is the entire sample. KenPom's adjusted efficiency margin, or AdjEM, measures how many points a team would outscore an average opponent per 100 possessions—a brilliant metric for evaluating true team quality over a full season. Over 35 games, it is extremely informative. Over one neutral-court tournament game with 68 possessions, it is a very good prior getting absolutely obliterated by variance.
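To see how little room 68 possessions leaves, put numbers on the signal-to-noise ratio. This is a minimal sketch with hypothetical inputs: a 10-point AdjEM gap and an assumed single-game margin standard deviation of about 11 points (a ballpark figure, not an official KenPom number):

```python
import math

# Hypothetical inputs, for illustration only.
adjem_gap = 10.0    # points per 100 possessions (assumed gap)
possessions = 68
margin_sd = 11.0    # assumed single-game noise in the margin, in points

# Scale the per-100-possession edge down to one tournament game.
expected_margin = adjem_gap * possessions / 100   # 6.8 points

# Probability the favorite wins, treating the margin as roughly normal.
z = expected_margin / margin_sd
win_prob = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"expected margin: {expected_margin:.1f} points")
print(f"favorite win probability: {win_prob:.0%}")
```

Under these assumptions, a team that is 10 points better per 100 possessions still loses roughly one game in four: in a single game, the noise is bigger than the signal.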

Single elimination compounds the small-sample problem: only one game decides who advances each round, so a few bad bounces of the ball can drastically alter the outcome in ways models cannot predict. This is not a surprise. It is math. A slower tempo compounds the problem further. Fewer possessions mean greater variance in the final margin, leaving games more prone to high-risk, high-reward swings that end in either comfortable favorite wins or shocking upsets. When the underdog drags the pace down and reduces the favorite's margin for error, the AdjEM gap compresses in real time. The model knows this, by the way. It just cannot tell you which game.
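The pace effect is easy to check by brute force. Here is a small Monte Carlo sketch, using made-up per-possession scoring rates rather than real team data, that estimates upset probability at different possession counts:

```python
import random

random.seed(42)  # reproducible runs

def score(n_poss, p2, p3):
    """Total points over n_poss possessions: each possession yields a
    two with probability p2, a three with probability p3, else zero."""
    total = 0
    for _ in range(n_poss):
        r = random.random()
        if r < p2:
            total += 2
        elif r < p2 + p3:
            total += 3
    return total

def underdog_wins(n_poss):
    # Hypothetical rates: favorite averages 1.12 points per possession,
    # underdog 1.02 -- a 10-point AdjEM-style gap per 100 possessions.
    fav = score(n_poss, 0.41, 0.10)
    dog = score(n_poss, 0.36, 0.10)
    if fav == dog:
        return random.random() < 0.5  # coin-flip stand-in for overtime
    return dog > fav

rates = {}
for n_poss in (60, 68, 80, 100):
    trials = 20_000
    upsets = sum(underdog_wins(n_poss) for _ in range(trials))
    rates[n_poss] = upsets / trials
    print(f"{n_poss} possessions: underdog wins {rates[n_poss]:.1%}")
```

The exact numbers are arbitrary, but the direction is not: shrink the possession count and the same efficiency gap produces more upsets.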

The research backs this up with uncomfortable specificity. A Furman University model combining 18 different variables and tested on tournament data from 2007 through 2021 achieved 76% accuracy in games where the seed gap was at least five. Seventy-six percent. That is genuinely excellent for this kind of prediction. It also means one in four of those games will make your bracket look like a kindergartner filled it out. And that is the best-case model, built by mathematics professors who spent years on the problem.

The model says: you will be wrong. Frequently. Irreducibly. Not because your data is bad, but because one game is not enough data to let quality fully express itself. Rook will tell you a team "wanted it more" in the first round. The possession count says they shot 7-of-12 from three in a 68-possession game, which will not happen again in the next round. We can both be a little right, but only one of us is describing a repeatable process.

What Models Can Actually Do (and What We Keep Asking Them to Do Instead)

The honest analytics answer to March Madness first-round upsets is not upset prediction. It is upset probability. Those are different requests, and conflating them is how you end up disappointed every year.

No. 12 seeds have a 55-101 record against No. 5 seeds since 1985, good for a .353 winning percentage. That is the number. Not "pick the 12 seed," not "never pick the 12 seed"—pick them at a 35% rate, which works out to roughly 1.4 expected wins across the four 5-12 games each year. At least one No. 12 seed has won a Round of 64 game in 33 of the last 39 tournaments. So yes, one will almost certainly happen. The model absolutely cannot tell you which one, and neither can the eye test, and neither can the person on television who watched three of those four teams play this season.
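Those two historical numbers are mutually consistent, which is worth verifying. Treating the four 5-12 games as independent coin flips at the historical per-game rate:

```python
# Historical 12-over-5 record since 1985: 55 wins in 156 games.
p_single = 55 / 156                       # ~0.353 per game

# Probability at least one of the four 12 seeds wins in a given year,
# assuming the four games are independent.
p_at_least_one = 1 - (1 - p_single) ** 4

# Observed frequency: at least one 12 seed won in 33 of 39 tournaments.
observed = 33 / 39

print(f"per-game rate:       {p_single:.3f}")
print(f"expected >=1 upset:  {p_at_least_one:.3f}")
print(f"observed >=1 upset:  {observed:.3f}")
```

The independence model predicts about 82%; history shows about 85%. The per-game number and the per-year number are telling the same story.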

Where analytics genuinely adds value is not in first-round upset spotting. To win your pools, you do not have to get every upset correct—what matters more is picking the correct teams to go deep in the tournament. The KenPom efficiency metrics are remarkably predictive of championship-caliber teams. Since 1999, 18 of the tournament's 25 champions have been a No. 1 seed, and since 2002, all but two champions have been a top 20 team in both adjusted offensive and defensive rating according to KenPom. That is where your model's edge lives. Not in calling which 13 seed beats which 4 seed on a Thursday in March, but in knowing that the champion almost certainly has an elite adjusted defensive efficiency and you should protect that pick accordingly.
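That champion screen is simple enough to write down as a filter. A sketch, using a hypothetical ratings table (the team names and ranks below are illustrative, not real KenPom data):

```python
# Hypothetical (team, AdjO rank, AdjD rank) rows -- illustrative only.
ratings = [
    ("Team A", 3, 8),
    ("Team B", 1, 45),
    ("Team C", 12, 15),
    ("Team D", 30, 2),
    ("Team E", 18, 19),
]

# Since 2002, all but two champions ranked top 20 on BOTH ends,
# so screen for teams that clear the bar on offense and defense.
contenders = [
    team for team, adj_o, adj_d in ratings
    if adj_o <= 20 and adj_d <= 20
]

print(contenders)  # ['Team A', 'Team C', 'Team E']
```

Team B's elite offense and Team D's elite defense both fail the screen; history says the champion is almost always strong on both ends, not spectacular on one.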

The analytics community keeps underselling this because "protect your Final Four picks" is less exciting than "I predicted the Furman upset." That is a marketing problem dressed up as an epistemological one. Historically, people are about 70% correct in bracket predictions, making the odds of a perfect bracket 1 in 5.7 billion. Improving to 71% accuracy drops those odds to 1 in 2.3 billion. The math of perfection is not the math of winning your pool.
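The perfect-bracket arithmetic checks out, and the exponent is the whole story. With 63 games after the First Four and a fixed per-game accuracy:

```python
GAMES = 63  # bracket games after the First Four

for accuracy in (0.70, 0.71):
    # Odds of a perfect bracket if every pick is an independent
    # coin flip at this accuracy.
    odds = 1 / accuracy ** GAMES
    print(f"{accuracy:.0%} per game -> 1 in {odds:,.0f}")
```

A one-point gain in per-game accuracy more than halves the odds against you and still leaves them hopeless. Winning a pool is a different objective entirely.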

This is a math problem, not a vibes problem. The model is not broken because it cannot predict every 13-over-4 upset. It is broken only if you ask it to do something it was never designed to do. What it can tell you—reliably, repeatedly, without drama—is which teams are structurally built to survive a tournament. My prediction: the 2026 champion will rank top 15 in KenPom's adjusted defensive efficiency. Put that on the board. When I am wrong, I will update the model. That is how this works.