sports analytics 101: the case for sports analytics

October 8, 2020

Welcome to the first installment of Sports Analytics 101, a series of blog posts outlining the core concepts behind sports analytics in non-technical terms. I’m starting this series by breaking down the why in sports analytics, shedding light on the reasons that sports analytics is useful. In future posts, I’ll dive into the how, exploring some of the ways sports analytics is implemented.

There are several general reasons that sports analytics has been able to provide value beyond traditional qualitative judgments, and most of them boil down to this fact: humans have limits. We can broadly classify these limits into mental constraints and situational constraints.

On the mental side, our minds make mistakes and take shortcuts (as you’ll see below, perhaps more shortcuts than you realize). Our eyes can only focus on one thing at a time, and our memory is limited.

As for situational constraints, perhaps the most obvious one is time. In the real world, we have limited time to watch games and absorb film. And even if we had unlimited time, our mental limits would likely hinder our ability to accurately process all of the film we watched.

Analytics is useful because it helps us to navigate both mental and situational constraints.

Mental Limits: Heuristics and Cognitive Biases

If you’re reading this, you’ve probably at least heard of Moneyball by Michael Lewis. If you haven’t read the book yet, it should be near the top of your reading list. Lewis’s account of how the small-market, low-budget Oakland A’s, led by Billy Beane, used sports analytics to exploit inefficiencies and compete with better-funded teams has become, in many ways, the quintessential tale of sports analytics. In fact, the term “Moneyball” has become virtually synonymous with sports analytics. As analytics has been adopted by sports beyond baseball and industries beyond sports, it’s not uncommon to hear things like “Moneyball for hockey” or even “Moneyball for livestock.”

In Moneyball, the Oakland front office found that other teams, and by extension the humans running other teams, were overvaluing or undervaluing certain player attributes. For example, the A’s found that teams were undervaluing the ability to draw walks. Using this information, the A’s were able to identify and sign decent players that they could afford because other teams had undervalued their talent for drawing walks.

Why were teams undervaluing these attributes? As Lewis would later explore in another book, The Undoing Project (which should also be near the top of your reading list), the humans running the other teams were falling victim to common cognitive biases that, as behavioral economists had been arguing for decades, we all fall victim to. Front office executives were making similar mistakes to those made by all of us every day; they just happened to be making those mistakes when evaluating baseball players.

As Lewis explains in The Undoing Project, the two researchers best known for studying these types of cognitive biases are two Israeli psychologists, Daniel Kahneman and Amos Tversky. Kahneman and Tversky published a number of papers on the biases that impair our decision-making, and their work became the foundation for the field of behavioral economics.

Kahneman and Tversky found that when faced with situations of uncertainty, like whether or not a young prospect will become a good major league player, humans fall back on mental heuristics. Heuristics are essentially mental shortcuts. They allow us to make complex decisions quickly and, in some cases, relatively accurately. However, as you might have guessed, they can also lead us to make misjudgements.

While there are a number of heuristics that have been studied over the years, I’ll focus on two that Kahneman and Tversky originally studied in the 1970s and that continue to impact judgement in sports to this day.

Availability

The availability bias (or availability heuristic) is the tendency of humans to overestimate the frequency of particularly memorable events. For example, say you live in Boston, where there aren’t many tornadoes. You’re planning a trip to Oklahoma, but you’re concerned that you’ll get caught in a tornado because you’ve seen news reports about tornadoes in Oklahoma.

As it turns out, tornadoes don’t happen every day in Oklahoma and are, in fact, not likely to be a threat to your trip. Your concern is derived mainly from the fact that you only remember and retain information about the weather in Oklahoma when it is particularly memorable, like when there’s a tornado. You don’t tend to remember the many times the weather was perfectly normal in Oklahoma because, well, it wasn’t that memorable. Under uncertainty about the weather in Oklahoma, you fall back on the available memories you have about the weather in Oklahoma, which tend to be the memories of tornadoes.

It’s fairly easy to see how this heuristic can apply to sports. When an athlete makes an amazing play, we’re likely to remember that play. What we don’t remember are the many times that the same player did very little to nothing because, well, doing very little to nothing isn’t very memorable. A player might go a whole game and have little impact, but we may remember one moment of brilliance and come away with the perception that the player had an outstanding game.

Of course, many fans and front office employees are savvy enough to properly weigh a player’s performance across the entire game when focused on that particular game alone. But over the course of a season, there’s a lot of information to be processed about hundreds of players in hundreds of games. You can see how we might quickly fall back on the availability heuristic when attempting to evaluate a single player among hundreds over the course of an entire season without the support of data. We simply can’t remember every play from every player. Because we lack the mental capabilities to remember each play, we fall back on the plays we can remember: the particularly memorable ones.

Data, for the most part, is not subject to availability bias. It doesn’t remember one play above others because of a spectacular catch. In theory, all plays are treated equally. That’s a key advantage of using data in decision-making: when analysis is done correctly, data can help paint a more complete picture of an entire season. It remembers everything equally and doesn’t rely on only the memorable moments that we humans tend to rely on because of our mental limits.
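To make that contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the play grades are invented, the 0 to 10 grading scale is hypothetical, and "memory" is crudely modeled as recalling only plays above a threshold. It is not a real evaluation method, just a way to see how far a highlight-driven impression can drift from the full record.

```python
from statistics import mean

# Hypothetical play grades for one player in one game, invented for
# illustration only (0 = no impact, 10 = spectacular).
play_grades = [1, 0, 2, 1, 10, 0, 1, 2, 0, 1]


def remembered_average(grades, memorable_threshold=8):
    """Average over only the plays vivid enough to be recalled.

    This is a crude stand-in for the availability heuristic: plays graded
    below the (assumed) threshold are simply forgotten.
    """
    memorable = [g for g in grades if g >= memorable_threshold]
    # If nothing stood out, fall back to the full record.
    return mean(memorable) if memorable else mean(grades)


full_average = mean(play_grades)              # every play counts equally
recalled = remembered_average(play_grades)    # only the highlight survives

print(f"Average over all plays:      {full_average}")
print(f"Average over recalled plays: {recalled}")
```

With these made-up numbers, the single spectacular play dominates the recalled average, while the full average reflects a mostly quiet game.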

Representativeness

The representativeness heuristic is the use of stereotypes to judge the likelihood that something belongs to a certain group. A classic example of the representativeness heuristic (and an example that Kahneman and Tversky use themselves) involves flipping coins. People are shown two sequences of six coin flips, HTHTTH and HHHTTT, and asked which of the two sequences is more likely to occur in real life. Probabilistically, both sequences are equally likely (with a fair coin, any specific sequence of six flips has a 1 in 64 chance), but people tend to think that HTHTTH is more likely to occur because it looks more random. In other words, HTHTTH appears to be more representative of randomness than HHHTTT, and because we know coin flips to be random, we assume that HTHTTH is more likely to occur.
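If you’d like to check the coin-flip math yourself, here is a short Python sketch. It assumes a fair coin and independent flips; the function name is my own and only for illustration.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of seeing this exact sequence of independent flips."""
    probability = Fraction(1)
    for flip in sequence:
        # Multiply by P(heads) for an "H" and by P(tails) for a "T".
        probability *= p_heads if flip == "H" else 1 - p_heads
    return probability


# Every possible sequence of six flips: 2**6 = 64 of them.
all_sequences = ["".join(flips) for flips in product("HT", repeat=6)]

print(sequence_probability("HTHTTH"))  # 1/64
print(sequence_probability("HHHTTT"))  # 1/64
print(len(all_sequences))              # 64
```

Each of the 64 possible sequences gets the same 1/64 probability, no matter how "random" it looks. The intuition that HTHTTH is more likely persists anyway.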

Why do we do this? For anyone who hasn’t recently taken a probability class, calculating the likelihood of each of the two sequences is a tough proposition, so our minds come up with a shortcut: representativeness. In some cases this shortcut might be helpful: it might allow us to make a snap decision where there isn’t time for a difficult calculation. However, we can run into bias with this heuristic because even if Thing A is representative of Thing B, Thing A is not necessarily the same as Thing B.

This principle can come into play in sports when evaluating players. For example, we might overrate or underrate players based on their physical size. Yes, physical appearance can certainly be correlated with performance in sports (there aren’t many 140 lb NFL linemen walking around), but it’s only one of many factors.

In Moneyball, Lewis describes how the A’s were able to find value in players that didn’t have the typical physical build of professional baseball players. These players were being overlooked by other teams because their physical appearances weren’t representative of prototypical professional baseball players, even though they were in some cases as productive on-field as players whose appearances were stereotypically more athletic.

Situational Limits: Time and Resources

Beyond cognitive biases and other limits related to the nature of the human mind, there are external situational limits that constrain how much information we can process to make decisions. Time constraints limit our ability to digest large quantities of information, and resource constraints limit our ability to pay other people to help us digest large quantities of information.

There’s only so much time in a day. If you’re trying to evaluate prospective players, you may not have hours upon hours of time to digest the quantity of game film you need to watch to understand the full picture of each prospect’s career. If, for example, you were the technical director of a soccer club interested in signing a left back, you would have to watch hundreds of hours of game film to comprehensively evaluate the entire universe of left back prospects. And that’s just one position. For an entire team, that can add up to a lot of time spent watching film. Obviously, teams tend to deal with this issue by hiring scouts. If you’re the technical director of a team with significant resources, you can afford to hire a team of talent evaluators to spread around the work of watching and evaluating film.

But what if you’re the technical director of a team without the monetary resources to hire a massive team of talent evaluators? You could rely on evaluations by members of the media, but you wouldn’t have control over the specifics of the information you’re receiving. What if the media members writing about left backs don’t focus on the same traits you prioritize in a left back? You’re out of luck.

Many front office decision-makers don’t have the time or resources to comprehensively evaluate the entire universe of possible signings through watching film. Data can help mitigate those constraints. I’ll return to this example of a soccer team searching for a left back in a future post to illustrate how.

In Summary

In general, sports analytics provides a toolkit that helps us make decisions under two types of constraints: mental constraints and situational constraints. Mental constraints, which stem largely from cognitive limits and biases, prevent us from adequately processing everything we see on the field, diamond, or court. Situational constraints, often time and resource limits, prevent us from dedicating time and money to ensure that every game is watched and analyzed thoroughly.

The result of these constraints is an environment in which humans make decisions without sufficiently processing all of the relevant information. It’s an environment ripe for analytics.