Sports Analytics 101: Blind Spots

Sports Analytics 101 is a series of blog posts outlining the core concepts behind sports analytics in non-technical terms. You can find all available installments in the series here.

In an earlier post, I introduced a framework for thinking about an individual sports analytics metric. This framework is essentially mental “paperwork” to fill out whenever you use a new metric to ensure you understand what the metric is and what it isn’t.

In using the framework, we first establish the name of the metric and what it’s being used to quantify. Next, we establish whether the metric is a fact or a proxy, whether the metric is descriptive or predictive, whether the metric is a productivity metric or a style metric, and whether the metric is adjusted for opportunity. Last but not least, we need to identify any blind spots in the metric.

Virtually no proxy metric perfectly quantifies what it’s intended to quantify, but that doesn’t mean proxies are useless. In fact, many of the best metrics are imperfect proxies. That does mean, however, that in order to use a proxy metric effectively, we need to understand its blind spots.

Let’s look at a fairly straightforward example: a quarterback’s Completion Rate. Completion Rate is a relatively simple metric, defined as:

Completion Rate = Completed Passes / Total Passes

Completion Rate is hardly an advanced metric, but it’s still used by some to quantify a quarterback’s passing efficiency. So, let’s assume we are using Completion Rate to quantify and compare the passing efficiencies of NFL quarterbacks. What might we be missing in using this metric?

To start, Completion Rate treats all passes equally, so completing a short swing pass out of the backfield counts exactly as much as completing a 50-yard bomb. If, hypothetically, one quarterback completes 5 of 10 passes and every completion goes for 50 yards, that quarterback would have the same Completion Rate (50%) as a quarterback who completes 5 of 10 passes for five yards apiece. The two quarterbacks would demonstrate equivalent efficiency when measured by Completion Rate, but the quarterback throwing 50-yard bombs ends up with ten times the yardage (250 yards versus 25) for his efforts.
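
To make the arithmetic concrete, here is a minimal Python sketch of the hypothetical above. The function and variable names are made up for illustration, and total yardage is used only as a simple point of comparison, not as a recommended replacement metric.

```python
def completion_rate(completed, attempted):
    """Completion Rate = Completed Passes / Total Passes."""
    return completed / attempted


def summarize(qb):
    """Return the Completion Rate and total yardage for one stat line."""
    return {
        "completion_rate": completion_rate(qb["completed"], qb["attempted"]),
        "total_yards": qb["completed"] * qb["yards_per_completion"],
    }


# Hypothetical stat lines taken from the example in the text.
deep_passer = {"completed": 5, "attempted": 10, "yards_per_completion": 50}
short_passer = {"completed": 5, "attempted": 10, "yards_per_completion": 5}

deep = summarize(deep_passer)
short = summarize(short_passer)

print(deep)   # {'completion_rate': 0.5, 'total_yards': 250}
print(short)  # {'completion_rate': 0.5, 'total_yards': 25}
```

Both quarterbacks come out at 0.5, so anyone looking only at Completion Rate would have no way to tell these two stat lines apart.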

Further, Completion Rate doesn’t account for the receiver’s role in a pass. If a quarterback throws a bad pass, but his receiver makes an excellent catch, the quarterback gets credit for a completed pass in the Completion Rate calculation. Conversely, if a quarterback makes a fantastic pass that the receiver somehow drops, that fantastic pass counts against the quarterback in Completion Rate.

Let’s look at another example: basic Plus-Minus in basketball. Plus-Minus attempts to quantify a player’s overall productivity based on the team’s performance when the player is in the game. The calculation is relatively simple:

Plus-Minus = Team points with player in game - Opponent points with player in game

This basic version of Plus-Minus is riddled with blind spots, but perhaps the most notable is that the metric doesn’t account for how teammate performance can impact a player’s rating.

For example, let’s say that Allie and Nina play on the same team and, because they play complementary positions, they’re always in the game or on the bench together. In other words, when Allie plays, Nina plays, and vice versa.

It also happens that Nina is the best player on the team and Allie is fairly mediocre. This difference in productivity should be captured by Plus-Minus, right? Well, no. Because Allie and Nina play the same minutes, any points scored by their team during those minutes will go towards both of their respective Plus-Minus ratings and any points conceded during those minutes will go against both of their respective Plus-Minus ratings.

So Allie and Nina will always have the same Plus-Minus rating, even if Nina is more productive than Allie.
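
A short Python sketch shows why this happens. The stint data, the point totals, and the third player "Sam" are all invented for illustration; the only thing taken from the example is that Allie and Nina are always on the floor together.

```python
# Each stint: (players in the game, team points, opponent points).
# All numbers, and the teammate "Sam", are hypothetical.
stints = [
    ({"Allie", "Nina"}, 30, 20),
    ({"Sam"}, 10, 18),
    ({"Allie", "Nina", "Sam"}, 25, 22),
]


def plus_minus(player, stints):
    """Team points minus opponent points while the player is in the game."""
    return sum(team - opp for on_floor, team, opp in stints if player in on_floor)


ratings = {name: plus_minus(name, stints) for name in ("Allie", "Nina", "Sam")}
print(ratings)  # {'Allie': 13, 'Nina': 13, 'Sam': -5}
```

Because the calculation only sees who was in the game and what the score did, nothing in the data can separate Allie's contribution from Nina's.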

Virtually all metrics have blind spots, some more than others. As we identify blind spots, we either need to make adjustments to the metric to account for those blind spots (ESPN’s Real Plus-Minus, for example, is a more advanced version of Plus-Minus that accounts for the issue described above) or we need to ensure we understand how the blind spots might impact our interpretation of the metric. Doing so is critical to a good sports analytics process.