The Importance of De-Averaging Metrics

The average is perhaps the most commonly used statistic to summarize a set of data.

And while the average is a very useful, simple tool in our analysis toolkit, it’s often important to dig a bit deeper and better understand the underlying distribution of the data that we’re averaging.

Let’s walk through an example.

Say we own a business that sells widgets at two different stores. One of the stores is downtown and the other store is in an industrial park.

Two metrics we care about as a business are:

  • Total Unique Customers
  • Avg. Purchases per Customer

Each of the two stores had 20 Total Unique Customers in a given month. Each store also had the same Avg. Purchases per Customer that month: 10.
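To make the setup concrete, here is a minimal sketch of how two stores can report identical headline metrics. The purchase counts are invented for illustration (they are not real store data), and `summarize` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical purchase counts for one month, one entry per unique customer.
# These numbers are invented for illustration only.
downtown = [9, 10, 11, 10, 9, 10, 11, 10, 10, 10,
            9, 11, 10, 10, 10, 9, 11, 10, 10, 10]
industrial_park = [6, 7, 7, 6, 7, 7, 6, 7, 7, 6,
                   7, 7, 6, 7, 7, 20, 20, 20, 20, 20]

def summarize(purchases):
    """Return (total unique customers, average purchases per customer)."""
    return len(purchases), mean(purchases)

for name, purchases in [("downtown", downtown), ("industrial park", industrial_park)]:
    customers, avg = summarize(purchases)
    print(f"{name}: {customers} customers, {avg:.1f} purchases per customer")
```

Both stores come out at 20 customers and 10 purchases per customer, even though the underlying lists are clearly not the same.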

On the surface, it seems like these two stores have very similar customer bases, but we need to do some de-averaging of Avg. Purchases per Customer to be sure.

A simple way to de-average is to visually examine the underlying distribution of the values that generated the average.

In other words: Look at a histogram of the data set.

For our example, de-averaging involves examining the distribution of total purchases for each customer during the month in question.
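If no charting tool is handy, a rough text histogram is enough to see the shape. The sketch below assumes made-up purchase counts for each store, and `text_histogram` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Hypothetical purchase counts, one entry per customer (invented for illustration).
downtown = [9, 10, 11, 10, 9, 10, 11, 10, 10, 10,
            9, 11, 10, 10, 10, 9, 11, 10, 10, 10]
industrial_park = [6, 7, 7, 6, 7, 7, 6, 7, 7, 6,
                   7, 7, 6, 7, 7, 20, 20, 20, 20, 20]

def text_histogram(purchases):
    """Build one row per purchase count, with one '#' per customer."""
    counts = Counter(purchases)
    rows = []
    # Include empty rows between min and max so gaps in the data stay visible.
    for value in range(min(purchases), max(purchases) + 1):
        rows.append(f"{value:>3} | {'#' * counts[value]}")
    return "\n".join(rows)

print("Downtown")
print(text_histogram(downtown))
print()
print("Industrial park")
print(text_histogram(industrial_park))
```

With this sample data, the downtown rows cluster tightly around 10, while the industrial park rows show two separate clumps with a long empty stretch between them.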

So let’s do that!

First, the distribution of purchases by customer for the downtown store:

Next, the distribution of purchases by customer for the industrial park store:

Obviously, these distributions are entirely different and might point to very different customer dynamics between the two stores.

It appears that at the downtown store, almost all customers buy around 10 widgets per month, give or take.

But at the industrial park store, there appear to be two very different types of customers:

  • Customers who buy ~10 widgets per month
  • Customers who buy 20 widgets per month

What might be going on here?

Well, the customers that are buying 20 widgets per month could be buying in bulk for some reason. They could be businesses buying widgets to incorporate into their own product, or something similar.

Let’s imagine that we did a bit of deeper customer research and confirmed that the customers buying 20 widgets per month are indeed businesses.

All of a sudden, we’ve found an important way to segment our customer base into:

  • Individual consumers
  • Businesses
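Once the two segments are known, the single store-level average can be replaced with one average per segment. The sketch below is only an illustration: the purchase counts are invented, and the cutoff of 15 purchases (`BULK_THRESHOLD`) is an assumed stand-in for the customer research described above, which is what would actually confirm who is a business. Note that for the store-wide average to stay at 10 while the bulk buyers sit at 20, the remaining customers in this made-up data have to buy fewer than 10 each.

```python
from statistics import mean

# Hypothetical purchase counts for the industrial park store (invented data).
industrial_park = [6, 7, 7, 6, 7, 7, 6, 7, 7, 6,
                   7, 7, 6, 7, 7, 20, 20, 20, 20, 20]

# Assumed cutoff for "bulk" buyers; a real segmentation would rely on
# customer research rather than on a purchase count alone.
BULK_THRESHOLD = 15

def segment(purchases, threshold):
    """Split purchase counts into consumer and business groups by a cutoff."""
    return {
        "consumers": [p for p in purchases if p < threshold],
        "businesses": [p for p in purchases if p >= threshold],
    }

segments = segment(industrial_park, BULK_THRESHOLD)
for label, group in segments.items():
    if group:  # mean() raises on an empty list
        print(f"{label}: {len(group)} customers, "
              f"{mean(group):.1f} purchases per customer")
```

The two per-segment averages describe the store far better than the blended figure of 10, which matches neither group.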

This is a critical insight because knowing that these two customer segments exist affects how we operate the business going forward. For example, we might want to hire salespeople to manage the B2B side of the operation and consumer marketing folks to handle the B2C side.

Moral of the story: Taking the time to understand the underlying distribution of a data set, rather than relying solely on summary statistics like the average, can go a long way toward better insight.
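A quick numeric check can also hint that a histogram is worth a look. As a rough sketch on invented data, comparing the mean with the median and the standard deviation shows the two stores diverging even though their averages match. `describe` is a hypothetical helper name.

```python
from statistics import mean, median, pstdev

# Hypothetical purchase counts, one entry per customer (invented for illustration).
downtown = [9, 10, 11, 10, 9, 10, 11, 10, 10, 10,
            9, 11, 10, 10, 10, 9, 11, 10, 10, 10]
industrial_park = [6, 7, 7, 6, 7, 7, 6, 7, 7, 6,
                   7, 7, 6, 7, 7, 20, 20, 20, 20, 20]

def describe(purchases):
    """Return the mean, the median, and the population standard deviation."""
    return {
        "mean": mean(purchases),
        "median": median(purchases),
        "stdev": pstdev(purchases),
    }

for name, purchases in [("downtown", downtown), ("industrial park", industrial_park)]:
    stats = describe(purchases)
    print(f"{name}: mean={stats['mean']:.1f}, "
          f"median={stats['median']:.1f}, stdev={stats['stdev']:.2f}")
```

In this sample, a median well below the mean and a much larger spread at the industrial park store are the cue to go look at the full distribution.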