Welcome to another rousing article on how to not be a jackass when running a business! Today we are talking about the curses of statistics as they apply to business metrics.

How do we define metrics? Well, here's my off-the-cuff definition:

Metrics are data points used to derive some sort of insight into a business process to drive conclusions or outcomes of any kind. These manifest as Key Performance Indicators, Objectives and Key Results, spreadsheets, graphs, financial reports, monitoring charts, or anything that can be used to aggregate data into something visual to then be consumed by leadership or other interested and/or accountable parties.

Okay, well, that's a little long with a ton of business crap in it, but it's accurate. In short: metrics are anything that makes pretty graphs so you can see things happen. Metrics in their own right are not bad. Honestly, they're usually pretty cool! Everyone likes to see "line go up" (unless we're talking about HTTP error rates on a website)! But, they can hide all sorts of nuance and detail that can rot out a business or its people or its systems from the inside, even when the graph says everything is hunky-dory. We're going to talk about a few key concepts of why this happens, and how to avoid it, and why it's such a major pet peeve of mine. And as usual, to help organize my thoughts, we're going to start with a basic table of contents.

Section A: The Phrase "Lies, Damned Lies, and Statistics"
Section B: Goodhart's Law and the Perversion of Metrics
Section C: Video Games and the Ideal State of Metrics

I will try to keep this interesting for my less business-y inclined readership through my usual wit, which usually involves calling middle manager types a bunch of things you can't say on LinkedIn. And pictures.

 

The Phrase "Lies, Damned Lies, and Statistics"

Required Reading: https://journalofethics.ama-assn.org/article/median-isnt-message/2013-01

Numbers in their own right are insufficient to draw conclusions. Typically, these statistical elements help turn the numbers into some kind of conclusion:

  • A baseline
  • Year-over-year / Month-over-month / X-over-x comparisons
  • Trendlines
  • Anomaly detection (e.g. sharp spikes or dips)
  • Context (the most difficult one!)

These all assist with getting good conclusions out of the datasets you are visualizing, and they're all valid in what they're good at, but they don't address these major issues:

  • Structural issues with the data
  • Accidental inclusions or exclusions of load-bearing data points
  • The actual conclusions themselves
  • The biases and objectives of the statistician themselves (Please don't call me woke. This happens all the fucking time.)
  • The deadly sin of assuming correlation equals causation.

And what happens when your pretty charts fall victim to the above?

  • The root cause of trends is never identified.
  • The data points are gamed or manipulated based on weaknesses in the metrics, poisoning the conclusions.
  • Corrective action fails to rectify any issue identified in the charts, or has utterly the opposite effect.
  • Any positive conclusion from the charts may be a false sense of security.
  • You'll probably get fired because you fucked up.

Statisticians (which I do not claim to be) call these "Confounding Variables" and/or "Lurking Variables", or basically "your metrics / stats suck because outside influences you didn't properly account for led to conclusions that were incorrect". If you want a punchier example, consider one from Discover Magazine: Hot weather causes both ice cream sales (you eat ice cream to cool off) and drownings (you jump in a lake or river to cool off) to go up. If you didn't account for hot weather, or for people's innate tendency to cool off in it, you might conclude that ice cream leads to drowning, and everyone would point and laugh as you wave around the chart that argues that position. And then you get fired.
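For anyone who wants to watch the ice cream thing happen, here's a minimal sketch in Python. Everything in it is an assumption on my part: the data is made up by `simulate_summer`, the coefficients are arbitrary, and "drownings" is a toy index rather than real incident counts. It computes the plain correlation between the two series, then the partial correlation once temperature is controlled for.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def simulate_summer(days=500, seed=42):
    """Made-up data: temperature drives both series; neither drives the other."""
    rng = random.Random(seed)
    temps = [rng.uniform(10, 35) for _ in range(days)]        # degrees C
    ice_cream = [20 * t + rng.gauss(0, 50) for t in temps]    # toy sales figure
    drownings = [0.3 * t + rng.gauss(0, 1) for t in temps]    # toy incident index
    return temps, ice_cream, drownings

temps, ice_cream, drownings = simulate_summer()
raw = pearson(ice_cream, drownings)
controlled = partial_correlation(ice_cream, drownings, temps)
print(f"ice cream vs drownings: {raw:.2f}")
print(f"same, controlling for temperature: {controlled:.2f}")
```

The first number comes out strongly positive even though the simulation never lets ice cream touch drownings. The second collapses toward zero once the lurking variable is accounted for, which is the whole point.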

 

Goodhart's Law and the Perversion of Metrics

Required Reading: https://en.wikipedia.org/wiki/Goodhart%27s_law

At their core, numbers, bar charts, graphs, and other products of metrics and statistics are simplifications of underlying datasets, meant to drive better understanding of those datasets. This, in itself, is not evil. Example time!

You have 1,000,000 web requests in a log file and you'd like a distribution of requests by IP. You write a shell command to awk out the IP and pipe that to a pretty bar chart, the chart gets spit out, and you determine one asshole from Egypt is absolutely destroying your services. Nothing about this is evil in its own right. Rifling through a million logs sucks. A chart makes it easy to draw conclusions from the dataset. And you banish that guy from Egypt to the void. Well done, pats all around. In this specific example in a vacuum, you've probably made the right call, and I'm not going to sit here and go "But wait! He was actually cool!". Because not in every case does the chart lie. But in some cases the chart can lie, and can even mislead you into nuking someone actually innocent, or doing other bad things. So let's do a more interesting example!
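Before we get to that, here's the log example as a rough sketch for the folks who like to see it rather than read it. This is the shell/awk idea redone in Python, not the actual command. It assumes the client IP is the first whitespace-separated field on each line, and the function names and sample log lines are made up for illustration.

```python
from collections import Counter

def requests_by_ip(log_lines):
    """Count requests per client IP, assuming the IP is the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:                      # skip blank lines
            counts[fields[0]] += 1
    return counts

def bar_chart(counts, width=40, top=5):
    """Render the busiest IPs as text bars scaled to the largest count."""
    busiest = counts.most_common(top)
    if not busiest:
        return []
    peak = busiest[0][1]
    return [f"{ip:<15} {'#' * max(1, count * width // peak)} {count}"
            for ip, count in busiest]

# Made-up sample lines; the addresses come from reserved documentation ranges.
sample_log = (
    ['203.0.113.7 - - "GET /login HTTP/1.1" 200'] * 9
    + ['198.51.100.4 - - "GET / HTTP/1.1" 200'] * 2
    + ['192.0.2.10 - - "GET /about HTTP/1.1" 200']
)
for row in bar_chart(requests_by_ip(sample_log)):
    print(row)
```

For a real file you'd pass an open file handle instead of `sample_log`, since `requests_by_ip` only needs something it can iterate line by line. The chart tells you who is loud. It does not tell you why, and that gap is where the trouble starts.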

You are an IT Service Director for BigCorp. You have 50 support technicians organized by juniors, seniors, leads, and managers at your disposal. You run a help desk which processes 10,000 tickets a month from multiple channels: phone, email, in-person, etc. You are judged on your various service metrics, or Key Performance Indicators (KPIs): SLA met percentage, CSAT scores, closed ticket count, and so on. As the director, you are responsible for keeping these metrics in order. You don't do the tickets yourself, but you are accountable for their completion and the smooth running of the help desk. And, being the shrewd director you are, you make a nice dashboard for yourself and upper management to pore over using live help desk data, and you start putting pressure on the managers to pressure the leads into pressuring the frontline workers to start increasing the good metrics and reducing the bad metrics so you can report progress to upper management and get your well deserved yearly bonus. This scenario is extremely common and can apply to any sort of service organization in a large company, but IT help desks I know all too well, and I suspect some of my readers do too, and the above shouldn't sound all that crazy for anyone who has worked in a knowledge worker environment for more than three months. And those who read the first section probably know what's coming.

One of your team leads is a real smarty pants. They realize that you are tracking "ticket close" metrics, and not "time spent" metrics. They conclude that the best bang for their buck here is to use a script to yank all the really easy tickets from the ticket queue as fast as possible and work those, making their team look like champions. The numbers say they are champions. They were closing 200 tickets a month, now they're at 1,000. And strangely, everyone else's went down. These guys look like heroes, whereas everyone else is clearly flouting your authority, Mr. Director. What are you to do about this?

  • Pay cuts for the teams that had their scores go down?
  • Bonuses for the team who rose to the occasion?
  • Sit down and realize that maybe your metrics suck ass?

You, Mr. Director, are a prideful man. You'll never go for option 3, but you're a benevolent fella. You'll take option 2, and give bonuses to those heroes, all while the other teams look on in disgust. You sit in your ivory fucking tower completely disconnected from the dataset driving your pretty graphs, and slap a monetary incentive onto the metrics via bonuses, and now the game is afoot, and all your teams are engineering themselves into a nerd arms race to grab all the easy tickets first. And then you notice that your CSAT scores are taking a dive, and some big customers are threatening to leave because support is not following up. You check your charts and see that yes, indeed, "line go up" is happening on completion metrics, but you have no clue what is going on. You take these numbers to leadership and they don't care. "Get CSAT up now", they say, "or it's your head!" You'll never realize that complex-looking tickets are languishing in the queue because you haven't incentivized anyone to take those, and surprise, surprise, big customers have big problems.

This hypothetical, unfortunately, happens every day. I have experienced it myself. I have been the smarty pants, and I have been the director. I have seen both sides, and the only way forward is to not have metrics that suck, and to not fall victim to this section's namesake: Goodhart's Law.
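If you'd like the cherry-picking trap in numbers, here's a toy sketch. None of this is a real ticketing system: the `Ticket` fields, the 8-hour and 30-minute efforts, and the 10-hour budget are all assumptions I picked to make the point visible. It runs the same queue twice, once grabbing the easiest tickets first and once working in arrival order, and reports the close count next to two things the dashboard in the story never tracked.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: int
    effort_hours: float   # hypothetical: hours needed to actually resolve it
    big_customer: bool    # hypothetical: belongs to a "big problems" account

def work_queue(queue, hours_available, cherry_pick):
    """Close whatever fits in the hours available; return (closed, still_open)."""
    # Cherry-picking sorts easiest first; otherwise tickets are taken in arrival order.
    order = sorted(queue, key=lambda t: t.effort_hours) if cherry_pick else list(queue)
    closed, still_open = [], []
    for ticket in order:
        if ticket.effort_hours <= hours_available:
            hours_available -= ticket.effort_hours
            closed.append(ticket)
        else:
            still_open.append(ticket)
    return closed, still_open

def scoreboard(queue, hours_available, cherry_pick):
    """The tracked metric (closed) next to two numbers nobody was looking at."""
    closed, still_open = work_queue(queue, hours_available, cherry_pick)
    return {
        "closed": len(closed),
        "hours_of_work_done": sum(t.effort_hours for t in closed),
        "big_customer_tickets_open": sum(t.big_customer for t in still_open),
    }

# 20 tickets: every fourth one is an 8-hour big-customer problem, the rest take 30 minutes.
queue = [Ticket(i, 8.0 if i % 4 == 0 else 0.5, i % 4 == 0) for i in range(20)]
print("cherry-pick:", scoreboard(queue, 10, cherry_pick=True))
print("in order:   ", scoreboard(queue, 10, cherry_pick=False))
```

On this made-up queue the cherry-pickers close 15 tickets against 5, so the close-count chart crowns them. They also did 7.5 hours of work instead of 10 and left all five big-customer tickets untouched. Same people, same hours, and the only number on the dashboard points the wrong way.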

 

Video Games and the Ideal State of Metrics

For the longest time, video games have provided me with the ideal state of metrics. I have never, until writing this article, been able to reconcile why my metrics suck and yet metrics in complex video games are amazing and let me feel like a CEO and make wide sweeping decisions knowing what the impact will be. Is it because I suck? Is it because there is some fundamental difference between how I think at work and how I think when I am doing work thinly veiled as a video game?

Actually, the real answer is that a tightly controlled set of variables feeds into those metrics. Confounding / lurking variables are basically nonexistent, because the rules engine and challenges of a video game are far simpler than real life. I am not sure why I was unable to see that until now, but I will say that there is nothing wrong with striving to reach the clarity of the metrics and data points provided to you in complex video games like Factorio, Stellaris, Civilization, whatever. Those games present these artifacts to you so you can make decisions to win the game, and that is really a great ideal to work towards!

This is an example-heavy article, so I will do yet another one. In the game of Factorio, I can see the difference between iron plate production and consumption. As Mr. Factory Manager, I can determine when this difference is close enough to warrant either cutting consumption by reviewing my factory, or increasing production by opening a new mine. My dataset is automatically collected from all production and consumption endpoints (removal of a very difficult real-life variable) as part of the game's logic. Now, it's up to me to draw the conclusions around why either consumption is up or production is down, but I am armed with the information that "I need to do something" and I get to choose what that something is and see it reflected in the charts directly. I can also export this information to the circuit network to surface the right information at the right time, or trigger automatic actions (e.g. remotely turn on a mine or turn off some low-importance production chains). I am thoroughly embarrassed to admit that I took this to its logical maximum and beyond in a previous very high-end Factorio run myself, where I made a whole-ass monitoring dashboard in game and used it like a ticket system (remember that previous article about how I work where I said that high signal-to-noise ratio systems, when taken care of properly, transcend their original function and do interesting things? Yep. Here it is again).
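The decision I'm describing can be sketched as a tiny rule. To be loud about it: this is plain Python, not Factorio's circuit network or any game API, and the function name, the action labels, and the 10% headroom threshold are all hypothetical choices of mine.

```python
def iron_plate_action(produced_per_min, consumed_per_min, min_headroom=0.10):
    """Pick an action from the gap between production and consumption."""
    if produced_per_min <= 0:
        return "turn_on_mine"                 # nothing is being produced at all
    headroom = (produced_per_min - consumed_per_min) / produced_per_min
    if headroom < 0:
        return "turn_off_low_priority"        # consuming more than we make
    if headroom < min_headroom:
        return "turn_on_mine"                 # gap is getting too close
    return "no_action"

for produced, consumed in [(1000, 700), (1000, 950), (1000, 1100)]:
    print(produced, consumed, iron_plate_action(produced, consumed))
```

The rule is trivial on purpose. It only works because both inputs are complete and trustworthy, which the game hands you for free and real life very much does not.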

 

So I guess my conclusion can be summarized as "Make your data feel like it does in a video game." How can you do this? Well, I guess the most important thing I can leave you with in this section is some guidelines for things that I have done that have worked well enough, but I am still not the best at this, so you'll have to draw some of your own conclusions from the pieces I've given you thus far:

  • Remove as many confounding or lurking variables as you can from your metrics, especially the ones that can skew data significantly. Human error should also be rooted out as much as possible.
  • Your metrics should be so good that they tell a story right then and there and naturally lead the observer to conclude the right things.
  • Don't be disappointed if something you make comes to a conclusion you weren't expecting. Just because the numbers came out wrong doesn't mean you should go back to the drawing board. Verify your data, but don't discount it. It may be telling you something real.
  • Never, ever hinge performance or goals on your metrics. That may sound counterintuitive, but your goal or outcomes should NEVER be "line go up", because I guarantee you that line will go up, and it won't mean jack diddly squat when it does.
  • Eye charts (charts that look really nice) are extremely powerful. Having great data and a terrible presentation layer for that data can make your audience disengage or still somehow come to the wrong findings. Make it pretty! Take some pride!

I'm sorry I don't have a better conclusion. I am still working on this myself. Good datasets are hard to come by, and good charts harder still, and good conclusions even harder. Metrics and reports and tracking things have confounded me for over a decade, and I don't even want to imagine what my ADHD friends have to go through when it comes to this topic. I'm getting better, but I can't give you a one-size-fits-all conclusion. Still, I hope this helped a bit.