📊 Explaining accuracy to a non-technical stakeholder
Too high-level, and they suspect you're hiding something. Too granular, and they'll be lost in the weeds.
My solve? Visualize F1 👇
First, the framing: F1 strikes a balance between simplicity and power. That lets me earn trust by briefly explaining the downside of traditional Accuracy:
"Imagine predicting fraud where only 1% of the transactions were fraudulent. If the model predicted that none were fraudulent, it would be..."
Your stakeholder will instantly understand -- "99% Accurate!"
Use that to explain the power of F1:
"F1 measures how well the model is doing at finding that 1%, and only that 1%."
I have yet to find a stakeholder who wouldn't at least hear me out after that. So next...
The big hurdle: explaining F1. What you don't do is say, "F1 is the harmonic mean of precision and recall, where precision is true positives over all predicted positives, and recall is true positives over all actual positives."
Recipe for that glazed donut look -- not in a good way. 🍩
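(For your own notes, not the stakeholder's: here's a minimal sketch of those standard definitions, so the jargon is at least decoded somewhere.)

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion-matrix counts (standard definitions)."""
    if tp == 0:
        return 0.0  # degenerate case, e.g. the "never predict fraud" model
    precision = tp / (tp + fp)  # of everything flagged "yes", how much was right?
    recall = tp / (tp + fn)     # of everything truly "yes", how much did we catch?
    # Harmonic mean: drags the score toward whichever of the two is worse
    return 2 * precision * recall / (precision + recall)
```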
I always prepare a visual laid out on a single page (so they can take it with them and tape it to their cubicle wall and impress their friends with their stats knowledge).
First, a visual legend:
I explain the difference between Predictions (what the model thinks is the correct answer) and Reality (the correct answer).
I then show them how the model can be correct in two ways -- by predicting yes when the answer is yes, and predicting no when the answer is no.
I ask them to notice how a filled-in dot is a correct prediction, while an empty dot is an incorrect prediction. The color indicates what the right answer was.
Now for the big reveal. I give them a visual of the performance of their model:
I explain that there are 100 dots, and each one represents 1% of the observations. I rarely need to do any other context setting for stakeholders to understand this visual.
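If you'd like to generate a grid like this yourself, here's a minimal matplotlib sketch. The counts are hypothetical (not from the original visual), and the blue/orange pair is a colorblind-safe stand-in for whatever two colors you choose -- fill means a correct prediction, color means the right answer, matching the legend:

```python
import matplotlib.pyplot as plt

# Hypothetical counts for the 100-dot grid (one dot = 1% of observations)
counts = {"TP": 8, "FN": 2, "FP": 5, "TN": 85}

# Color = reality (blue = yes, orange = no), fill = correct prediction
styles = {
    "TP": dict(facecolor="#0072B2", edgecolor="#0072B2"),  # predicted yes, was yes
    "FN": dict(facecolor="white",   edgecolor="#0072B2"),  # predicted no,  was yes
    "FP": dict(facecolor="white",   edgecolor="#E69F00"),  # predicted yes, was no
    "TN": dict(facecolor="#E69F00", edgecolor="#E69F00"),  # predicted no,  was no
}

fig, ax = plt.subplots(figsize=(5, 5))
dots = [kind for kind, n in counts.items() for _ in range(n)]
for i, kind in enumerate(dots):
    ax.scatter(i % 10, 9 - i // 10, s=250, linewidths=2, **styles[kind])
ax.set_aspect("equal")
ax.set_axis_off()
plt.show()
```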
Once they've soaked that in, I introduce a glossary of metrics, again using the visual to ground them:
Many of them will start doing a few of the calculations in their head -- which is great! That means they're engaged!
Once any questions they have are out of the way, it's time for the final act -- their model's performance metrics:
I explain that these ratios just describe the 100-dot visual in a few numbers.
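Using the hypothetical counts from the grid sketch above (8 TP, 2 FN, 5 FP, 85 TN), those few numbers work out like this:

```python
tp, fn, fp, tn = 8, 2, 5, 85  # hypothetical 100-dot grid, one dot = 1%

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.93 -- filled dots out of all dots
precision = tp / (tp + fp)                          # ~0.62 -- of 13 "yes" calls, 8 were right
recall = tp / (tp + fn)                             # 0.80 -- of 10 real "yes" cases, 8 were caught
f1 = 2 * precision * recall / (precision + recall)  # ~0.70
```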
I go back to the imbalanced-class example -- the model that predicted there was no fraud would have an F1 of zero, while their model has a high score!
Now we get to have some A+ conversations (all three boil down to the same slice-and-recompute pattern, sketched after the list):
Introducing Drift
"We always check whether the model's performance is changing over time. Here's the same metrics, but only for observations in the last month."
Segmentations
"I know segment X is really important to you. Here's how the model does with those observations."
Sources of Errors
"The model is really struggling with observations that have characteristic Y. Here are the metrics for just those observations. Do you have any ideas how these observations might be different than others?"
Caveat: This only works for binary classification models. But (thread for another day) I find those also strike an amazing balance between simplicity and power, so binary classification is my default way to define the model problem.
🚨: Make sure the two colors you choose are colorblindness-friendly! Red/green meaning no/yes is powerful, and you can use it to your advantage, but not all red/green combos are distinguishable to a colorblind person.
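One option (my suggestion, not from the original post) is to keep the red/green intuition but take the values from the Okabe-Ito palette, which was designed to stay distinguishable under the common forms of color vision deficiency:

```python
# Okabe-Ito palette values: a red/green pairing that stays distinguishable
# under the most common forms of color blindness
COLOR_NO  = "#D55E00"  # vermillion (reads as red)
COLOR_YES = "#009E73"  # bluish green (reads as green)
```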
Whether you're a #DataScientist, #Analyst, or #MachineLearning Engineer, fill your toolbox with one of the most important tools for any job:
🧰 Metrics Frameworks
They're the shortcut to being effective and irreplaceable. I don't know of a place to go to find sets of metrics frameworks by function or org type (add a link if you have one!).
But I do have a method for developing them -- think in terms of inputs and outputs.
Every organization is in the business of turning inputs into outputs.
Whether in manufacturing, where raw materials become products, or in education, where applicants become graduates, something goes in one end of an org, gets processed, and comes out changed.
Each function in an org governs one or more steps of the input-to-output process. E-commerce, for example, is about turning people into customers of some product (a small sketch follows the list):
➜ Marketing puts more people in the input-output funnel.
➜ UX oversees the customer interfaces.
➜ Finance manages company funds.
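A hedged sketch of the method: list the stages a person moves through on the way from input to output, then measure the conversion at each handoff. Stage names and counts here are made up for illustration:

```python
# Hypothetical funnel: how many people reach each stage of input -> output
funnel = [
    ("visitors",   50_000),  # marketing's input
    ("signups",     4_000),
    ("purchasers",    800),  # the org's output: customers
]

# Each function's headline metric is a conversion rate between its stages
for (stage_in, n_in), (stage_out, n_out) in zip(funnel, funnel[1:]):
    print(f"{stage_in} -> {stage_out}: {n_out / n_in:.1%}")
```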
Example:
Take an org type like an e-commerce business. Most of them have the same functions within them: sales and/or marketing, finance, product, UX (site/app design and maintenance), logistics, customer service, legal, etc.
Seems like a ton of disparate data! Well, turns out…
while it’ll feel overwhelming the first time you work in a new unit, metrics frameworks for a given function are surprisingly durable from org to org.
Here’s an e-commerce marketing metrics framework (I spent time as a marketer and a marketing analyst, so it’s my favorite):
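As an illustrative stand-in only -- these are common funnel metrics, not necessarily the exact framework from the original visual -- it might look something like:

```python
# Illustrative only: common e-commerce marketing metrics by funnel stage
marketing_framework = {
    "awareness":  ["impressions", "reach"],
    "interest":   ["click-through rate", "cost per click"],
    "conversion": ["conversion rate", "customer acquisition cost"],
    "retention":  ["repeat purchase rate", "customer lifetime value"],
}
```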