Imagine you work at Tesla, building the next generation of self-driving cars.
You wanna build a better version of the autopilot system, which decides in real time what the car should do next. For example:
1. go straight ⬆️ 2. turn left ⬅️ 3. turn right ➡️ 4. stop 🛑
Imagine you have historical data with labels you can use to train your ML model, in this case, a classifier with 4 possible outcomes.
And you manage to build a model with 99% accuracy.
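To make "accuracy" concrete, here is a minimal sketch of how you would score a 4-class action classifier against labeled data. The action names and the toy labels are made up for illustration, not from any real driving dataset:

```python
# The 4 possible actions the autopilot classifier can output
# (hypothetical labels, just for this sketch).
ACTIONS = ["straight", "left", "right", "stop"]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth action."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy example: 3 of 4 predictions match the labels.
y_true = ["straight", "left", "stop", "right"]
y_pred = ["straight", "left", "stop", "stop"]
print(accuracy(y_true, y_pred))  # 0.75
```

On real historical data, this is the number that comes out at 99%, and the question below is whether that number means anything by itself.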
The question is: is this accuracy "good enough", or not?
To answer this, you need to translate this abstract "accuracy" into something meaningful for the business.
For example, the "probability of a car crash".
To greenlight your new autopilot system, the team needs to ensure that the implied "probability of a car crash" is
✅ lower than the current system's (baseline 1)
✅ lower than the probability of a crash when a human drives the car (baseline 2)
To map your "abstract" 99% accuracy into a meaningful "probability of a car crash", you immerse your ML agent in a simulation engine and let it drive as much as possible.
Every time the agent crashes, you record it. Dividing total crashes by total simulated drives gives you your crash metric.
You compare this metric with the 2 baselines, and decide the model is "good enough" when
(your_system_crash < baseline_1) AND (your_system_crash < baseline_2)
If either inequality fails, the model is NOT good enough, and you need to work on it further.
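The whole decision rule fits in a few lines. A minimal sketch, where the crash counts and baseline rates are made-up numbers standing in for what you would actually measure in simulation and fleet data:

```python
def crash_rate(n_crashes, n_drives):
    """Estimated probability of a crash per simulated drive."""
    return n_crashes / n_drives

def good_enough(system_rate, baseline_current, baseline_human):
    """The model passes only if it beats BOTH baselines."""
    return system_rate < baseline_current and system_rate < baseline_human

# Hypothetical simulation result: 12 crashes in 1,000,000 simulated drives,
# compared against made-up crash rates for the current system and human drivers.
system = crash_rate(12, 1_000_000)
print(good_enough(system, baseline_current=2e-5, baseline_human=5e-5))  # True
```

Note the asymmetry: a 99% accuracy tells you nothing until it's translated into this kind of rate and put next to the baselines.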
To sum up,
✅ Real-world ML models are ultimately evaluated in terms of business metrics.
✅ An ML model is "good enough" when its implied business metric beats the status quo.
Wanna get more real-world ML content?
Subscribe to my newsletter and get my eBook for FREE.