After months of travel and $100k+ in fees, our machine learning model failed badly on the test set.
Most ML efforts create no value. We should learn from those failures.
Here is a favorite of mine.
Our client would use this model to better price and underwrite mines, which would ultimately lead to increased safety and efficiency.
We also identified multiple state and national datasets on mining activity and safety that we could use to augment this data.
Then we used S-Plus to visualize the distribution of claim outcomes, and to validate that some of the most obvious relationships we expected in the data held true.
We chose the Tweedie distribution to model this data, as a compound Poisson-Gamma distribution matched our observation of many zeros (no claims) and occasional large values.
en.wikipedia.org/wiki/Tweedie_d…
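To make that shape concrete, here is a minimal simulation sketch of a compound Poisson-Gamma process, in Python rather than the S-Plus we used at the time; every parameter value below is invented for illustration, not the client's.

```python
# Minimal sketch of the compound Poisson-Gamma idea behind the Tweedie choice:
# most policies produce zero claims, a few produce large aggregate losses.
# All parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

n_policies = 10_000
claim_rate = 0.15                         # Poisson mean: expected claims per policy
claim_shape, claim_scale = 2.0, 25_000.0  # Gamma severity per claim

# Draw a claim count per policy, then sum Gamma-distributed severities.
n_claims = rng.poisson(claim_rate, size=n_policies)
losses = np.array([
    rng.gamma(claim_shape, claim_scale, size=k).sum() if k > 0 else 0.0
    for k in n_claims
])

print(f"share of zero-loss policies: {(losses == 0).mean():.1%}")
print(f"mean loss: {losses.mean():,.0f}, 99th percentile: {np.percentile(losses, 99):,.0f}")
```

A histogram of `losses` shows exactly what we saw in the claims data: a spike at zero and a long right tail.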
We used GLMs with smoothing splines for continuous features and one-hot encoding for categorical features.
We were following best practices (at the time) in the wonderful (and free) Elements of Statistical Learning:
web.stanford.edu/~hastie/ElemSt…
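For the curious, here is roughly what that setup looks like in today's Python tooling (ours was fit in S-Plus). The feature names and parameter values are invented, and scikit-learn's SplineTransformer is a regression-spline stand-in for the smoothing splines described above, so treat this as a sketch rather than the original model.

```python
# Sketch: Tweedie GLM with spline-expanded continuous features and
# one-hot encoded categorical features. Names and values are hypothetical.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import TweedieRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, SplineTransformer

rng = np.random.default_rng(1)
n = 2_000
df = pd.DataFrame({
    "tons_extracted": rng.lognormal(10, 1, n),                        # hypothetical continuous feature
    "employee_count": rng.integers(2, 500, n),                        # hypothetical continuous feature
    "mine_type": rng.choice(["underground", "strip", "family"], n),   # hypothetical categorical feature
})
# Zero-heavy stand-in target (not a true compound Poisson-Gamma draw).
y = rng.gamma(2.0, 25_000.0, n) * rng.poisson(0.15, n)

preprocess = ColumnTransformer([
    ("splines", SplineTransformer(n_knots=5, degree=3), ["tons_extracted", "employee_count"]),
    ("one_hot", OneHotEncoder(handle_unknown="ignore"), ["mine_type"]),
])

# power=1.5 puts the variance function between Poisson (1) and Gamma (2),
# i.e. a compound Poisson-Gamma model, fit with a log link.
model = Pipeline([
    ("prep", preprocess),
    ("glm", TweedieRegressor(power=1.5, alpha=1e-3, link="log", max_iter=1000)),
])
model.fit(df, y)
```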
Meanwhile, we cross-validated our model to ensure we didn’t overfit.
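A sketch of that step with modern tooling, again on invented data: k-fold cross-validation scored by Tweedie deviance on the held-out folds, so an overfit model shows up as poor out-of-fold deviance.

```python
# Sketch of k-fold cross-validation for a Tweedie GLM, scored by Tweedie
# deviance on held-out folds. Data and parameters are placeholders.
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import make_scorer, mean_tweedie_deviance
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(2_000, 4))                                    # placeholder feature matrix
y = rng.gamma(2.0, 25_000.0, 2_000) * (rng.random(2_000) < 0.15)   # zero-heavy placeholder target

# greater_is_better=False because lower deviance is better.
scorer = make_scorer(mean_tweedie_deviance, greater_is_better=False, power=1.5)
scores = cross_val_score(
    TweedieRegressor(power=1.5, alpha=1e-3, link="log", max_iter=1000),
    X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring=scorer,
)
print("held-out Tweedie deviance per fold:", -scores)
```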
We went to a bar.
We tested that model, and it was slightly better, but not meaningfully so.
We went back to the bar.
In retrospect, I doubt it.
So what went wrong?
So while we could estimate the global parameters of the Tweedie distribution, there was no way the data could support a machine learning model on top of it.
They ranged from large-scale, heavy-machinery mines like the one we had visited, to family operations with pickaxes, to above-ground strip-mining operations.
There was little to generalize across these entities.
While I am confident they had many biases (Moneyball), they also had tremendous wisdom.