I finished my first Kaggle competition and scored in the top 4% of participants.
I learned a few valuable lessons. Here they are: ↓
Most important lesson:
@Kaggle is a firehose of new knowledge. In around 3 weeks, I learned more than in the last 3 months combined.
It's not only about the competition, but also about the people and the collaboration.
If you haven't tried it yet, consider it.
Kaggle is all about squeezing as much performance out of your solution as you can.
Complexity and runtime are secondary.
This is very different from real-life applications, but it forces you to learn something different and valuable.
Blending predictions using K-Fold Cross-Validation is a very popular Kaggle technique.
If you use 10 folds, you predict your target 10 times and blend those predictions into a final value.
This was an interesting approach that I had never seen before.
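Here's a minimal sketch of the idea (toy data and a generic scikit-learn model stand in for my actual setup):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

# Toy data standing in for the competition's train and test sets.
X, y = make_regression(n_samples=500, n_features=10, random_state=0)
X_test, _ = make_regression(n_samples=100, n_features=10, random_state=1)

kf = KFold(n_splits=10, shuffle=True, random_state=42)
blended = np.zeros(len(X_test))

for train_idx, _ in kf.split(X):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    # Each fold's model predicts the full test set; averaging the
    # 10 predictions produces the final blended value.
    blended += model.predict(X_test) / kf.n_splits
```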
I ended up summarizing my strategy as "aggressive ensembling."
I built an ensemble of 3 models, then stacked 3 more models on top, and then 1 final model at the end.
I would have never done this in a real-life application (too much complexity for very little gain.)
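For illustration, here's roughly what that shape looks like using scikit-learn's StackingRegressor. The models below are placeholders, not my exact lineup:

```python
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import Lasso, Ridge
from sklearn.neighbors import KNeighborsRegressor

# Level 1: an ensemble of 3 base models.
base_models = [
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(random_state=0)),
    ("knn", KNeighborsRegressor()),
]

# Level 2: 3 more models stacked on the base predictions,
# blended by 1 final model at the end.
meta_stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge()),
        ("lasso", Lasso()),
        ("gbr2", GradientBoostingRegressor(random_state=1)),
    ],
    final_estimator=Ridge(),
)

ensemble = StackingRegressor(estimators=base_models,
                             final_estimator=meta_stack)
# ensemble.fit(X_train, y_train); preds = ensemble.predict(X_test)
```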
For the first time, I used Optuna to tune the hyperparameters of my models.
Loved it! Straightforward interface to get the job done.
From now on, I plan to use Optuna for anything non-Keras-related (I use KerasTuner for that.)
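A minimal Optuna sketch (the objective and search ranges here are illustrative, not my actual ones):

```python
import optuna
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

def objective(trial):
    # Optuna samples each hyperparameter from the given range.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
    }
    model = GradientBoostingRegressor(**params, random_state=0)
    # Return the mean cross-validated score for Optuna to maximize.
    return cross_val_score(model, X, y, cv=5).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```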
An algorithm that I hadn't used before was CatBoostRegressor.
My solution ended up using an XGBRegressor, an LGBMRegressor, a Lasso, and a CatBoostRegressor model.
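For reference, those classes come from their respective libraries (default parameters shown here; the real hyperparameters came from tuning):

```python
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from sklearn.linear_model import Lasso

# Each of these was trained per fold and blended/stacked as above.
models = {
    "xgb": XGBRegressor(),
    "lgbm": LGBMRegressor(),
    "lasso": Lasso(),
    "cat": CatBoostRegressor(verbose=0),
}
```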
My best work: I built a sort of framework that I plan to reuse in similar future Kaggle competitions.
This was key to my score: it let me experiment rapidly.
I overfit the Public Leaderboard and dropped 200+ places on the Private Leaderboard (My solution was among the top 50 on the Public Leaderboard.)
This was humbling.
Lesson learned: Forget the public leaderboard. Use my gut to pick the best solution.
One more note:
This competition was part of the 30 Days of ML initiative run by @Kaggle.
This means that most participants were either new to Kaggle (like me) or completely new to machine learning.
I'm still proud of my accomplishment, but take it with a grain of salt.
To recap:
• Kaggle is really fun.
• I learned a ton. Really!
• I'd suggest you try.
I'm looking forward to the next competition, time permitting.
If you are curious about my solution, here is the notebook: