I'm well on track for my 500 @Kaggle hours. Got a bit carried away in March, with a few free weekends. Some of this work isn't public yet, but will be soon.
Competition-wise, things are going less well. I've joined a few comps late, but my results aren't anything to write home (or write tweets) about, so far. No teaming up yet, either.
I've learnt a few new tricks, though; especially for imaging data. Hoping to build on those.
On the other hand, the #KaggleHiddenGems series is going well. Fast approaching volume 50, for which I have one or two special things planned. Consistent patterns are key here. And I'm happy to have managed to post a new episode every Tuesday in those first 3 months.
On blog posts I'm at 1 in 3 months. Not great. Maybe this goal is a bit stretchier than others. Maybe the problem is perfectionism, and I need to learn to write shorter posts more frequently. Let's experiment a little over the coming 3 months.
Running is my main exercise these days, and I'm working up to the marathon distance. Maybe even optimistically aiming at a sub-4 hour time. My pace isn't bad and I've run a few half marathons, but I recently failed a 30k threshold run. Let's see whether I can crack that in April.
Doing regular pull-ups in preparation for the muscle up, but decided to hold off on more specific exercises until after the marathon. Days are a bit busy, and having a single workout focus helps.
Last but certainly not least: sleep. January was really good, but Kaggle and sleep don't always go well together; so I'm barely above my 8h goal now. The 8h, or even 8.5h, give me better productivity (and strength), though. Need to be careful to keep up the good habits.
And yes: I've built myself a dashboard to track those goals. Nobody said that accountability can't be fun 😉
• • •
"RANZCR 1st Place Soluiton Cls Model (small ver.)" by Qishen Ha:
Another underrated 1st place competition notebook: this well-structured work demonstrates a part of the 2-stage segmentation + classification approach that won the recent imaging challenge.
A narrated introduction to using the Biopython library on a genome dataset. Note the way in which the code is enriched with detailed explanations and interpretations.
Featuring great narration and well-crafted visuals, this excellent #rstats notebook based on the 2020 Kaggle Survey analyses its captivating title question from different angles.
"A Very Extensive Porto Exploratory Analysis" by @CaptCalculator:
A compact visual EDA and baseline model that deals with the challenges of anonymised features & imbalanced targets. Clear organisation helps the reader to navigate the feature set.
A compact work providing adversarial validation of the rainforest competition data together with interpretable Shapley values via GPU-powered #XGBoost in the @RAPIDSai framework.