"Starting point: Big Query and London Crime" by @WCanniford:
A narrated introduction to Google's BigQuery powering the exploration of a crime dataset. The tools & SQL are well explained; beyond that, the visuals help to understand the quirks of the data.
"Deep Sea Dive into Top 10% - Survival Guide" by @agodwin_p:
The famous Titanic starter challenge is always worth a look for creative content, such as this well illustrated and narrated end-to-end analysis featuring helpful infographics and references.
"Cropland Mapping - Random Forest & Neural Network" by @pcbreviglieri:
Written in an engaging style, this work compares tree-based & deep learning methods for a tabular multiclass problem & discusses preprocessing, specifically the handling of highly correlated features.
"Analyzing Editors Descriptions | Hidden Gems" by Thomas Konstantin:
Time to get meta: This work explores the Hidden Gems series, specifically the text and sentiment of my reviews, with great ideas, narration, and structure. And now it contains itself.
"Evolution of social complexity: A data analysis" by Muskan Jain:
This Notebook studies the changes in social complexity over time. It applies external diagrams to great effect, and features detailed interpretations of its findings.
"COVID-19 World Vaccination Progress" by Ivanna Chovhan:
A well-organised Notebook that demonstrates how readability is drastically improved through section headers, compact code, and nuggets of interpretation and narration that accompany visuals.
"Forecast with N-BEATS || Interpretable model" by Gaétan Dubuc:
This work presents a detailed introduction to a neural network time series forecasting method, complete with applied examples. Note the clean structure and helpful visuals.
I'm well on track for my 500 @Kaggle hours. Got a bit carried away in March, with a few free weekends. Some of this work isn't public yet, but will be soon.
Competition-wise, things are going less well. I've joined a few comps late, but my results aren't anything to write home (or write tweets) about so far. No teaming up yet, either.
I've learnt a few new tricks, though; especially for imaging data. Hoping to build on those.
"RANZCR 1st Place Soluiton Cls Model (small ver.)" by Qishen Ha:
Another underrated 1st place competition notebook: this well-structured work demonstrates a part of the 2-stage segmentation + classification approach that won the recent imaging challenge.
A narrated introduction to using the Biopython library on a genome dataset. Note the way in which the code is enriched with detailed explanations and interpretations.
Featuring great narration and well-crafted visuals, this excellent #rstats notebook based on the 2020 Kaggle Survey analyses its captivating title question from different angles.
"A Very Extensive Porto Exploratory Analysis" by @CaptCalculator:
A compact visual EDA and baseline model that deals with the challenges of anonymised features & imbalanced targets. Clear organisation helps the reader to navigate the feature set.
A compact work providing adversarial validation of the rainforest competition data together with interpretable Shapley values via GPU-powered #XGBoost in the @RAPIDSai framework.