29 tweets, 27 min read
#SoDS19
Day 1 of #30DaysOfKaggle: Completed Intro to Machine Learning in Kaggle Learn.
Good start.
#SoDS19
Day 2 of #30DaysOfKaggle:
Started Intermediate Machine Learning from Kaggle Learn.
Dealing with missing values.
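A minimal sketch of two common ways to handle missing values: dropping every column that has any, or filling them with scikit-learn's `SimpleImputer`. It assumes pandas and scikit-learn; the tiny DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical training data with gaps in two columns.
X = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "GarageYrBlt": [2003.0, 1976.0, np.nan, 1998.0],
})

# Approach 1: drop every column that contains at least one missing value.
cols_with_missing = [c for c in X.columns if X[c].isnull().any()]
X_dropped = X.drop(columns=cols_with_missing)

# Approach 2: replace each missing value with its column's mean.
imputer = SimpleImputer(strategy="mean")
X_imputed = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

print(X_dropped.columns.tolist())      # ['LotArea']
print(X_imputed.isnull().sum().sum())  # 0
```

In a real workflow, fit the imputer on the training rows only and call `transform` on validation or test rows, so no information leaks from them.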
#SoDS19
Day 3 of #30DaysOfKaggle:
Completed the remaining lessons in Intermediate Machine Learning. Advanced to rank #764 in the competition (from rank #7250 on Day 1). I feel really good about this.
With a few hours remaining, I'll take SQL next.
@BecomingDataSci
#SoDS19
Day 4 of #30DaysOfKaggle:
Completed Intro to SQL from Kaggle Learn. I underestimated SQL; I will have to spend more time learning about queries later.
@kaggle @rctatman @BecomingDataSci
#SoDS19
Day 5 of #30DaysOfKaggle:
Brushed up on my Pandas by completing the Pandas Micro-Course from @kaggle Learn (thanks, @ResidentMario!).
#SoDS19
Day 6 of #30DaysOfKaggle:
Completed the Data Visualization Micro-Course from @kaggle Learn (thanks, @alexis_b_cook!).

Well, I guess the brushing up is done. Tomorrow I will learn about Kaggle competitions and choose one dataset for practice.
@BecomingDataSci
#SoDS19
Day 7 of #30DaysOfKaggle:
Learned about Kaggle competitions and started practicing, based on this very good article by @koehrsen_will:

blog.kaggle.com/2018/08/22/mac…

@BecomingDataSci
#SoDS19
Day 8 of #30DaysOfKaggle:
Continued working on the Home Credit Default Risk dataset (it's tough, but I learned a lot about EDA from @koehrsen_will's work).
Explored the Discussion forum and found that the data science community is very supportive of newbies. Love it!
@BecomingDataSci @kaggle
@koehrsen_will @BecomingDataSci @kaggle #SoDS19
Day 9 of #30DaysOfKaggle:
Completed my practice with the Home Credit Default Risk dataset.
Explored the Discussion forum, especially the Getting Started section. There is a lot of valuable advice there, and it pointed me to this interesting article: linkedin.com/pulse/12-thing…
@BecomingDataSci @kaggle
#SoDS19
#30DaysOfKaggle

The preparation stage is done. The remaining days will be for the real competition, and that'll be tough. I may not get an excellent score by the end of the 30 days, but I am very EXCITED because I believe I will learn a lot along the way.
@BecomingDataSci
#SoDS19
Day 10 of #30DaysOfKaggle:
Browsed through the list of active competitions, reviewed each one, and decided to join the House Prices competition, given my learning objectives and timeframe.
Built the framework for this task.
Learned about each feature in the dataset.
#SoDS19
Day 11 of #30DaysOfKaggle:
EDA. EDA. EDA. This will take days, I believe.
Following the framework shared by @pmpmarcelino, I completed the first two steps:
Assessed each feature's importance.
Performed a univariable study.
@BecomingDataSci @kaggle
kaggle.com/pmarcelino/com…
#SoDS19
Day 12 of #30DaysOfKaggle:
Still in the EDA/cleaning phase.
Completed multivariate analysis.
Almost finished dealing with missing values (the server broke down just before I wrote the imputation code block, which is the final step). Maybe it's time for me to get an early night.
#SoDS19
Day 13 of #30DaysOfKaggle:
As predicted, still in the EDA phase. I wanted to move fast, build a model, and tune the parameters, but I felt the need to really understand the data first.
Another thing: I have to learn more about writing data visualization code.
#SoDS19
Day 14 of #30DaysOfKaggle:
EDA is done, at least for now.
I will take a course specializing in EDA soon. There are many techniques I don't know, along with their visualization code. Can anyone point me to good resources to learn from?
@BecomingDataSci
#SoDS19
Day 15 of #30DaysOfKaggle:
Started building a model.
Could not get the shape right.
Stuck.
Calling it a day. I will try again over the weekend before looking up a tutorial (hopefully I won't have to).
@StackOverflow will help.
@BecomingDataSci
@StackOverflow @BecomingDataSci #SoDS19
Day 16 of #30DaysOfKaggle:
Reworked the dataset and got the shape right.
Built my first model and submitted its predictions as my first entry, to see where simple modeling would place me.
Ranked #2960/4281. I still have 14 days to improve.
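The thread doesn't say what caused the shape problem. One common culprit, shown here as an assumption rather than the author's actual bug, is one-hot encoding train and test separately, which leaves them with different columns. pandas' `DataFrame.align` can reconcile them; the frames and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical frames: the test set happens to lack the "Grvl" category.
train = pd.DataFrame({"LotArea": [8450, 9600, 11250],
                      "Street": ["Pave", "Grvl", "Pave"]})
test = pd.DataFrame({"LotArea": [11622, 14267],
                     "Street": ["Pave", "Pave"]})

# Encoding each frame separately produces mismatched column sets.
train_enc = pd.get_dummies(train)
test_enc = pd.get_dummies(test)
print(train_enc.shape, test_enc.shape)  # (3, 3) (2, 2)

# Reindex test to train's columns; dummy columns missing from test become 0.
train_enc, test_enc = train_enc.align(test_enc, join="left", axis=1, fill_value=0)
print(train_enc.shape, test_enc.shape)  # (3, 3) (2, 3)
```

Using `join="left"` keeps exactly the columns the model was trained on, which is what `predict` expects.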
#SoDS19
Day 17 of #30DaysOfKaggle:
Performed feature engineering to improve accuracy and performance.
Failed.
Found the error, finally.
Stepped back and looked at this work from a distance, and realized I have to do this more systematically. From now on, I will plan ahead before jumping into writing code.
#SoDS19
Day 18 of #30DaysOfKaggle:
While trying to fix the error I found earlier, I discovered a significant mistake in my data cleaning that must be corrected. So, more rework. It's still progress, though: it helps me know the data better and also improves data quality.
#SoDS19
Day 19 of #30DaysOfKaggle:
Data cleaning rework done.
Experiment: included only the 11 features (out of 77) that I deemed most important based on EDA.
Result: commit time improved from 293s to 42s, but the score worsened from 0.156 to 0.185 (lower is better).
Current rank: #3065/4455.
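The 11 features were chosen from EDA, not by a formula. As a rough stand-in, this sketch ranks numeric features by absolute Pearson correlation with the target and keeps the top k. The helper `top_k_by_correlation` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def top_k_by_correlation(df: pd.DataFrame, target: str, k: int) -> list:
    """Hypothetical helper: names of the k numeric features whose absolute
    Pearson correlation with `target` is largest."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()

# Synthetic data: price depends on size and quality, not on noise.
rng = np.random.default_rng(0)
n = 200
size = rng.normal(size=n)
quality = rng.normal(size=n)
noise = rng.normal(size=n)
df = pd.DataFrame({"size": size, "quality": quality, "noise": noise})
df["price"] = 3 * size + 2 * quality + 0.1 * rng.normal(size=n)

selected = top_k_by_correlation(df, "price", k=2)
print(selected)
```

Correlation only captures linear, one-feature-at-a-time relationships, which is one reason a small hand-picked subset can score worse than the full feature set.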
#SoDS19
Day 20 of #30DaysOfKaggle:
Reworked the imputation step, making sure each feature was treated appropriately.
Dropped only 3 features.
Result: score 0.157, only slightly worse than my best (0.156), but much better performance (commit time down from 293s to 53s).
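Treating each feature appropriately usually means asking what a missing value means for that column. This sketch assumes three cases: NaN meaning "absent" for a categorical column (`PoolQC`) and a numeric one (`GarageArea`), and NaN meaning "unknown" (`LotFrontage`, filled with the median of its neighborhood). The column names are borrowed from the House Prices data, but the rows and these per-column rules are illustrative assumptions, not the author's notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows.
df = pd.DataFrame({
    "PoolQC": ["Ex", np.nan, np.nan, "Gd"],
    "GarageArea": [548.0, np.nan, 608.0, 642.0],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Neighborhood": ["A", "A", "B", "B"],
})

# Assumed rule 1: NaN in these categorical columns means the item is absent.
ABSENT_CATEGORICAL = ["PoolQC"]
df[ABSENT_CATEGORICAL] = df[ABSENT_CATEGORICAL].fillna("None")

# Assumed rule 2: NaN in these numeric columns means zero (no garage).
ABSENT_NUMERIC = ["GarageArea"]
df[ABSENT_NUMERIC] = df[ABSENT_NUMERIC].fillna(0)

# Assumed rule 3: a genuinely unknown value gets the median of its neighborhood.
df["LotFrontage"] = df.groupby("Neighborhood")["LotFrontage"].transform(
    lambda s: s.fillna(s.median())
)

print(df)
```

Before applying rules like these, check each column's meaning of NA against the competition's data description.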
Current rank: #3046/4418.
#SoDS19
Day 21 of #30DaysOfKaggle:
Read the original paper on the dataset. I should have read it earlier. It gave me more perspective and made me appreciate how much people contribute to the learning community by sharing their datasets.
@kaggle
@kaggle #SoDS19
Day 22 of #30DaysOfKaggle:
Added Feature Engineering steps.
Log-transformed skewed features to bring them closer to a normal distribution.
Thanks to @apapiu for the guidance.
Result: RMSE score improved from 0.156 to 0.126.
Rank: advanced 1486 places to #1559/4400!!
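A sketch of log-transforming right-skewed numeric features with pandas and NumPy. The 0.75 skewness cutoff and the helper name `log_transform_skewed` are assumptions; `np.log1p` is used so zero values stay valid. Because the House Prices competition scores RMSE between the logarithms of predicted and actual prices, log-transforming the target fits that metric too.

```python
import numpy as np
import pandas as pd

SKEW_THRESHOLD = 0.75  # assumed cutoff; tune it for your data

def log_transform_skewed(df: pd.DataFrame, threshold: float = SKEW_THRESHOLD):
    """Hypothetical helper: apply log1p to right-skewed numeric columns.

    Assumes those columns are non-negative (log1p needs values > -1).
    Returns the transformed copy and the list of columns that changed.
    """
    numeric_cols = df.select_dtypes(include="number").columns
    skewness = df[numeric_cols].skew()  # pandas' bias-corrected sample skewness
    skewed_cols = skewness[skewness > threshold].index.tolist()
    out = df.copy()
    out[skewed_cols] = np.log1p(out[skewed_cols])
    return out, skewed_cols

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "area": rng.lognormal(mean=8.0, sigma=1.0, size=500),  # long right tail
    "rooms": rng.integers(2, 9, size=500),                  # roughly symmetric
})
transformed, changed = log_transform_skewed(df)
print(changed)
print(df["area"].skew(), "->", transformed["area"].skew())
```

Remember to invert the target transform with `np.expm1` before writing predictions to a submission file.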
@kaggle @apapiu #SoDS19
Day 23 of #30DaysOfKaggle:
Improved feature engineering.
Played around with different regression models.
Result: score improved slightly to 0.123.
Rank: advanced to #1375/4391.
It could be better. I should revisit Linear Regression.
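One way to compare different regression models without spending a submission on each is cross-validated RMSE computed locally. A sketch with scikit-learn on synthetic data from `make_regression`; the model list and `alpha` values are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

def cv_rmse(model) -> float:
    # scikit-learn maximizes scores, so RMSE is reported as its negative.
    scores = cross_val_score(model, X, y, scoring="neg_root_mean_squared_error", cv=cv)
    return float(-scores.mean())

results = {name: cv_rmse(model) for name, model in models.items()}
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rmse:.3f}")
```

Using the same `KFold` object for every model makes the comparison fair, since each model sees identical splits.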
@kaggle @apapiu #SoDS19
Day 24 of #30DaysOfKaggle:
Relearned Linear Regression and explored the @scikit_learn documentation.
Found and watched awesome YouTube videos on that topic (and others) by @joshuastarmer. They make the concepts easy to grasp.
Check out this link:
statquest.org
@kaggle @apapiu @scikit_learn @joshuastarmer #SoDS19
Day 25 of #30DaysOfKaggle:
Learned about Gradient Boosting, again starting from @joshuastarmer's video.
Revisited my notebook and played with the Ridge and Lasso parameters.
Resubmitted, but failed to get a better score.
Tried XGBoost; that also failed.
Calling it a day.
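For the Ridge and Lasso parameters, scikit-learn's `RidgeCV` and `LassoCV` can search a grid of `alpha` values by cross-validation instead of resubmitting each guess. A sketch on synthetic data; the grid and other settings are assumptions, and XGBoost is not covered.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=300, n_features=20, n_informative=8,
                       noise=10.0, random_state=1)
alphas = np.logspace(-3, 2, 30)  # assumed search grid: 0.001 to 100

# RidgeCV uses efficient leave-one-out cross-validation by default.
ridge = RidgeCV(alphas=alphas).fit(X, y)
# LassoCV refits along the alpha path on each of 5 folds.
lasso = LassoCV(alphas=alphas, cv=5, max_iter=10_000).fit(X, y)

print("best ridge alpha:", ridge.alpha_)
print("best lasso alpha:", lasso.alpha_)
print("lasso kept", int(np.sum(lasso.coef_ != 0)), "of", X.shape[1], "features")
```

On real data, scale the features first (for example with `StandardScaler` in a pipeline), since both penalties are sensitive to feature scale.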
@kaggle @apapiu @scikit_learn @joshuastarmer #SoDS19
Day 26 of #30DaysOfKaggle:
Learned about ensemble methods and used one.
Result: score improved from 0.1233 to 0.1161.
Rank: advanced 733 places to #677/4441! Hooray!
I'll stick to this model for now and will try to optimize it.
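The thread doesn't name the ensemble method used. A minimal sketch of one simple option: an unweighted average of a Ridge model and scikit-learn's `GradientBoostingRegressor`, on synthetic data. Because squared error is convex, the blend's RMSE can never exceed the worse of the two models' RMSE, and it can beat both when their errors differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=10, noise=15.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=2)

models = [Ridge(alpha=1.0), GradientBoostingRegressor(random_state=2)]
preds = [m.fit(X_train, y_train).predict(X_test) for m in models]

# Unweighted blend: the plain mean of the individual predictions.
blend = np.mean(preds, axis=0)

def rmse(y_true, y_pred) -> float:
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

for name, p in zip(["ridge", "gbr", "blend"], preds + [blend]):
    print(f"{name}: {rmse(y_test, p):.2f}")
```

Weighted averages or stacking are natural next steps, but the weights should be chosen on held-out data rather than the leaderboard.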
@kaggle @apapiu @scikit_learn @joshuastarmer #SoDS19
Day 27 of #30DaysOfKaggle:
Tweaked parameters and made multiple commits; none of them improved the score.
Is it time to clean up the notebook and wrap up this whole exercise?
#SoDS19
Day 28 of #30DaysOfKaggle:
Started cleaning up the notebook: removing unused code, reorganizing the flow, adding narrative and comments.
Not finished yet.
It's taking longer than I thought. I will continue tomorrow.
Thread by Liberto Siahaan.