Kaggle's 2021 State of Data Science and Machine Learning survey was released a few days ago.

If you didn't see it, here are some important takeaways 🧵
Top 5 IDEs

1. Jupyter Notebook
2. Visual Studio Code
3. JupyterLab
4. PyCharm
5. RStudio
ML Algorithms Usage: Top 10

1. Linear/logistic regression
2. Decision trees/random forests
3. Gradient boosting machines (XGBoost, LightGBM)
5. Convnets
6. Bayesian approaches
7. Dense neural networks (MLPs)
8. Recurrent neural networks (RNNs)
9. Transformers (BERT, GPT-3)
10. GANs
Machine Learning Tools Landscape - Top 8

1. Scikit-Learn
2. TensorFlow (tf.keras included)
3. XGBoost
4. Keras
5. PyTorch
6. LightGBM
7. CatBoost
8. Hugging Face 🤗
Cloud Computing Tools - Top 3

1. AWS
2. GCP
3. Microsoft Azure
Enterprise ML Tools - Top 5

1. Amazon SageMaker
2. Databricks
3. Azure ML Studio
4. Google Cloud Vertex AI
5. DataRobot

Note: According to the graph, more than half of the survey respondents don't use any of these tools.
Databases - Top 4

1. MySQL
2. PostgreSQL
3. Microsoft SQL Server
4. MongoDB
Machine Learning Experimentation Tools - Top 4

1. TensorBoard
2. MLflow
3. Weights & Biases
4. Neptune.ai

Note: According to the graph, the majority of Kagglers do not track their ML models. All eyes on the leaderboard!
AutoML Tools - Top 5

1. Google Cloud AutoML
2. Azure Automated ML
3. Amazon SageMaker Autopilot
4. H2O Driverless AI
5. Databricks AutoML
CONCLUSIONS:

1. Notebooks are still the most popular way of experimenting with ML. If you have never done so, try them in VS Code.
2. Scikit-Learn is ahead of the game.
3. All you need is XGBoost (CC: @tunguz).
4. No need for model tracking on Kaggle. There is a leaderboard.
If you would like to read the whole survey, here is the link:

kaggle.com/kaggle-survey-…
Thanks for reading.

For more machine learning content, follow @Jeande_d.

More from @Jeande_d

17 Oct
Source of errors in building traditional programs:

◆Wrong syntax
◆Inefficient code
Source of errors in machine learning:

◆Solving the wrong problem
◆Using the wrong evaluation metric
◆Not being aware of skewed data
◆Inconsistent data preprocessing functions
More sources of errors in ML:

◆Putting more emphasis on models than on data
◆Data leakage
◆Training on the test data
◆Model and data drifts
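
Two of the items above, data leakage and inconsistent preprocessing, can be illustrated with a small sketch. It is only an illustration under assumptions: the helper names `fit_scaler` and `transform` and the toy numbers are hypothetical, not from the thread. The idea is to compute scaling statistics on the training split only and reuse the exact same function for the test split.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Compute scaling statistics from the training data only."""
    return mean(values), pstdev(values)


def transform(values, scaler):
    """Apply the same standardization to any split using the stored statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature values
train = [2.0, 4.0, 6.0, 8.0]
test = [10.0, 12.0]

# Fit on train only: the test set never influences the statistics (no leakage)
scaler = fit_scaler(train)

# One shared function for both splits keeps preprocessing consistent
train_scaled = transform(train, scaler)
test_scaled = transform(test, scaler)
```

Fitting the scaler on train and test together would quietly leak test information into training, which is one common form of the leakage mentioned in the list.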
11 Oct
Python image processing libraries

◆Scikit-Image
◆Pillow
◆NumPy (an image is just an array of pixels)
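
To make the "array of pixels" point concrete, here is a minimal sketch, assuming NumPy is installed. The tiny synthetic image and the variable names are made up for illustration. Basic operations such as flipping, cropping, and a naive grayscale conversion are just array indexing and arithmetic.

```python
import numpy as np

# A hypothetical 4x6 RGB image: height x width x channels, values 0-255
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :3] = [255, 0, 0]  # paint the left half red

# Horizontal flip: reverse the width axis
flipped = image[:, ::-1]

# Crop: rows 1-2 and columns 2-4
cropped = image[1:3, 2:5]

# Naive grayscale: average the three color channels
gray = image.mean(axis=2)
```

Libraries such as Scikit-Image and Pillow build more complete operations on top of this same representation.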

The following go beyond image processing and provide state-of-the-art computer vision and machine learning algorithms:

◆OpenCV
◆OpenMMLab
Also, most machine learning frameworks have image processing functions.

TensorFlow has tf.image and Keras took that further to image processing layers that you can insert inside the model.
8 Oct
The machine learning research community is very vibrant.

Here is what I mean...🧵🧵
In 1958, Frank Rosenblatt invented the perceptron, a very simple algorithm that would later turn out to be the core and origin of today's intelligent machines.
In essence, the perceptron is a simple binary classifier that can determine whether or not a given input belongs to a specific class.

Here is the perceptron algorithm:
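
No listing follows in this unroll, so as a stand-in here is a minimal sketch of the classic perceptron learning rule. It is not necessarily the formulation shown in the original thread. The function names, the learning rate, and the toy AND dataset are assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Binary decision: 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=10):
    """Perceptron rule: nudge weights toward each misclassified sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy, linearly separable data: the logical AND of two inputs
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

weights, bias = train_perceptron(X, y)
predictions = [predict(weights, bias, x) for x in X]
```

The update only changes the weights when a sample is misclassified, which is why training settles once every point in a linearly separable dataset is on the correct side of the decision boundary.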
7 Oct
The most useful courses are free. They are challenging and hard to complete, which is why they are useful.

Here are 4 examples of free machine learning courses that, with enough dedication, can help you build useful skills.

🧵
1. Machine Learning by Andrew Ng on Coursera

Price: Free
Students: Over 4 million people

coursera.org/learn/machine-…
2. Full Stack Deep Learning by UC Berkeley

Price: Free

fullstackdeeplearning.com/spring2021/
4 Oct
How to think about precision and recall:

Precision: What is the percentage of positive predictions that are actually positive?

Recall: What is the percentage of actual positives that were predicted correctly?
The fewer false positives, the higher the precision, and vice versa.

The fewer false negatives, the higher the recall, and vice versa.
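
A small sketch can make these definitions concrete. The function name `precision_recall` and the toy labels are made up for illustration. It counts true positives, false positives, and false negatives, then applies precision = TP / (TP + FP) and recall = TP / (TP + FN).

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 4 actual positives, 5 predicted positives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
```

Here there are 3 true positives, 2 false positives, and 1 false negative, so precision is 3/5 and recall is 3/4. Removing a false positive raises precision, and removing the false negative raises recall.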
How do you increase precision? Reduce false positives.

It depends on the problem, but generally that might mean fixing the labels of the negative samples that are being predicted as positives, or adding more such samples to the training data.
26 Sep
Releasing a complete machine learning package containing over 30 end-to-end notebooks for:

◆Data analysis
◆Data visualization
◆Data cleaning
◆Classical ML
◆Computer vision
◆Natural language processing

Everything is now accessible here:

github.com/Nyandwi/machin…
Every single notebook is very interactive.

It starts with a high-level overview of the model/technique being covered and then continues with the implementation.

And wherever possible, there are visuals to support the concepts.
Here is an outline of what you will find there:

PART 1 - Intro to Programming and Working with Data

◆Intro to Python for Machine Learning
◆Data Computation With NumPy
◆Data Manipulation with Pandas
◆Data Visualization
◆Real EDA and Data Preparation
