Ammar Yasser

24 Jul, 17 tweets, 5 min read

The ABSOLUTE ESSENTIALS of scikit-learn every data scientist should know
(Introduction + Examples)

🧵Long Thread 👇👇
#MachineLearning #DataScience

✍️Introduction
- scikit-learn is one of the most famous Python libraries for machine learning.
- scikit-learn allows you to easily build and train machine learning models through its simple and well-designed API.
- In this thread, I will try to break that API down for beginners.

1⃣ Estimators
- The process of learning parameters from input data is called "Estimation", and therefore any object that learns some parameters from data is called an "Estimator".
- The estimation process itself is performed by calling the fit( ) method of any estimator object.

Example 1: (fill missing values in an array)
Step 1
We have an array of numbers of shape (15,1) with some null values.

Step 2
We use scikit-learn's SimpleImputer to impute (or replace) missing values with the mean of the array (strategy = 'mean').

👇👇

Step 3
We estimate the mean of the array using the fit( ) method of the imputer object.
(As previously explained, fit( ) method is used to estimate a parameter, the mean in this case, from input data)

Step 4
We can access the learned parameter using the statistics_ attribute.
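
Here is a minimal sketch of Steps 1 to 4 in code. The numbers in the array are made up for illustration (the original data isn't reproduced here), and the variable names are my own choice.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Step 1: an array of shape (15, 1) with some missing values (illustrative data)
values = [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, np.nan,
          9.0, 10.0, 11.0, 12.0, np.nan, 14.0, 15.0]
X = np.array(values).reshape(-1, 1)

# Step 2: create the imputer and tell it to use the mean
imputer = SimpleImputer(strategy="mean")

# Step 3: fit() only learns the mean from the non-missing values
imputer.fit(X)

# Step 4: the learned parameter lives in statistics_ (one value per column)
learned_mean = imputer.statistics_[0]
print(learned_mean)

# X itself is untouched at this point: it still contains the missing values
print(np.isnan(X).sum())
```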

Notice that up until this point, we haven't actually replaced the missing values.
All we've done is calculate the mean of the array.
The next logical step is to use the learned parameter (the mean) to replace the missing values.
That's when transformers kick in.

2⃣ Transformer
- A transformer is an estimator that can use the learned parameters to transform the data.
- The transformation process is performed using the transform( ) method.

e.g., The simple imputer object in the previous example can use the learned parameter (the mean of the array) to replace the missing values in the array.

Example 1: (fill missing values in an array)
Steps 1 to 4
Explained above.

Step 5
Calling the transform( ) method of the simple imputer object replaces missing values in the array with the mean of the array, and returns the new transformed array.
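
A minimal sketch of Step 5, again with made-up numbers (an assumption, not the original data). It repeats the fit so it can run on its own, then calls transform( ). As a side note, transformers also offer fit_transform( ), which does both in one call.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative (15, 1) array with three missing values
values = [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, np.nan,
          9.0, 10.0, 11.0, 12.0, np.nan, 14.0, 15.0]
X = np.array(values).reshape(-1, 1)

# Steps 2 to 4: create the imputer and learn the mean
imputer = SimpleImputer(strategy="mean")
imputer.fit(X)

# Step 5: transform() returns a new array with the missing values filled in
X_filled = imputer.transform(X)
print(X_filled.ravel())

# Shortcut: fit and transform in a single call
X_filled_again = SimpleImputer(strategy="mean").fit_transform(X)
```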

However, sometimes we need to use the parameters that the model has learned from the input data to predict the output of new data rather than transforming the data.
That's when we need predictors.

3⃣ Predictors
- A predictor is an estimator that can use the learned parameters to predict the output of new data.
A linear regression model is an estimator and a predictor as it can learn the parameters from the data (slope, intercept) and can predict the output of new data.

Example 2: (building a linear regression model)
Step 1
We have input data (X) and target labels (y).

Step 2
We create a LinearRegression object to build our linear regression model.

👇👇

Step 3
We call the fit( ) method so that our model can learn the parameters (intercept, slope) that best fit the model to the training data.

Step 4
We can access the learned parameters using the intercept_ attribute (intercept), and the coef_ attribute (slope).

Step 5
We use the predict method to predict the output of a new input instance.

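Note that we can also use the score( ) method to evaluate our predictions. For a regressor like LinearRegression, score( ) returns the R² coefficient of determination rather than a classification-style accuracy.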
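
A minimal sketch of Example 2 in code. The data here is synthetic (y = 2x + 1), chosen only so the learned parameters are easy to check; it is not the dataset used in the book.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 1: input data (X) and target labels (y), synthetic for illustration
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 2.0 * X.ravel() + 1.0

# Step 2: create the model
model = LinearRegression()

# Step 3: fit() learns the intercept and slope from the training data
model.fit(X, y)

# Step 4: access the learned parameters
intercept = model.intercept_
slope = model.coef_[0]
print(intercept, slope)

# Step 5: predict the output of a new input instance
X_new = np.array([[6.0]])
y_pred = model.predict(X_new)
print(y_pred)

# score() returns R^2 for regressors (evaluated on the training data here)
r2 = model.score(X, y)
print(r2)
```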

Note:
Example 2 is taken from @aureliengeron's amazing book
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems"

That's it for this thread.
I know it's quite long, but learning these principles will make your life much easier when working with scikit-learn.

If you found this thread to be useful, kindly consider retweeting the first tweet, and for more content about
#MachineLearning and #DataScience, follow me
@ammaryh92
