If you are planning to get into machine learning, then you are likely to use scikit-learn, one of Python's most famous libraries.
In this thread, I will try to break down scikit-learn's API, which can be intimidating at first.
🧵👇 #MachineLearning
Estimators
- An estimator is any scikit-learn object that learns some parameters from data.
- All estimators implement a "fit()" method to perform the estimation process.
- Estimators can also act as transformers or predictors.
Transformers
- They are estimators which use the parameters that they have learned to transform data.
- All transformers implement a "transform()" method to perform the transformation process.
I know this sounds vague, but it will become clearer after the following example.
Suppose we have a dataframe (a table) with missing values in some of its columns.
One way to tackle the missing values problem is to replace the missing values in each column with the mean of that column.
We can achieve this using scikit-learn's SimpleImputer object.
In that case, the SimpleImputer object acts as both an estimator and a transformer.
- It acts as an estimator because it first has to learn some parameters from data by calculating the mean of each column.
- It also acts as a transformer because it uses the learned parameters (each column's mean) to replace the missing values, thus transforming the dataframe.
Note:
Transformers also provide a "fit_transform()" method, which performs the estimation and the transformation in a single call and is sometimes more computationally efficient.
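Here is a minimal sketch of the SimpleImputer example above. The data is made up for illustration, and it uses a NumPy array instead of a dataframe to keep things short:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up for illustration): two columns, one missing value in each.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [3.0, 20.0],
])

imputer = SimpleImputer(strategy="mean")

# Estimator role: fit() learns the mean of each column, ignoring missing values.
imputer.fit(X)
print(imputer.statistics_)  # the learned column means

# Transformer role: transform() fills each missing value with the learned mean.
X_filled = imputer.transform(X)
print(X_filled)

# fit_transform() performs both steps in a single call.
X_filled_again = SimpleImputer(strategy="mean").fit_transform(X)
```

The learned means live in the "statistics_" attribute, so you can inspect what the estimator learned before you transform anything.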
Predictors
- Predictors are estimators which can learn some parameters from data (usually labeled data), and then use these learned parameters to predict new values.
- All predictors implement the "predict()" method to perform the prediction process.
For example, scikit-learn's LinearRegression object is an estimator and a predictor.
First, it estimates some parameters from labeled data (the slope and the intercept) using the "fit()" method.
Then, it uses the learned parameters to make predictions using the "predict()" method.
Predictors can also implement a "score()" method to measure the quality of the predictions.
The "score()" method takes the input features and the actual values, makes the predictions internally, and measures their quality using a default performance measure (for example, R² for regressors and mean accuracy for classifiers).
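A small sketch of the LinearRegression workflow, assuming toy data that follows y = 2x + 1 exactly (the numbers are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy labeled data (made up): y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()

# Estimator role: fit() learns the slope and the intercept from labeled data.
model.fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)

# Predictor role: predict() applies the learned parameters to new inputs.
X_new = np.array([[4.0]])
y_pred = model.predict(X_new)
print(y_pred)

# score(X, y) predicts on X internally and compares the result with y.
# For LinearRegression it returns the R^2 coefficient.
r2 = model.score(X, y)
print(r2)
```

In practice you would call "score()" on data the model has not seen during "fit()", otherwise the result is overly optimistic.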
The term Machine Learning sounds mysterious and confusing to a lot of people, especially beginners.
In this thread, I will try to explain how a machine learns and why we even need machine learning.
🧵👇
In the pre-machine-learning era, we had what are called "rule-based systems".
This basically means that we provide a machine with a bunch of instructions on how to perform a certain task.
For example, suppose we need to write a function that returns the square of a number.
With a rule-based system, this is very easy.
1. First we define a function called Square, for example.
2. Square function takes X as an input, where X can be any number.
3. Square function multiplies X by itself (X ** 2).
4. Square function returns the result to the user.
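In Python, these steps could look like the sketch below (the lowercase name "square" is just Python naming convention):

```python
def square(x):
    """Return x multiplied by itself."""
    return x ** 2

print(square(4))  # 16
```

We wrote the rule ourselves here. Nothing is learned from data.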
Data Science learning roadmap
If you are looking to get into machine learning, I would like to make your life easier by listing some of my favorite resources in this thread 🧵👇 #MachineLearning
1⃣Python is the most common language for data science (even though there are other great languages such as R and C++).
This playlist by @JustinMitchel is a great way to start. youtube.com/playlist?list=…
Bias/Variance analysis is one of the most important concepts in machine learning as it allows you to measure and improve the performance of your model.
🧵In this thread, I will discuss the basic idea behind it, so let's dive right in. #MachineLearning #ArtificialIntelligence
1⃣Bias (Underfitting)
What is bias?
In simple terms, bias is the difference between the model predictions and the actual values.
So if a model has a high bias, it means it has underfitted the training data. In other words, the model has a high training set error.
What causes a high bias?
Bias, or underfitting, mainly occurs when the model is too simple to capture the underlying structure of the training data, so it misses the relevant relations between the input features and the target outputs.
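To make this concrete, here is a rough sketch with synthetic data (made up for illustration) where the true relation is quadratic. A straight line is too simple for it, so its training error stays high, while a degree-2 polynomial fits almost perfectly:

```python
import numpy as np

# Synthetic data (made up): the true relation between x and y is quadratic.
x = np.linspace(-3, 3, 50)
y = x ** 2

def training_mse(degree):
    """Fit a polynomial of the given degree and return its training-set MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    predictions = np.polyval(coeffs, x)
    return float(np.mean((y - predictions) ** 2))

mse_line = training_mse(1)      # straight line: too simple for this data
mse_parabola = training_mse(2)  # matches the underlying structure

print(f"degree 1 training MSE: {mse_line:.3f}")
print(f"degree 2 training MSE: {mse_parabola:.3f}")
```

The high training error of the straight line is the signature of high bias: more training data alone would not fix it, because the model cannot represent a curve.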
Feature Scaling is one of the most important transformations you need to apply to your data before feeding it to your machine learning model.
🧵thread 👇
- Feature scaling refers to the concept of normalizing the range of input features so that they have similar scales.
- Many machine learning algorithms (for example, those based on gradient descent or on distances between points) do not work well when the input numerical attributes have very different scales.
1⃣Min-Max Scaling (Normalization)
- In this method, values are rescaled so they end up ranging between 0 and 1 (the range can be changed in some cases).
- Normalization is very sensitive to the existence of outliers.
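Here is a plain-Python sketch of min-max scaling, using the formula (x - min) / (max - min). The function name, its "feature_range" argument, and the sample values are made up for illustration. scikit-learn's MinMaxScaler performs the same kind of rescaling.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale values linearly so the smallest maps to low and the largest to high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # All values are identical; map them to the lower bound (a choice made for this sketch).
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]

ages = [20, 30, 40, 50]
scaled = min_max_scale(ages)
print(scaled)

# A single extreme value squeezes all the other values toward 0.
with_outlier = min_max_scale([20, 30, 40, 50, 1000])
print(with_outlier)
```

The second call shows the outlier problem: once 1000 is in the column, the four ordinary values all land very close to 0.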