The ABSOLUTE ESSENTIALS of Linear Regression

🧵This thread will cover:
- Intuition behind linear regression.
- Different implementations.
- Comparisons in terms of computational complexity.

A linear regression model aims to use input features to predict a target numeric value.

For example, you might train a linear model to predict the price of a house from a set of features such as area, location, and number of rooms.
The intuition behind "Linear"

A linear model assumes that the relationship between the input features and the target value is essentially linear.

So the goal of training a linear model is to find the line that best describes this relationship (the line of best fit).
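As a quick sketch of that idea, here is how a line of best fit could be obtained with scikit-learn. The feature names and the price formula below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
# Two hypothetical features: area (m^2) and number of rooms
X = np.c_[rng.uniform(50, 300, 100), rng.integers(1, 7, 100)]
# Made-up "true" relationship: price = 3000*area + 10000*rooms + noise
y = 3000 * X[:, 0] + 10_000 * X[:, 1] + rng.normal(0, 5_000, 100)

model = LinearRegression()
model.fit(X, y)
print(model.coef_)   # roughly [3000, 10000]
```

Calling `model.predict([[120, 3]])` would then estimate the price of a hypothetical 120 m², 3-room house.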
Types of Linear Regression

Linear regression can be divided into different types depending on:

A. The number of input features
1. Simple Regression
Uses a single input feature to make a prediction.

2. Multiple Regression
Uses multiple input features to make a prediction.
B. The number of target values
1. Univariate Regression
Predicts a single target value.

2. Multivariate Regression
Predicts multiple target values.
Note that these categories can be combined: a "multiple univariate regression" model, for example, uses multiple input features (explanatory variables) to predict a single output value (response variable).
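To make the taxonomy concrete, here is a sketch of what each combination looks like in terms of array shapes (random data; scikit-learn's LinearRegression handles all of them):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 50

X_simple = rng.random((n, 1))   # simple: 1 input feature
X_multi = rng.random((n, 3))    # multiple: 3 input features
y_uni = rng.random(n)           # univariate: 1 target value per instance
Y_multi = rng.random((n, 2))    # multivariate: 2 target values per instance

# "Multiple univariate" regression: 3 features -> 1 target
uni_model = LinearRegression().fit(X_multi, y_uni)
print(uni_model.coef_.shape)    # (3,)

# "Multiple multivariate" regression: 3 features -> 2 targets
multi_model = LinearRegression().fit(X_multi, Y_multi)
print(multi_model.coef_.shape)  # (2, 3)
```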
Training a Linear Model

The goal of training any machine learning model is to find the optimal parameters that best describe the relationship between the input features and the target values.

Read my article on this topic for a more detailed explanation.
Training Methods

1. Scikit-learn model (SVD method)

This method does not use gradient descent but rather a linear algebra technique called Singular Value Decomposition (SVD) to calculate the model parameters.
The SVD method above has a computational complexity of roughly O(n²), where n is the number of features.

This means that if you double the number of input features, the computation time is multiplied by roughly 4.
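A minimal sketch of the idea: NumPy's `lstsq` solves the same least-squares problem with an SVD-based solver (scikit-learn's LinearRegression relies on the equivalent SciPy routine internally), so you can compute the parameters directly. The data and "true" coefficients below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 3))
# Noise-free targets from arbitrary "true" parameters: bias 4, weights [2, -1, 0.5]
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0

X_b = np.c_[np.ones((200, 1)), X]                 # prepend a bias column
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)   # SVD-based least squares
print(theta)   # [4, 2, -1, 0.5] (exact, since the data is noise-free)
```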
2. Gradient Descent
In gradient descent, you start by defining a cost function that measures how poorly your model is doing. Next, you search for the parameters that minimize that cost function.
Types of Gradient Descent Algorithm

a. Batch Gradient Descent
Uses the whole training set to make a single gradient descent step.

The cost function decreases smoothly with each iteration.

The training becomes slower when you have a large number of training instances.
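A minimal batch gradient descent sketch for the MSE cost, with made-up data and a hand-picked learning rate. Every step uses the full training set:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((100, 1))
y = 3 * X[:, 0] + 4 + rng.normal(0, 0.05, 100)

X_b = np.c_[np.ones((100, 1)), X]   # bias column
theta = np.zeros(2)
eta = 0.5                           # learning rate (hand-picked)
m = len(X_b)

for _ in range(2000):
    # Gradient of the MSE cost, computed over the WHOLE training set
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients

print(theta)   # roughly [4, 3]
```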
b. Stochastic Gradient Descent
Uses a single randomly chosen example to make a single gradient descent step.

Because a random example is chosen in each iteration, the cost function bounces up and down, decreasing only on average.
Because of this oscillating behavior, stochastic gradient descent is more likely than batch gradient descent to escape a bad local minimum.
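In practice you rarely hand-roll this: scikit-learn's SGDRegressor implements stochastic gradient descent with a decaying learning-rate schedule, which shrinks the bouncing as training progresses (synthetic data below):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(3)
X = rng.random((1000, 1))
y = 3 * X[:, 0] + 4 + rng.normal(0, 0.1, 1000)

# One randomly picked instance per update; the default learning-rate
# schedule decays over time, so the oscillations shrink near the minimum
sgd = SGDRegressor(max_iter=1000, tol=1e-5, random_state=42)
sgd.fit(X, y)
print(sgd.intercept_, sgd.coef_)   # roughly [4] and [3]
```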
c. Mini-Batch Gradient Descent
Uses a small random set of instances, called a 'mini-batch', in each iteration.

This approach takes advantage of the performance boost that comes from hardware optimization for vectorized operations.
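A mini-batch sketch (made-up data; the batch size and learning rate are hand-picked). Each update averages the gradient over one small batch, which NumPy evaluates as a single vectorized matrix product:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((1000, 1))
y = 3 * X[:, 0] + 4 + rng.normal(0, 0.05, 1000)
X_b = np.c_[np.ones((1000, 1)), X]   # bias column

theta = np.zeros(2)
eta, batch_size = 0.3, 32

for epoch in range(100):
    indices = rng.permutation(len(X_b))          # reshuffle every epoch
    for start in range(0, len(X_b), batch_size):
        batch = indices[start:start + batch_size]
        Xb, yb = X_b[batch], y[batch]
        # Gradient over one mini-batch: a single vectorized operation
        gradients = (2 / len(batch)) * Xb.T @ (Xb @ theta - yb)
        theta -= eta * gradients

print(theta)   # roughly [4, 3]
```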
Now, which approach should you use?
1. If you have a relatively low number of features per example, then the SVD method is a good choice.

The SVD method can handle huge training sets really well as long as each instance contains a small number of input features.
2. Avoid using the SVD method if each instance is made up of a large number of features (10,000 for example).

3. If you have a huge number of features per example, then gradient descent is the way to go.
4. Batch gradient descent is very slow when you have a huge number of training examples.

In that case, you can choose stochastic or mini-batch gradient descent as they process only one instance (stochastic) or a small number of instances (mini-batch) each iteration.
5. If your cost function has local minima, stochastic gradient descent is your best bet because of its oscillating behavior. (Note that linear regression's MSE cost is convex, so this tip matters mostly for models with non-convex cost functions.)

6. When using gradient descent, input features should be scaled (for example with scikit-learn's StandardScaler); otherwise, features with large ranges dominate the cost function and training takes much longer to converge.
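A sketch of that advice, assuming scikit-learn (the feature scales and the price formula below are made up). Putting the scaler and the regressor in a pipeline guarantees the same scaling is reapplied at prediction time:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(5)
# Two features on very different scales, e.g. area (50-300) vs. rooms (1-6)
X = np.c_[rng.uniform(50, 300, 200), rng.integers(1, 7, 200)]
y = 0.5 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, 200)

# StandardScaler is fit on the training data and applied automatically
# whenever the pipeline predicts on new instances
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=42))
model.fit(X, y)
print(model.predict([[120, 3]]))   # roughly 66 (= 0.5*120 + 2*3)
```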
That's it for this thread.

If you enjoyed this thread, make sure to follow me @ammaryh92 for more content about data science and Python programming.

Thanks a lot for your time and support.
