14 Sep, 19 tweets, 4 min read
The ABSOLUTE ESSENTIALS of Linear Regression

- Intuition behind linear regression.
- Different implementations.
- Comparisons in terms of computational complexity.
Introduction

A linear regression model aims to use input features to predict a target numeric value.

For example, you can train a linear model to predict the price of a house from a set of features such as area, location, and number of rooms.
The intuition behind "Linear"

A linear model assumes that the relationship between the input features and the target value is essentially linear.

So the goal of training a linear model is to find the line that best describes this relationship (the line of best fit).
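Concretely, a linear model computes a weighted sum of the input features plus a constant bias term. The thread doesn't spell out the formula, but in the usual notation it is:

```latex
\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
```

where ŷ is the predicted value, x₁ through xₙ are the feature values, θ₀ is the bias term, and θ₁ through θₙ are the feature weights that training has to find.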
Types of Linear Regression

Linear regression can be divided into different types depending on:

A. The number of input features
1. Simple Regression
Uses a single input feature to make a prediction.

2. Multiple Regression
Uses multiple input features to make a prediction.
B. The number of target values
1. Univariate Regression
Predicts a single target value.

2. Multivariate Regression
Predicts multiple target values.
Note that these categories can be combined in different ways, such as a "multiple univariate regression" model, which uses multiple input features (explanatory variables) to predict a single output value (response variable).
Training a Linear Model

The goal of training any machine learning model is to find the optimal parameters that best describe the relationship between the input features and the target values.

Read my article on this topic for a more detailed explanation.
ammarblogs.hashnode.dev/model-training…
Training Methods

1. Scikit-learn's LinearRegression (SVD method)

This method does not use gradient descent but rather a linear algebra technique called Singular Value Decomposition (SVD) to calculate the model parameters.
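Here is a minimal sketch of that in scikit-learn; the data below is randomly generated purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: 100 instances with 3 features each (values are made up).
rng = np.random.default_rng(42)
X = rng.random((100, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(0, 0.1, 100)

# LinearRegression solves the least-squares problem with an SVD-based
# solver (scipy's lstsq) rather than gradient descent.
model = LinearRegression()
model.fit(X, y)

print(model.intercept_, model.coef_)  # learned bias and feature weights
print(model.predict(X[:5]))          # predictions for the first 5 instances
```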
This SVD method has a computational complexity of O(n²), where n is the number of features.

This means that if you double the number of input features, the computation time is multiplied by roughly 4.
2. Batch Gradient Descent

In gradient descent, you start by defining a cost function that measures how badly your model is doing. Next, you search for the parameters that minimize that cost function.

Batch gradient descent uses the whole training set to compute every single gradient descent step.

The cost function decreases smoothly with each iteration.

The training becomes slower when you have a large number of training instances.
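A bare-bones NumPy sketch of batch gradient descent on the MSE cost (the learning rate and iteration count here are arbitrary illustrative choices):

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Fit linear regression by batch gradient descent on the MSE cost."""
    m = len(X)
    X_b = np.c_[np.ones((m, 1)), X]   # prepend a column of 1s for the bias
    theta = np.zeros(X_b.shape[1])    # start with all parameters at zero
    for _ in range(n_iters):
        # The gradient is computed over the WHOLE training set each step.
        gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
        theta -= lr * gradients       # one step downhill
    return theta                      # [bias, weight_1, ..., weight_n]
```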
3. Stochastic Gradient Descent

Stochastic gradient descent uses a single randomly chosen example to make each gradient descent step.

Because a random example is chosen in each iteration, the cost function bounces up and down, decreasing only on average.
Because of its oscillating behavior, stochastic gradient descent is a better approach if your algorithm gets stuck in a bad local minimum: the noisy steps can help it bounce out.
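In scikit-learn, this variant is available as SGDRegressor. A minimal sketch, reusing the toy X and y from the earlier example (the hyperparameter values are just starting points, not recommendations):

```python
from sklearn.linear_model import SGDRegressor

# SGDRegressor updates the parameters one instance at a time.
# penalty=None disables the default L2 regularization so this is plain
# linear regression (older scikit-learn versions spell it penalty='none');
# eta0 is the initial learning rate.
sgd = SGDRegressor(max_iter=1000, eta0=0.01, penalty=None, random_state=42)
sgd.fit(X, y)

print(sgd.intercept_, sgd.coef_)
```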
4. Mini-batch Gradient Descent

Mini-batch gradient descent uses a small random set of instances, called a "mini-batch", in each iteration.

This approach takes advantage of the performance boost that comes from hardware optimization for vectorized operations.
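A minimal mini-batch loop in NumPy, reusing the gradient formula from the batch sketch above (the batch size and epoch count are arbitrary):

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.05, n_epochs=50, batch_size=32):
    """Mini-batch gradient descent: each step uses a small random subset."""
    m = len(X)
    X_b = np.c_[np.ones((m, 1)), X]   # prepend a column of 1s for the bias
    theta = np.zeros(X_b.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        idx = rng.permutation(m)      # reshuffle the data once per epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            # Gradient over the mini-batch only: cheap and vectorized.
            gradients = (2 / len(batch)) * X_b[batch].T @ (X_b[batch] @ theta - y[batch])
            theta -= lr * gradients
    return theta
```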
Now, which approach should you use?
1. If you have a relatively low number of features per example, then the SVD method is a good choice.

The SVD method can handle huge training sets really well as long as each instance contains a small number of input features.
2. Avoid using the SVD method if each instance is made up of a large number of features (10,000 for example).

3. If you have a huge number of features per example, then gradient descent is the way to go.
4. Batch gradient descent is very slow when you have a huge number of training examples.

In that case, you can choose stochastic or mini-batch gradient descent as they process only one instance (stochastic) or a small number of instances (mini-batch) each iteration.
5. If your cost function has local minima, stochastic gradient descent is your best bet because of its oscillating behavior.

6. When using gradient descent, input features should be scaled (e.g., with scikit-learn's StandardScaler) so that all features have a similar range; otherwise convergence is much slower. A sketch follows below.
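A sketch of that scaling step, chaining StandardScaler and SGDRegressor in a scikit-learn pipeline so the same scaling is applied at both fit and predict time:

```python
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The scaler gives every feature zero mean and unit variance, which
# keeps gradient descent from zigzagging across unevenly scaled axes.
model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000))
model.fit(X, y)  # X, y as in the earlier examples
```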

If you enjoyed this thread, make sure to follow me @ammaryh92 for more content about data science and Python programming.

Thanks a lot for your time and support.

