Note that these categories can be combined in different ways, such as a "multiple univariate regression model", which uses multiple input features (explanatory variables) to predict a single output value (response variable).
Training a Linear Model
The goal of training any machine learning model is to find the optimal parameters that best describe the relationship between the input features and the target values.
1. The SVD Method
This method does not use gradient descent; instead, it relies on a linear algebra technique called Singular Value Decomposition (SVD) to compute the model parameters directly.
The above SVD method has a computational complexity of O(n²), where "n" is the number of features.
This means if you double the number of input features, the computation time will be multiplied by 4.
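As a small illustration, here is a minimal sketch of this approach, assuming some randomly generated data: scikit-learn's LinearRegression computes the parameters with an SVD-based least-squares solver (no iterations), and NumPy's lstsq gives the same solution.

# A minimal sketch: fitting a linear model without gradient descent.
# The data below is randomly generated just for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 3)                       # 100 instances, 3 features
y = 4 + X @ np.array([2.0, -1.0, 3.0]) + 0.1 * np.random.randn(100)

lin_reg = LinearRegression()
lin_reg.fit(X, y)                                # parameters computed via an SVD-based solver
print(lin_reg.intercept_, lin_reg.coef_)

# The same solution can be obtained directly with NumPy's SVD-based solver:
X_b = np.c_[np.ones((100, 1)), X]                # add a bias column
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)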
2. Gradient Descent
In gradient descent, you start by defining a cost function that represents how bad your model is doing. Next, you search for the parameters that minimize that cost function.
Types of Gradient Descent Algorithm
a. Batch Gradient Descent
Uses the whole training set to make a single gradient descent step.
The cost function decreases smoothly with each iteration.
The training becomes slower when you have a large number of training instances.
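As a rough sketch (the data, learning rate, and iteration count below are made up for illustration), one batch gradient descent loop for linear regression with an MSE cost looks like this:

# A rough sketch of batch gradient descent for linear regression (MSE cost).
import numpy as np

X = np.random.rand(1000, 3)
y = 4 + X @ np.array([2.0, -1.0, 3.0]) + 0.1 * np.random.randn(1000)

X_b = np.c_[np.ones((len(X), 1)), X]    # add a bias column
theta = np.random.randn(X_b.shape[1])   # random initialization
eta = 0.1                               # learning rate

for iteration in range(1000):
    # gradient of the MSE cost, computed over the WHOLE training set
    gradients = 2 / len(X_b) * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients            # one gradient descent step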
b. Stochastic Gradient Descent
Uses a single randomly chosen example to make a single gradient descent step.
Because a random example is chosen in each iteration, the cost function bounces up and down, decreasing only on average.
Because of its oscillating behavior, stochastic gradient descent is more likely to escape a bad local minimum if the algorithm gets stuck in one.
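In scikit-learn, this behavior is available out of the box through SGDRegressor. A minimal sketch (the data and hyperparameters here are just illustrative choices):

# A minimal sketch using scikit-learn's SGDRegressor, which performs
# stochastic gradient descent (one randomly picked instance per step).
import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.random.rand(1000, 3)
y = 4 + X @ np.array([2.0, -1.0, 3.0]) + 0.1 * np.random.randn(1000)

sgd_reg = SGDRegressor(max_iter=1000, eta0=0.01, random_state=42)
sgd_reg.fit(X, y)
print(sgd_reg.intercept_, sgd_reg.coef_)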
c. Mini-Batch Gradient Descent
Uses a small random set of instances called 'mini-batches' in each iteration.
This approach takes advantage of the performance boost from hardware optimization for vectorized operations.
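A rough sketch of mini-batch gradient descent (the batch size, learning rate, and data below are made up for illustration; each step is vectorized over the mini-batch):

# A rough sketch of mini-batch gradient descent for linear regression.
import numpy as np

X = np.random.rand(1000, 3)
y = 4 + X @ np.array([2.0, -1.0, 3.0]) + 0.1 * np.random.randn(1000)
X_b = np.c_[np.ones((len(X), 1)), X]

theta = np.random.randn(X_b.shape[1])
eta, batch_size = 0.1, 32

for epoch in range(50):
    shuffled = np.random.permutation(len(X_b))
    for start in range(0, len(X_b), batch_size):
        idx = shuffled[start:start + batch_size]   # a small random set of instances
        xb, yb = X_b[idx], y[idx]
        gradients = 2 / len(xb) * xb.T @ (xb @ theta - yb)
        theta -= eta * gradients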
Now, which approach should you use?
1. If you have a relatively low number of features per example, then the SVD method is a good choice.
The SVD method can handle huge training sets really well as long as each instance contains a small number of input features.
2. Avoid using the SVD method if each instance is made up of a large number of features (10,000 for example).
3. If you have a huge number of features per example, then gradient descent is the way to go.
4. Batch gradient descent is very slow when you have a huge number of training examples.
In that case, you can choose stochastic or mini-batch gradient descent as they process only one instance (stochastic) or a small number of instances (mini-batch) each iteration.
5. If your cost function has local minima, stochastic gradient descent is your best bet because of its oscillating behavior.
6. When using gradient descent, input features should be scaled so they share a similar range, for example with sklearn's StandardScaler (see the sketch below).
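For example, the scaler can be chained with a gradient-descent-based model in a pipeline (the model and settings below are just one illustrative choice):

# Example: scaling features before training with gradient descent.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

X = np.random.rand(1000, 3) * [1, 100, 10000]   # features on very different scales
y = X @ np.array([2.0, -1.0, 3.0]) + np.random.randn(1000)

model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, random_state=42))
model.fit(X, y)                                 # features are standardized before each fit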
That's it for this thread.
If you enjoyed this thread, make sure to follow me @ammaryh92 for more content about data science and Python programming.
Thanks a lot for your time and support.
📜Ideally, All 3 sets (train, dev, test) should be perfectly representative of the new instances you want to generalize to.
Otherwise, good performance on the training data will not translate into doing well in your application.
📜Introduction
- After training an ML model, it is important to assess its performance before putting it into production.
- We start by measuring the model performance on the training set to evaluate how well the model fits the training data.
- Then we measure the model performance on the test set to evaluate the generalization error.
To measure the model performance on the training set, we need a reference value against which we can compare the model performance.
This reference value is called "Bayes Error". 👇
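As a small sketch of these two measurements (mean squared error is used as the metric here, and the data and model are made up for illustration):

# Measuring performance on the training set and on the test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = np.random.rand(500, 3)
y = X @ np.array([2.0, -1.0, 3.0]) + 0.1 * np.random.randn(500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

train_error = mean_squared_error(y_train, model.predict(X_train))  # how well the model fits the training data
test_error = mean_squared_error(y_test, model.predict(X_test))     # estimate of the generalization error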
📜Introduction
Why do we need to split our data?
After training the model, you want to test its performance on new data before putting it in production. In other words, you want to measure the generalization error (how well does the model generalize to new data?).
The data is commonly split into 3 different sets:
1. Training Set
2. Development Set (Holdout Validation Set)
3. Test Set
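One common way to create these three sets with scikit-learn is to call train_test_split twice; the 60/20/20 ratio below is just one illustrative choice:

# Splitting data into train / dev / test sets.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)
y = np.random.rand(1000)

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_dev, X_test, y_dev, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
# -> 60% training, 20% development (holdout validation), 20% test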
✍️Introduction
- scikit-learn is one of the most popular Python libraries for machine learning.
- scikit-learn allows you to easily build and train machine learning models through its simple and well-designed API.
- However, I will try to simplify the API for beginners.
1⃣ Estimators
- The process of learning parameters from input data is called "Estimation", and therefore any object that learns some parameters from data is called an "Estimator".
- The estimation process itself is performed by calling the fit() method of any estimator object.
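For example (SGDRegressor is used here only as an arbitrary estimator; every estimator follows the same pattern):

# Any scikit-learn estimator learns its parameters when fit() is called.
import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.random.rand(200, 3)
y = np.random.rand(200)

estimator = SGDRegressor()
estimator.fit(X, y)       # the estimation step: parameters are learned from the data
print(estimator.coef_)    # the learned parameters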
Pandas
- Pandas is probably one of the most powerful and flexible open-source data analysis and manipulation tools available in any language.
- It provides a wide range of functions for data wrangling and cleaning.
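A tiny sketch of the kind of wrangling and cleaning pandas makes easy (the data and column names here are made up):

# A tiny sketch of typical pandas data cleaning.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "salary": [50000, 64000, 58000, None],
})

df["age"] = df["age"].fillna(df["age"].median())   # fill missing values
df = df.dropna(subset=["salary"])                  # drop rows still missing a salary
print(df.describe())                               # quick summary statistics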
resources:
1⃣ youtube.com/playlist?list=…
2⃣
NumPy (Numerical Python)
- NumPy is an open source project aiming to enable numerical computing with Python.
- It provides functions and methods for performing high-level mathematical operations on multi-dimensional arrays and matrices.
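A minimal example of that kind of vectorized math on a multi-dimensional array:

# Vectorized math on a multi-dimensional NumPy array.
import numpy as np

A = np.arange(12).reshape(3, 4)        # a 3x4 matrix
print(A.mean(axis=0))                  # column means
print(A @ A.T)                         # matrix multiplication
print(np.exp(A) / np.exp(A).sum())     # element-wise functions and reductions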