Naina Chaturvedi
Sep 19, 2023
✅Measuring performance in ML is essential to assess the quality and effectiveness of your models - Explained in simple terms.
A quick thread 👇🏻🧵
#MachineLearning #DataScientist #Coding #100DaysofCode #deeplearning #DataScience
PC: ResearchGate
1/ Performance measurement is the process of quantitatively evaluating how well a trained ML model performs on a given task or dataset. It uses specific metrics and techniques to assess the model's ability to make accurate predictions or decisions.
2/ During Model Development: Performance measurement is an integral part of ML model development. It helps data scientists and ML engineers assess the effectiveness of different algorithms, features, and hyperparameters during the model training and selection process.
3/ After Model Deployment: Once a model is deployed in the production environment, continuous performance monitoring is essential. It helps ensure that the model maintains its accuracy and effectiveness over time.
4/ Benchmarking: Performance measurement is also crucial for benchmarking different models or approaches to solve a particular problem. It allows for fair comparisons and the selection of the best-performing model.
5/ Model Evaluation: Measuring performance allows us to assess how well a machine learning model is generalizing from the training data to unseen data. This evaluation is critical for selecting the best model among different options.
6/ Hyperparameter Tuning: Performance metrics guide the process of hyperparameter tuning, where we adjust the model's settings to achieve better results. Without measuring performance, we wouldn't know which hyperparameters to choose.
7/ Comparing Models: Performance metrics enable us to compare different models or algorithms to determine which one performs better for a specific task. This helps in making informed decisions about model selection.
8/ Classification:

Accuracy: It measures the fraction of correctly classified instances out of the total instances.
Precision: It quantifies the accuracy of positive predictions. Precision is the ratio of true positives to the total predicted positives.
9/ Recall: It quantifies the ability of the model to find all relevant instances. Recall is the ratio of true positives to the total actual positives.
F1-score: It is the harmonic mean of precision and recall and provides a balanced measure.
10/ ROC-AUC: Receiver Operating Characteristic - Area Under the Curve measures the model's ability to distinguish between classes.
11/ Regression:

Mean Absolute Error (MAE): It measures the average absolute difference between the predicted and actual values.
Mean Squared Error (MSE): It measures the average squared difference between predicted and actual values.
12/ R-squared (R²): It quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables.
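The three regression metrics above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the function names and the toy values in `y_true` and `y_pred` are made up for the example.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (hypothetical values, chosen only for illustration).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

Because MSE squares each error, a single large miss moves it much more than it moves MAE, which is one reason to report both.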
13/ Clustering:

Silhouette Score: Measures how similar an object is to its own cluster compared to other clusters. Values range from -1 (the object is likely assigned to the wrong cluster) to +1 (dense, well-separated clusters); values near 0 indicate overlapping clusters.
14/ Davies-Bouldin Index: Measures the average similarity between each cluster and the cluster most similar to it. Lower values indicate better clustering.
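To make the silhouette score concrete, here is a rough pure-Python sketch. It assumes Euclidean distance, at least two clusters, and points that are not all identical; the function name and toy points are invented for the example. For each point, `a` is the mean distance to its own cluster, `b` is the mean distance to the nearest other cluster, and the point's score is `(b - a) / max(a, b)`.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette over all points (assumes >= 2 clusters)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lab, []).append(dist(p, q))
        if own not in by_cluster:
            # p is alone in its cluster; its silhouette is taken as 0.
            scores.append(0.0)
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two tight groups far apart (toy data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

The labeling that matches the two groups scores close to +1, while the labeling that cuts across them scores below 0.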
15/ Recommendation:

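Mean Average Precision (MAP): The mean, over all users (or queries), of the average precision of each ranked recommendation list. It rewards placing relevant items near the top of the list.
Hit Rate: Commonly defined as the fraction of users who have at least one relevant item among their top-N recommendations.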
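A small sketch of both recommendation metrics follows. Hit rate is taken here as the share of users with at least one relevant item in their top-N list, which is one common definition and an assumption of this example. The helper names, users, and items are all hypothetical.

```python
def hit_rate_at_n(recommended, relevant, n):
    """Share of users with at least one relevant item in their top-n list."""
    hits = sum(
        1 for user, recs in recommended.items()
        if set(recs[:n]) & relevant.get(user, set())
    )
    return hits / len(recommended)


def average_precision(recs, relevant_items):
    """Average of precision@k taken at each rank k that holds a relevant item."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recs, start=1):
        if item in relevant_items:
            hits += 1
            total += hits / rank
    return total / len(relevant_items) if relevant_items else 0.0


def mean_average_precision(recommended, relevant):
    """Mean of per-user average precision."""
    aps = [average_precision(recs, relevant.get(user, set()))
           for user, recs in recommended.items()]
    return sum(aps) / len(aps)


# Ranked lists per user and the items each user actually liked (toy data).
recommended = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
relevant = {"u1": {"a", "c"}, "u2": {"x"}}

print(hit_rate_at_n(recommended, relevant, n=3))
print(mean_average_precision(recommended, relevant))
```

Here `u1` gets a hit and `u2` does not, so the hit rate is 0.5. MAP is lower because it also accounts for where in the list the relevant items appear and how many were missed.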
16/ Text Generation:

BLEU (Bilingual Evaluation Understudy): Measures the n-gram overlap (precision-oriented) between machine-generated text and a set of reference texts.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Evaluates summaries and other machine-generated text by their n-gram overlap (recall-oriented) with reference texts.
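The following is a deliberately simplified sketch of the idea behind these two metrics, not full BLEU or ROUGE. It computes clipped unigram precision (the building block of BLEU, without higher-order n-grams or the brevity penalty) and unigram recall (ROUGE-1 recall) against a single reference, with whitespace tokenization. The function names and sentences are made up.

```python
from collections import Counter


def clipped_unigram_precision(candidate, reference):
    """Share of candidate words found in the reference (counts clipped)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / sum(cand.values())


def rouge1_recall(candidate, reference):
    """Share of reference words that the candidate recovered."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


reference = "the cat sat on the mat"
candidate = "the cat sat"

print(clipped_unigram_precision(candidate, reference))  # precision view
print(rouge1_recall(candidate, reference))              # recall view
```

The short candidate gets perfect precision because every word it produced is in the reference, but only half the recall because it left out half of the reference. Full BLEU adds a brevity penalty to counter exactly this.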
17/ In classification problems, a dataset is imbalanced when one class significantly outnumbers the others. In such cases, a model that predicts the majority class most of the time can achieve high accuracy while performing poorly on the minority class.
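A quick illustration of why accuracy misleads on imbalanced data, using a made-up dataset of 95 negatives and 5 positives and a "model" that always predicts the majority class:

```python
# Hypothetical imbalanced dataset: 5 positives (1), 95 negatives (0).
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
minority_recall = true_positives / actual_positives

print(accuracy, minority_recall)
```

Accuracy comes out at 95% even though the model never finds a single positive case, which is why precision, recall, and F1 matter here.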
18/ A confusion matrix is a table that helps visualize the performance of a classification algorithm by showing the number of correct and incorrect predictions made by the model.
True Positives (TP): The number of instances correctly predicted as positive.
19/ True Negatives (TN): The number of instances correctly predicted as negative.
False Positives (FP): The number of instances incorrectly predicted as positive.
False Negatives (FN): The number of instances incorrectly predicted as negative.
20/ Accuracy: It measures the overall correctness of the model's predictions. Accuracy is calculated as (TP + TN) / (TP + TN + FP + FN).

Precision: It quantifies how many of the positive predictions made by the model are actually correct. Precision is calculated as TP / (TP + FP).
21/ Recall (Sensitivity or True Positive Rate): Recall measures how many of the actual positive instances were correctly predicted by the model. It is calculated as TP / (TP + FN).
22/ F1-Score: The F1-score is the harmonic mean of precision and recall, providing a balanced measure that considers both false positives and false negatives. It is calculated as 2 * (Precision * Recall) / (Precision + Recall).
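The confusion-matrix counts and the four formulas above can be worked through by hand. This is a minimal sketch for binary labels; `confusion_counts` and the toy labels are invented for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, tn, fp, fn


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(tp, tn, fp, fn)
print(accuracy, precision, recall, f1)
```

With TP = 3, TN = 5, FP = 1, FN = 1, accuracy is 8/10 while precision, recall, and F1 are all 3/4.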
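23/ Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC) are valuable tools for evaluating binary classifiers. The ROC curve plots the true positive rate against the false positive rate across decision thresholds, and AUC summarizes it in a single number: 0.5 is no better than random guessing and 1.0 is perfect separation of the positive and negative classes.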
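One way to get a feel for AUC: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as half). The sketch below computes it that way by comparing every positive/negative pair. It is fine for small data but quadratic in size; the function name and toy scores are assumptions for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # model's predicted probability of class 1

print(roc_auc(y_true, scores))
```

Three of the four positive/negative pairs are ordered correctly (only 0.35 vs 0.4 is wrong), so the AUC is 0.75.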
24/ Cross-validation is a crucial technique for estimating a model's performance on unseen data and detecting overfitting. It splits the dataset into multiple subsets (folds); the model is trained on all but one fold and evaluated on the held-out fold, rotating until every fold has served as the validation set.
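A bare-bones sketch of k-fold splitting is shown below. The folds are contiguous and unshuffled to keep the example short (in practice the data is usually shuffled first), and the "model" is just a mean predictor standing in for real training. All names and data are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        # Spread any remainder over the first (n_samples % k) folds.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


# Toy targets; the stand-in model predicts the mean of its training targets.
y = [float(v) for v in range(10)]

fold_errors = []
for train, val in k_fold_indices(len(y), k=3):
    prediction = sum(y[i] for i in train) / len(train)         # "fit"
    mae = sum(abs(y[i] - prediction) for i in val) / len(val)  # "evaluate"
    fold_errors.append(mae)

cv_estimate = sum(fold_errors) / len(fold_errors)
print(fold_errors, cv_estimate)
```

Every sample is used for validation exactly once, and the final estimate is the average of the per-fold scores.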
25/ Bias-Variance Trade-off is a fundamental concept that addresses the balance between two types of errors that models can make: bias and variance.
26/ Bias: Bias refers to the error introduced by approximating a real-world problem, which may be complex, by a simplified model. High bias models are typically too simplistic and make strong assumptions about the data, resulting in poor predictions.
27/ Variance: Variance refers to the error introduced by the model's sensitivity to small fluctuations or noise in the training data. High variance models are overly complex and can fit the training data too closely, capturing noise rather than the underlying patterns.
28/ Overfitting and Underfitting are common issues that arise when a model's performance on training and validation datasets indicates problems with generalization.
29/ Overfitting occurs when a model learns the training data too well, capturing noise and small fluctuations in the data rather than the underlying patterns.

The model performs exceptionally well on the training data but poorly on the validation (or test) data.
30/ Underfitting occurs when a model is too simple to capture the underlying patterns in the data.

The model performs poorly on both the training and validation data.
There may be a minimal gap or no gap between the training and validation performance, but both are subpar.
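The two patterns above can be turned into a rough rule of thumb. The sketch below assumes scores where higher is better (e.g. accuracy). The thresholds `good_enough` and `max_gap` are arbitrary illustrative values, not standard ones, and should be chosen per problem.

```python
def diagnose_fit(train_score, val_score, good_enough=0.8, max_gap=0.1):
    """Heuristic label from train/validation scores (higher = better).

    good_enough and max_gap are hypothetical thresholds for illustration.
    """
    if train_score < good_enough:
        # Poor even on the data it was trained on: model is too simple.
        return "underfitting"
    if train_score - val_score > max_gap:
        # Strong on training data, much weaker on held-out data.
        return "overfitting"
    return "good fit"


print(diagnose_fit(0.99, 0.70))  # large train/validation gap
print(diagnose_fit(0.60, 0.58))  # both scores low, small gap
print(diagnose_fit(0.90, 0.88))  # both scores high, small gap
```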
31/ Hyperparameter Tuning is the process of finding the best set of hyperparameters for an ML model to achieve optimal performance on a given task. Hyperparameters are configurations and settings that are not learned from data but are set before training the model.
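Grid search is one simple way to do this: try every combination in a predefined grid and keep the one with the best validation score. In the sketch below, `toy_validation_score` is a made-up stand-in for "train the model with these settings and score it on validation data", and the hyperparameter names and values are hypothetical.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(learning_rate, max_depth):
    # Hypothetical stand-in for training a model and scoring it on
    # validation data; peaks at learning_rate=0.1, max_depth=4.
    return 1.0 - abs(learning_rate - 0.1) - 0.05 * abs(max_depth - 4)


grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 8]}
best_params, best_score = grid_search(toy_validation_score, grid)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (9 evaluations here), which is why random search or smarter strategies are often preferred for larger search spaces.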
32/ Feature importance measures quantify the impact of each input feature on the model's predictions. For interpretable models, understanding which features are most influential can be critical.
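Permutation importance is one model-agnostic way to measure this (chosen here as an example; the thread does not name a specific method). Shuffle one feature's column, re-score the model, and see how much the error grows. The sketch below uses a fixed toy model that only looks at the first feature; all names and data are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of model(row) against targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_features, seed=0):
    """Increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances


def model(row):
    # Toy "trained" model: depends on feature 0 only, ignores feature 1.
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]

importances = permutation_importance(model, X, y, n_features=2)
print(importances)
```

Shuffling feature 0 makes the error jump, while shuffling feature 1 changes nothing, matching the fact that the toy model ignores it.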
33/ Subscribe and read more: naina0405.substack.com
Github: github.com/Coder-World04/…
