What are Performance Metrics?
To study the effectiveness of a product or device in any field, we use performance metrics. Additionally, we can use performance metrics to measure how a machine learning model performs. The performance metrics provide the results in numbers.
Therefore, it becomes pretty easy to understand and evaluate the performance of most machine learning models.
Let's say I tell you that a movie is 3 out of 5. You can directly understand that the film is okay but not great.
Therefore, when someone QUANTIFIES or puts a number value to the performance of a device, product or machine learning model, you can quickly understand whether or not it performs well. In other words, you can evaluate the product quickly by seeing a number score.
ACCURACY
Accuracy is a performance metric. It is calculated as the number of correct predictions divided by the total number of predictions.
See the formula below:-
Accuracy = (Number of correct predictions) / (Total number of predictions)
Let's say:-
There are 700 images of dogs and cats.
Out of 700, 350 images are of dogs, and 350 images are of cats.
Out of 350 dog images, the model correctly identifies 320 images.
Out of 350 cat images, the model correctly identifies 310 images.
Therefore, the model correctly identifies 320 dog images and 310 cat images. Thus, the model correctly identifies 630 images in total.
Now, according to the formula, the accuracy of the model is 630 / 700 = 0.9 or 90%.
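Here is a minimal Python sketch of the same calculation. The function name `accuracy` and the variable names are made up for illustration; the numbers are the ones from the dogs and cats example above.

```python
def accuracy(correct_predictions, total_predictions):
    """Number of correct predictions divided by the total number of predictions."""
    return correct_predictions / total_predictions


# Numbers from the dogs-and-cats example above.
correct_dogs = 320   # dog images identified correctly (out of 350)
correct_cats = 310   # cat images identified correctly (out of 350)
total_images = 700

score = accuracy(correct_dogs + correct_cats, total_images)
print(f"Accuracy: {score:.2f} ({score:.0%})")
```

So 630 correct predictions out of 700 gives an accuracy of 0.9.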
IMPORTANT OBSERVATION:-
The accuracy metric works best when the dataset is balanced. It is not reliable for an imbalanced dataset.
Balanced vs Imbalanced Dataset
-> Balanced Dataset:-
The dataset is balanced when the target labels are nearly equal or equal in number in a classification problem. For example, out of the 700 images, if 350 are dog images and 350 are cat images, this is a balanced dataset.
-> Imbalanced Dataset
We say the dataset is imbalanced when the target labels are far from equal in number in a classification problem.
In other words, when there are many more target labels of one class than of the other.
For example, out of 700 dog and cat images, there are only 10 dog images and 690 cat images. This is an imbalanced dataset.
DRAWBACK - Why shouldn't you use accuracy as a performance metric?
Let us take the same imbalanced dataset as above: 700 images of dogs and cats, of which 10 are photos of dogs and 690 are photos of cats.
Assume that our model correctly identifies 630 cat images but fails to identify any dog images. Let us calculate the accuracy for this scenario: 630 correct predictions out of 700 gives 630 / 700 = 0.9.
Wow! We still got the same accuracy. It is 90%.
But do you think it should be 90% when our model could not identify any dog image correctly? No, right?
Therefore, when the dataset is imbalanced or skewed, we must not use accuracy to determine our model's performance.
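To see why the 90% is misleading, here is a small Python sketch that computes the overall accuracy next to the fraction of each class that was identified correctly. The dictionary names are made up for illustration, and the per-class breakdown is just one simple way to look at it, not something the example above prescribes.

```python
# Imbalanced example from above: 10 dog images, 690 cat images.
total = {"dog": 10, "cat": 690}
# The model gets 630 cats right and no dogs at all.
correct = {"dog": 0, "cat": 630}

# Overall accuracy: all correct predictions over all predictions.
overall = sum(correct.values()) / sum(total.values())

# Fraction of each class that was identified correctly.
per_class = {label: correct[label] / total[label] for label in total}

print(f"Overall accuracy: {overall:.0%}")
for label, fraction in per_class.items():
    print(f"Correct on {label} images: {fraction:.1%}")
```

The overall number stays at 0.9 even though the fraction of dog images identified correctly is 0.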
• • •
Top 5 FREE Python Courses on Udemy and Coursera!!
(Plus a BONUS resource!)
Python Programming Resources 🧵
1️⃣ Introduction To Python Programming
- Number of lectures: 18
- Level: Beginner
- Link: udemy.com/course/pythonf…
2️⃣ Programming for Everybody (Getting Started with Python)
- Course offered by University of Michigan
- Level: Beginner
- Link: coursera.org/learn/python
Have you found the concept of Decision Trees confusing? Well, this thread would be useful!
A beginner-friendly DETAILED explanation of Decision Trees 🧵
What is a Decision Tree?
It is a Classification Algorithm. We can use Decision Trees to solve Regression problems as well. It is a tree or graph-like structure with a root node, non-leaf nodes, leaf nodes and branches.
See an example of Decision Tree using AND Table (Image)
Let's understand two terms first:-
What is splitting?
Splitting is dividing the nodes into two or more sub-nodes based on the outcome. In the "Yes or No" case, a node will split into two sub-nodes, one pointing towards the outcome of YES & the other towards NO.
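As a rough sketch of splitting, here is a tiny hand-written decision tree for the AND table in Python. The function name `and_tree` and the choice to split on the first input at the root are assumptions made for illustration; a real decision tree algorithm would learn its splits from data.

```python
def and_tree(a, b):
    """Hand-written decision tree for the AND table (illustrative only)."""
    # Root node: split on the first input into a YES branch and a NO branch.
    if not a:
        return False      # leaf node on the NO branch
    # Non-leaf node on the YES branch: split again on the second input.
    if not b:
        return False      # leaf node on the NO branch
    return True           # leaf node on the YES branch


# Walk through every row of the AND table.
for a in (False, True):
    for b in (False, True):
        print(a, b, "->", and_tree(a, b))
```

Each `if` plays the role of one split: the node divides into two sub-nodes, one for YES and one for NO.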
Most of us know that Logistic Regression is a Classification Algorithm! But why does it have the word 'regression' in it?
Let me explain
Let's start with the basics:-
Classification: It is a process of putting things/items into a category.
Say you have a fruit bowl. You can separate all the fruits by categorising them by name, type, or size.
Classification can be binary or multi-class.
Binary Classification is where there exist only two categories.
Multi-Class is where there exist more than two categories.
Some of us were looking for a place to start off our Machine Learning and Data Science Journey.
I have some resources that can help you!
Check them out
1. 100 Days of Code by Ayon Roy
This repository is a gold mine for someone who is looking to begin ML. It explains to you how to go about every little topic.
Link: https://bit .ly/3wtWFwx
2. Kaggle Courses
Another phenomenal source to get started with Data Science and Machine Learning. It has topic-wise courses that can up-skill you.
The topic of TRIGGERS is a level-up from creating queries and sub-queries when handling databases.
A super-easy guide to triggers in SQL 🧵
What is a trigger in layman's (everyday) language?
When someone pokes you or does something to agitate you, there is a high chance that you might get 'triggered' automatically! Right?
Triggers in SQL
So, when we say TRIGGERS in SQL, it is simply a stored program that gets executed on its own when a triggering event occurs. Now, triggers are a part of PL/SQL.
PL/SQL is an extension of SQL that combines SQL queries with procedural statements.
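To make the idea concrete, here is a small sketch using Python's built-in sqlite3 module. Note the assumption: this uses SQLite's trigger syntax, not PL/SQL, because SQLite runs without any setup. The table names `orders` and `order_log` and the trigger name `log_new_order` are made up for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption for this sketch; not PL/SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
CREATE TABLE order_log (order_id INTEGER, note TEXT);

-- The trigger: a stored program that runs on its own
-- whenever the triggering event (an INSERT on orders) occurs.
CREATE TRIGGER log_new_order
AFTER INSERT ON orders
BEGIN
    INSERT INTO order_log (order_id, note) VALUES (NEW.id, 'order created');
END;
""")

# We only insert into orders; the trigger fills order_log by itself.
conn.execute("INSERT INTO orders (item) VALUES ('book')")
log_rows = conn.execute("SELECT order_id, note FROM order_log").fetchall()
print(log_rows)
conn.close()
```

We never write to `order_log` directly. The row appears there because the INSERT on `orders` is the triggering event.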