Tweet

Sukriti Macker

6 Aug, 15 tweets, 4 min read

What is Accuracy in Machine Learning? Why shouldn't you use accuracy as a Performance Parameter? Is there a drawback? 🤔

A beginner-friendly explanation! 🧵👇

A compiled version of the thread: diveintodata.in/accuracy-perfo…

☑️ What are Performance Metrics?
To study the effectiveness of a product or device in any field, we use performance metrics. Additionally, we can use performance metrics to measure how a machine learning model performs. The performance metrics provide the results in numbers.

Therefore, it becomes pretty easy to understand and evaluate the performance of most machine learning models.

Let's say I tell you that a movie is 3 on 5. You can directly understand that the film is okay and not great.

Therefore, when someone QUANTIFIES or puts a number value to the performance of a device, product or machine learning model, you can quickly understand whether or not it performs well. In other words, you can evaluate the product quickly by seeing a number score.

☑️ ACCURACY
Accuracy is a performance metric. It is calculated as the number of correct predictions divided by the total number of predictions.

See the formula below:-

Let's say:-
There are 700 images of dogs and cats.
Out of 700, 350 images are of dogs, and 350 images are of cats.

Out of 350 dog images, the model correctly identifies 320 images
Out of 350 cat images, the model correctly identifies 310 images

Therefore, the model correctly identifies 320 as dog images and 310 as cat images. Thus, the model correctly identifies 630 images.

Now, according to the formula, the accuracy of the model is 0.9 or 91%.

IMPORTANT OBSERVATION:-
The accuracy metric works best when the dataset is balanced. It is not reliable for an imbalanced dataset.

☑️ Balanced vs Imbalanced Dataset

-> Balanced Dataset:-
The dataset is balanced when the target labels are nearly equal or equal in number in a classification problem. For example, out of the 700 images, if 350 are dog images and 350 are cat images, this is a balanced dataset.

-> Imbalanced Dataset
We say the dataset is imbalanced when the target labels are not equal and incomparable in a classification problem.
In other words, when there are more number target labels of one class than the other.

For example, out of 700 dogs and cats images, there are only 10 dog images and 690 cat images. This is an imbalanced dataset.

☑️ DRAWBACK - Why shouldn't you use accuracy as a performance metric?
Let us take the same example for the imbalanced dataset we took above. In which there are 700 images of dogs and cats. From the 700 images, there are 10 photos of dogs and 690 photos of cats.

Assume that our model correctly identifies 630 cat images. But, it fails to identify any dog images. Let us calculate the accuracy for this scenario.

Wow! We still got the same accuracy. It is 90%.
But do you think it should be 90% when our model could not identify any dog image correctly? No, right?
Therefore, when the dataset is imbalanced or skewed, we must not use accuracy to determine our model's performance.

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @Sukriti_Macker

Sukriti Macker

@Sukriti_Macker

8 Aug

Top 5 FREE Python Courses on Udemy and Coursera!! 😍
(Plus a BONUS resource! 🌟)

Python Programming Resources 🧵👇

1⃣ Introduction To Python Programming
- Number of lectures: 18
- Level: Beginner
- Link: udemy.com/course/pythonf…

2⃣ Programming for Everybody (Getting Started with Python)
- Course offered by University of Michigan
- Level: Beginner
- Link: coursera.org/learn/python

Read 8 tweets

Sukriti Macker

@Sukriti_Macker

26 Jul

Have you found the concept of Decision Trees confusing? Well, this thread would be useful!

A beginner-friendly DETAILED explanation to Decision Trees 👇🧵

🌟 What is a Decision Tree?
It is a Classification Algorithm. Decision Trees can use it to solve Regression problems as well. It is a tree or graph-like structure with a root node, non-leaf nodes, leaf nodes and branches.

See an example of Decision Tree using AND Table (Image)

Let's understand two terms first:-

What is splitting?
Splitting is dividing the nodes into two or more sub-nodes based on the outcome. In the "Yes or No" case, a node will split into two sub-nodes, one pointing towards the outcome of YES & the other towards NO.

Read 16 tweets

Sukriti Macker

@Sukriti_Macker

19 Jul

Most of us know that Logistic Regression is a Classification Algorithm! But why does it have the word 'regression' in it?

Let me explain 👇👇

🌟 Let's start by basics:-

Classification: It is a process of putting things/items into a category.
When you a fruit bowl. You can separate all the fruits by categorising them by name, type, or size.

Classification can be binary or multi-class.

Binary Classification is where there exist only two categories.
Multi-Class is where there exist more than two categories.

(See the image below)

Read 8 tweets

Sukriti Macker

@Sukriti_Macker

12 Jul

Some of us were looking for a place to start off our Machine Learning and Data Science Journey.

I have some resources that can help you! 😁

Check them out👇👇

1. 100 Days of Code by Ayon Roy
This repository is a gold mine for someone who is looking to begin ML. It explains to you how to go about every little topic.

Link: https://bit .ly/3wtWFwx

2. Kaggle Courses
Another phenomenal source to get started with Data Science and Machine Learning. It has topic-wise courses that can up-skill you.

Link: https://www.kaggle .com/learn

Read 10 tweets

Sukriti Macker

@Sukriti_Macker

9 Jul

Do you know about the SLICING technique that helps you work around strings in Python??

To handle sub-strings, you can use the SLICING method.
👇👇

Let's say we have a string.

a = "This is a string."

You can slice between the string!

👉 a[start_index : end_index : stride]

Terms:-
start_index -> Takes an integer value; this is the index value (inclusive) from where you want to start slicing.

end_index -> Takes an integer value; this is the index value (exclusive) where you want the slicing to end.

stride -> Takes an integer value; indicates how many characters to skip after the first character is retrieved.

Note: The first character index is always 0.

Read 8 tweets

Sukriti Macker

@Sukriti_Macker

7 Jul

The topic of TRIGGERS is a level-up from creating queries and sub-queries in the handling databases.

A super-easy guide to triggers in SQL 🧵👇

☑️ What is a trigger in the laymen (standard) language?

When someone pokes you or does something to agitate you, there is a high chance that you might get 'triggered' automatically! Right?

☑️ Triggers in SQL

So, when we say TRIGGERS in SQL, it is simply a stored program that gets executed on its own when a triggering event occurs. Now, triggers are a part of PL/SQL.

PL/SQL is an extension of SQL where SQL queries are used and procedural statements/language.

Read 14 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!