
Sukriti Macker

26 Jul, 16 tweets, 4 min read

Have you found the concept of Decision Trees confusing? Well, this thread would be useful!

A beginner-friendly DETAILED explanation of Decision Trees 👇🧵

🌟 What is a Decision Tree?
A Decision Tree is a classification algorithm that can also be used to solve regression problems. It is a tree or graph-like structure with a root node, non-leaf nodes, leaf nodes and branches.

See an example of Decision Tree using AND Table (Image)
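To make the structure concrete, here is a minimal sketch of a hand-written tree for the AND table. The nested-dict layout and the names `and_tree` and `predict` are assumptions made for illustration, and the tree may differ from the one in the image.

```python
# AND table: the output is 1 only when both inputs a and b are 1.
and_tree = {
    "feature": "a",              # root node: test input a
    "branches": {
        0: {"leaf": 0},          # a = 0 -> leaf node, output 0
        1: {                     # a = 1 -> non-leaf node: test input b
            "feature": "b",
            "branches": {
                0: {"leaf": 0},  # leaf node, output 0
                1: {"leaf": 1},  # leaf node, output 1
            },
        },
    },
}


def predict(node, row):
    """Follow the branches from the root until a leaf node is reached."""
    while "leaf" not in node:
        node = node["branches"][row[node["feature"]]]
    return node["leaf"]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", predict(and_tree, {"a": a, "b": b}))
```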

Let's understand two terms first:-

What is splitting?
Splitting is dividing the nodes into two or more sub-nodes based on the outcome. In the "Yes or No" case, a node will split into two sub-nodes, one pointing towards the outcome of YES & the other towards NO.
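A rough sketch of a split, assuming the rows are the AND table written with Yes/No values. The function name `split` and the row layout are hypothetical, chosen only for this example.

```python
def split(rows, feature):
    """Divide a node's rows into sub-nodes, one per answer to `feature`."""
    sub_nodes = {}
    for row in rows:
        sub_nodes.setdefault(row[feature], []).append(row)
    return sub_nodes


# AND table with Yes/No values: "out" is Yes only when a and b are both Yes.
rows = [
    {"a": "No", "b": "No", "out": "No"},
    {"a": "No", "b": "Yes", "out": "No"},
    {"a": "Yes", "b": "No", "out": "No"},
    {"a": "Yes", "b": "Yes", "out": "Yes"},
]

sub_nodes = split(rows, "a")
for answer, members in sub_nodes.items():
    # Show which outcomes landed in each sub-node.
    print(answer, [m["out"] for m in members])
```

Splitting on `a` gives two sub-nodes: the "No" side holds only "No" outcomes, while the "Yes" side still holds a mix.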

What is impurity?
When a node doesn't have a clear outcome or doesn't belong to a single class, whether it be Yes Class or No Class, the node is impure.
If the node has all the values of a single class, it is called a pure node.
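As a small sketch, purity can be checked by counting the distinct classes in a node. The helper name `is_pure` is an assumption for illustration.

```python
def is_pure(labels):
    """A node is pure when every value in it belongs to one class."""
    return len(set(labels)) == 1


print(is_pure(["No", "No"]))   # True: only the No class is present
print(is_pure(["No", "Yes"]))  # False: the node mixes both classes
```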

How do we select which feature to split on?
For that, we calculate the impurity of each candidate split and select the feature that gives us the lowest impurity.
Lowest impurity means that the split produces the "least" impure sub-nodes.

We aim to reach pure nodes, where each node belongs to a single class.

-> How do we calculate impurity?
The most common method is the GINI INDEX. It is used for BINARY SPLITS (splitting into Yes or No).
Other measures include:-
Accuracy
Information Gain
Gain Ratio

🌟 Formula to calculate GINI INDEX

GI = 1 - square of(Probability of "Yes") - square of(Probability of "No")
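The formula can be sketched in Python as below. The function name `gini` and its count-based arguments are assumptions for illustration; the probabilities are taken as each class's share of the node.

```python
def gini(yes_count, no_count):
    """Gini Index of a node holding `yes_count` Yes and `no_count` No values."""
    total = yes_count + no_count
    if total == 0:
        raise ValueError("cannot compute the Gini Index of an empty node")
    p_yes = yes_count / total
    p_no = no_count / total
    return 1 - p_yes**2 - p_no**2


print(gini(10, 0))  # 0.0 -> a pure node
print(gini(5, 5))   # 0.5 -> the most impure a two-class node can be
```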

🌟 Let's take a table to understand the working of the decision tree:-
We have a table with independent variables (Chest Pain, Good Blood Circulation & Blocked Arteries) that determine the dependent variable, Heart Disease.

Refer to the image

In the data above, we have three features: Chest Pain, Good Blood Circulation & Blocked Arteries. Any of them could be our root node.

To decide which feature to make the root node, we calculate the Gini Index for each and pick the feature with the lowest impurity.

Let's say,

If we pick Chest Pain:-
If Chest Pain YES -> 105 People have Heart Disease, and 39 don't have Heart Disease.
If Chest Pain NO -> 34 People have Heart Disease, and 125 don't have Heart Disease.

For other features, see the image:-

Gini Index for when Chest Pain is YES
= 1 - sq of(Prob of Heart Disease Yes) - sq of(Prob of Heart Disease No)
= 1 - sq of(105/(105+39)) - sq of(39/(105+39))
= 0.395

Gini Index for when Chest Pain is No = 0.336
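Both values can be checked with a short sketch. The counts come from the Chest Pain split above; the function and variable names are assumptions for illustration.

```python
def gini(yes_count, no_count):
    """Gini Index of a node from its Heart Disease Yes/No counts."""
    total = yes_count + no_count
    return 1 - (yes_count / total) ** 2 - (no_count / total) ** 2


# Chest Pain YES: 105 with Heart Disease, 39 without.
gini_yes = gini(105, 39)
# Chest Pain NO: 34 with Heart Disease, 125 without.
gini_no = gini(34, 125)

print(round(gini_yes, 3))  # 0.395
print(round(gini_no, 3))   # 0.336
```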

Now, we calculate the Total Gini Index for Yes & No Chest Pain.

🌟 Total G.I. = Weighted average of Gini Impurities for leaf nodes

Total G.I. for Chest Pain Node = [(Chest Pain Yes patients/Total patients) x G.I. of Chest Pain Yes] + [(Chest Pain No patients/Total patients) x G.I. of Chest Pain No]

= [(144/(144+159)) x 0.395] + [(159/(144+159)) x 0.336]

= 0.364
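Here is a minimal sketch of the weighted average, using the Chest Pain counts from above. The names `gini` and `weighted_gini` and the `(yes, no)` tuple layout are assumptions for illustration.

```python
def gini(yes_count, no_count):
    """Gini Index of a node from its Heart Disease Yes/No counts."""
    total = yes_count + no_count
    return 1 - (yes_count / total) ** 2 - (no_count / total) ** 2


def weighted_gini(*nodes):
    """Weighted average Gini Index over sub-nodes given as (yes, no) counts."""
    grand_total = sum(yes + no for yes, no in nodes)
    return sum((yes + no) / grand_total * gini(yes, no) for yes, no in nodes)


# Chest Pain YES node: (105, 39) -> 144 patients.
# Chest Pain NO node:  (34, 125) -> 159 patients.
total_gini = weighted_gini((105, 39), (34, 125))
print(round(total_gini, 3))  # 0.364
```

Each sub-node's impurity is weighted by its share of the patients, so a large impure node counts for more than a small one.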

After calculating the Gini Index of each feature like this, we would find that the Good Blood Circulation feature has the lowest impurity. Thus, we select that feature to be our root node.

Similarly, we continue the process to find the next node and complete our Decision Tree.
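The whole process can be sketched as a small recursive function. This is only an illustration under stated assumptions: it runs on the AND table rather than the heart disease data (the other features' counts are only in the image), the names `build_tree`, `weighted_gini` and the dict layout are hypothetical, and the Gini Index is written as 1 minus the sum of squared class probabilities, which matches the Yes/No formula when there are two classes.

```python
from collections import Counter


def gini(labels):
    """Gini Index of a node, given the class labels it contains."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def weighted_gini(rows, feature, target):
    """Weighted average impurity of the sub-nodes from splitting on `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def build_tree(rows, features, target):
    labels = [row[target] for row in rows]
    # Stop at a pure node, or when there are no features left to split on.
    if len(set(labels)) == 1 or not features:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # Pick the feature with the lowest weighted impurity (ties: first wins).
    best = min(features, key=lambda f: weighted_gini(rows, f, target))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"feature": best, "branches": branches}


and_table = [
    {"a": 0, "b": 0, "out": 0},
    {"a": 0, "b": 1, "out": 0},
    {"a": 1, "b": 0, "out": 0},
    {"a": 1, "b": 1, "out": 1},
]
tree = build_tree(and_table, ["a", "b"], "out")
print(tree)
```

On the AND table both features tie at a weighted impurity of 0.25, so the sketch picks `a` as the root and then splits the impure `a = 1` node on `b`.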

If you liked this explanation of A DECISION TREE as a Machine Learning Algorithm, consider following me, liking this tweet and retweeting it as well.

Thank you :)
