Have you found the concept of Decision Trees confusing? Well, this thread should help!

A beginner-friendly, DETAILED explanation of Decision Trees 👇🧵
(Image: Decision Trees Concept in Machine Learning Algorithm)
🌟 What is a Decision Tree?
It is a Classification Algorithm. We can use it to solve Regression problems as well. It is a tree or graph-like structure with a root node, non-leaf nodes, leaf nodes and branches.

See an example of a Decision Tree built from an AND table. (Image: Example of a Decision Tree with all the components)
Let's understand two terms first:-

What is splitting?
Splitting is dividing the nodes into two or more sub-nodes based on the outcome. In the "Yes or No" case, a node will split into two sub-nodes, one pointing towards the outcome of YES & the other towards NO.
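Here is a minimal sketch of a split in Python. The rows, the feature name `has_symptom` and the helper `split` are made up for illustration; they are not from any library.

```python
# Toy rows; the feature name and values are hypothetical.
rows = [
    {"has_symptom": "Yes", "outcome": "Yes"},
    {"has_symptom": "No", "outcome": "No"},
    {"has_symptom": "Yes", "outcome": "No"},
    {"has_symptom": "No", "outcome": "No"},
]

def split(rows, feature):
    """Group rows into sub-nodes, one per distinct value of `feature`."""
    sub_nodes = {}
    for row in rows:
        sub_nodes.setdefault(row[feature], []).append(row)
    return sub_nodes

sub_nodes = split(rows, "has_symptom")
print({value: len(group) for value, group in sub_nodes.items()})
```

With a Yes/No feature, the node ends up with two sub-nodes, one holding the YES rows and one holding the NO rows.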
What is impurity?
When a node doesn't have a clear outcome or doesn't belong to a single class, whether it be Yes Class or No Class, the node is impure.
If the node has all the values of a single class, it is called a pure node.
How do we select which feature to split on?
For that, we calculate the impurity of each candidate split. We want to select the feature that gives us the lowest impurity.
Lowest impurity means that the split produces the least impure sub-nodes.
We aim to reach pure nodes, where each node belongs to a single class.
-> How do we calculate impurity?
The most common method is GINI INDEX. It is used for BINARY SPLIT (splitting in yes or no).
Other measures include:-
Accuracy
Information Gain
Gain Ratio
🌟 Formula to calculate GINI INDEX

GI = 1 - square of(Probability of "Yes") - square of(Probability of "No")
(Image: Gini Index Formula for Decision Trees)
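The formula can be sketched as a small Python function. The name `gini_index` is my own, and returning 0.0 for an empty node is an assumption made only to avoid dividing by zero.

```python
def gini_index(yes_count, no_count):
    """Gini Index of a node, given how many rows are Yes and how many are No."""
    total = yes_count + no_count
    if total == 0:
        return 0.0  # assumption: treat an empty node as pure
    p_yes = yes_count / total
    p_no = no_count / total
    return 1 - p_yes ** 2 - p_no ** 2

print(gini_index(10, 0))  # pure node: 0.0
print(gini_index(5, 5))   # evenly mixed node: 0.5
```

A pure node scores 0, and for a Yes/No outcome the worst case (a 50/50 mix) scores 0.5.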
🌟 Let's take a table to understand the working of the decision tree:-
We have a table with Independent Variables, namely Chest Pain, Good Blood Circulation & Blocked Arteries, which determine the dependent variable, Heart Disease.

Refer to the image. (Image: Example of Decision Table Data)
In the data above, we have three features: Chest Pain, Good Blood Circulation & Blocked Arteries. Any of them could be our root node.

To decide which node to make the root, we calculate the Gini Index and pick the lowest impurity feature.
Let's say,

If we pick Chest Pain:-
If Chest Pain YES -> 105 People have Heart Disease, and 39 don't have Heart Disease.
If Chest Pain NO -> 34 People have Heart Disease, and 125 don't have Heart Disease.

For other features, see the image:- (Image: Which feature to pick as root node for decision tree?)
Gini Index for when Chest Pain is YES
= 1 - sq of(Prob of Heart Disease Yes) - sq of(Prob of Heart Disease No)
= 1 - sq of(105/(105+39)) - sq of(39/(105+39))
= 0.395

Gini Index for when Chest Pain is No = 0.336
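To double-check both numbers, here is a quick sketch in Python using the counts above. The function name `gini_index` and the variable names are my own.

```python
def gini_index(yes_count, no_count):
    """Gini Index of a node, given its Yes and No counts."""
    total = yes_count + no_count
    return 1 - (yes_count / total) ** 2 - (no_count / total) ** 2

# Counts from the thread: (has Heart Disease, no Heart Disease)
gi_chest_pain_yes = gini_index(105, 39)
gi_chest_pain_no = gini_index(34, 125)

print(round(gi_chest_pain_yes, 3))  # 0.395
print(round(gi_chest_pain_no, 3))   # 0.336
```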

Now, we calculate the Total Gini Index for the Chest Pain split (Yes & No together).
🌟 Total G.I. = Weighted average of Gini Impurities for leaf nodes
(Image: Total Gini Index Formula for Decision Trees)
Total G.I. for Chest Pain Node = [(Chest Pain Yes/Total Patients) x G.I. of Chest Pain Yes] + [(Chest Pain No/Total Patients) x G.I. of Chest Pain No]

= [(144/(144+159)) x 0.395] + [(159/(144+159)) x 0.336]

= 0.364
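The weighted average can be sketched in Python like this. `total_gini` is a hypothetical helper that takes one (Yes count, No count) pair per sub-node.

```python
def gini_index(yes_count, no_count):
    """Gini Index of a node, given its Yes and No counts."""
    total = yes_count + no_count
    return 1 - (yes_count / total) ** 2 - (no_count / total) ** 2

def total_gini(sub_nodes):
    """Weighted average of the Gini Index over (yes_count, no_count) sub-nodes."""
    n_total = sum(yes + no for yes, no in sub_nodes)
    return sum((yes + no) / n_total * gini_index(yes, no) for yes, no in sub_nodes)

# Chest Pain YES -> (105, 39), Chest Pain NO -> (34, 125)
chest_pain = [(105, 39), (34, 125)]
print(round(total_gini(chest_pain), 3))  # 0.364
```

Each sub-node is weighted by its share of the patients, so a bigger sub-node pulls the total closer to its own Gini Index.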
After calculating the Gini Index of each feature like this, we would find that the Good Blood Circulation feature has the lowest impurity. Thus, we select that feature to be our root node.

Similarly, we continue the process to find the next node and complete our Decision Tree.
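The whole process can be sketched in plain Python on the AND table from earlier. This is only a sketch under my own assumptions: the names `gini`, `weighted_gini` and `build_tree` are made up, ties are broken by picking the first feature, and we stop when a node is pure or no features are left.

```python
from collections import Counter

# Truth table for AND: features A and B, output "out" = A AND B.
ROWS = [
    {"A": 0, "B": 0, "out": 0},
    {"A": 0, "B": 1, "out": 0},
    {"A": 1, "B": 0, "out": 0},
    {"A": 1, "B": 1, "out": 1},
]

def gini(rows):
    """Gini Index of a node: 1 minus the sum of squared class probabilities."""
    counts = Counter(row["out"] for row in rows)
    return 1 - sum((n / len(rows)) ** 2 for n in counts.values())

def weighted_gini(rows, feature):
    """Total Gini Index of splitting `rows` on `feature`."""
    total = 0.0
    for value in {row[feature] for row in rows}:
        group = [row for row in rows if row[feature] == value]
        total += len(group) / len(rows) * gini(group)
    return total

def build_tree(rows, features):
    labels = {row["out"] for row in rows}
    if len(labels) == 1:  # pure node: make it a leaf
        return labels.pop()
    if not features:  # assumption: fall back to the majority class
        return Counter(row["out"] for row in rows).most_common(1)[0][0]
    # Pick the feature with the lowest Total Gini Index (first one wins ties).
    best = min(features, key=lambda f: weighted_gini(rows, f))
    remaining = [f for f in features if f != best]
    return {
        best: {
            value: build_tree([r for r in rows if r[best] == value], remaining)
            for value in sorted({r[best] for r in rows})
        }
    }

tree = build_tree(ROWS, ["A", "B"])
print(tree)  # {'A': {0: 0, 1: {'B': {0: 0, 1: 1}}}}
```

Here A and B tie at 0.25, so A becomes the root. The A = 0 branch is already pure, and the A = 1 branch needs one more split on B.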
If you liked this explanation of a DECISION TREE as a Machine Learning Algorithm, consider following me, liking this tweet and retweeting it as well.

Thank you :)

Thread by Sukriti Macker (@Sukriti_Macker)

