Have you found the concept of Decision Trees confusing? Well, this thread would be useful!
A beginner-friendly, DETAILED explanation of Decision Trees
What is a Decision Tree?
It is primarily a classification algorithm, but Decision Trees can be used to solve regression problems as well. It is a tree- or graph-like structure with a root node, non-leaf nodes, leaf nodes and branches.
See an example of Decision Tree using AND Table (Image)
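To make the root node, non-leaf node, leaf nodes and branches concrete, here is a minimal sketch of a hand-built tree for the AND table. The nested-dict layout, the input names `A` and `B`, and the `predict` helper are assumptions made for illustration; the tree in the image may be drawn differently.

```python
# A tiny hand-built decision tree for the AND truth table.
# Non-leaf nodes are dicts that test one feature; leaves are plain class labels.
and_tree = {
    "feature": "A",              # root node: test input A first
    "branches": {
        0: 0,                    # A = 0 -> leaf: output is 0 whatever B is
        1: {                     # A = 1 -> non-leaf node: test input B
            "feature": "B",
            "branches": {0: 0, 1: 1},
        },
    },
}


def predict(node, row):
    """Follow branches from the root until a leaf (a non-dict value) is reached."""
    while isinstance(node, dict):
        node = node["branches"][row[node["feature"]]]
    return node


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", predict(and_tree, {"A": a, "B": b}))
```

Only the row with A = 1 and B = 1 ends up in the leaf labelled 1, which matches the AND table.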
Let's understand two terms first:-
What is splitting?
Splitting is dividing the nodes into two or more sub-nodes based on the outcome. In the "Yes or No" case, a node will split into two sub-nodes, one pointing towards the outcome of YES & the other towards NO.
What is impurity?
When a node doesn't have a clear outcome or doesn't belong to a single class, whether it be Yes Class or No Class, the node is impure.
If the node has all the values of a single class, it is called a pure node.
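Splitting and purity can be sketched in a few lines of Python. The rows below are made-up toy data, and the column names `Feature` and `Outcome` as well as the helpers `split` and `is_pure` are hypothetical names chosen for this example.

```python
def split(rows, feature):
    """Divide rows into a YES sub-node and a NO sub-node based on one feature."""
    yes_node = [r for r in rows if r[feature] == "Yes"]
    no_node = [r for r in rows if r[feature] == "No"]
    return yes_node, no_node


def is_pure(rows, target):
    """A node is pure when all of its rows belong to a single class."""
    return len({r[target] for r in rows}) <= 1


# Made-up toy rows, only for illustration.
rows = [
    {"Feature": "Yes", "Outcome": "Yes"},
    {"Feature": "Yes", "Outcome": "No"},
    {"Feature": "No", "Outcome": "No"},
    {"Feature": "No", "Outcome": "No"},
]

yes_node, no_node = split(rows, "Feature")
print(is_pure(yes_node, "Outcome"))  # False: the YES sub-node mixes both classes
print(is_pure(no_node, "Outcome"))   # True: the NO sub-node holds only "No"
```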
How do we select which feature to split on?
For that, we calculate the impurity. We want to select the feature which gives us the lowest impurity.
Lowest impurity means that the split will result in producing "less" impure nodes.
We aim to reach pure nodes, where each node belongs to a single class.
-> How do we calculate impurity?
The most common method is the GINI INDEX. It is used for a BINARY SPLIT (splitting into Yes or No).
Other measures include:-
Accuracy
Information Gain
Gain Ratio
Formula to calculate GINI INDEX
GI = 1 - square of(Probability of "Yes") - square of(Probability of "No")
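As a rough sketch, the formula translates directly into a small Python function. The name `gini` and the choice to pass raw Yes/No counts (rather than probabilities) are assumptions made for this example.

```python
def gini(yes_count, no_count):
    """Gini Index of a node, given how many rows are "Yes" and how many are "No"."""
    total = yes_count + no_count
    if total == 0:
        return 0.0  # assumption: treat an empty node as having no impurity
    p_yes = yes_count / total
    p_no = no_count / total
    return 1 - p_yes ** 2 - p_no ** 2


print(gini(10, 0))  # 0.0 -> a pure node
print(gini(5, 5))   # 0.5 -> the most impure a two-class node can be
```

A pure node scores 0, and an even 50/50 mix scores 0.5, so a lower value means a cleaner node.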
๐ Let's take a table to understand the working of the decision tree:-
We have a table with Independent Variables such as Chest Pain, Good Blood Circulation & Blocked Arteries, determining the Heart Disease variable, a dependent variable.
Refer to the image
In the data above, we have three features: Chest Pain, Good Blood Circulation & Blocked Arteries, any of which we could make our root node.
To decide which node to make the root, we calculate the Gini Index and pick the lowest impurity feature.
Let's say,
If we pick Chest Pain:-
If Chest Pain YES -> 105 People have Heart Disease, and 39 don't have Heart Disease.
If Chest Pain NO -> 34 People have Heart Disease, and 125 don't have Heart Disease.
For other features, see the image:-
Gini Index for when Chest Pain is YES
= 1 - sq of(Prob of Heart Disease Yes) - sq of(Prob of Heart Disease No)
= 1 - sq of(105/(105+39)) - sq of(39/(105+39))
= 0.395
Gini Index for when Chest Pain is NO
= 1 - sq of(34/(34+125)) - sq of(125/(34+125))
= 0.336
Now, we calculate the Total Gini Index for the Chest Pain split (Yes & No together).
Total G.I. = Weighted average of Gini Impurities for leaf nodes
Total G.I. for Chest Pain Node = [(Chest Pain Yes/Total Patients) x G.I. of Chest Pain Yes] + [(Chest Pain No/Total Patients) x G.I. of Chest Pain No]
= [(144/(144+159)) x 0.395] + [(159/(144+159)) x 0.336]
= 0.364
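The Chest Pain arithmetic can be double-checked with a short script. The counts come from the thread; the variable names and the `gini` helper are assumptions made for this sketch.

```python
def gini(yes_count, no_count):
    """Gini Index of a node from its Yes/No counts."""
    total = yes_count + no_count
    return 1 - (yes_count / total) ** 2 - (no_count / total) ** 2


# Counts from the thread: (has Heart Disease, does not have Heart Disease)
pain_yes = (105, 39)
pain_no = (34, 125)

gi_yes = gini(*pain_yes)
gi_no = gini(*pain_no)

n_yes = sum(pain_yes)  # 144 people with Chest Pain
n_no = sum(pain_no)    # 159 people without Chest Pain
n_all = n_yes + n_no   # 303 people in total

# Weighted average: each leaf counts in proportion to how many people land in it.
total_gi = (n_yes / n_all) * gi_yes + (n_no / n_all) * gi_no

print(round(gi_yes, 3), round(gi_no, 3), round(total_gi, 3))  # 0.395 0.336 0.364
```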
After calculating the Gini Index of each feature like this, we would find that the Good Blood Circulation feature has the lowest impurity. Thus, we select that feature to be our root node.
Similarly, we continue the process to find the next node and complete our Decision Tree.
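Picking the root then comes down to "compute the Total Gini Index per feature and take the smallest". Below is a minimal sketch of that step. The counts for the other features live in the image and are not reproduced here, so the example runs on invented toy counts; `feature_a`, `feature_b`, `weighted_gini` and `pick_root` are hypothetical names.

```python
def gini(yes_count, no_count):
    """Gini Index of a node from its Yes/No counts."""
    total = yes_count + no_count
    if total == 0:
        return 0.0  # assumption: an empty node adds no impurity
    return 1 - (yes_count / total) ** 2 - (no_count / total) ** 2


def weighted_gini(yes_branch, no_branch):
    """Total Gini Index of a split; each branch is a (yes_count, no_count) pair."""
    n_yes, n_no = sum(yes_branch), sum(no_branch)
    n_all = n_yes + n_no
    return (n_yes / n_all) * gini(*yes_branch) + (n_no / n_all) * gini(*no_branch)


def pick_root(splits):
    """Return the feature whose split has the lowest Total Gini Index."""
    return min(splits, key=lambda feature: weighted_gini(*splits[feature]))


# Toy counts invented for this sketch (NOT the data from the thread's table).
toy_splits = {
    "feature_a": ((5, 5), (5, 5)),  # both branches are a 50/50 mix
    "feature_b": ((9, 1), (1, 9)),  # both branches are nearly pure
}

print(pick_root(toy_splits))  # feature_b
```

The same `pick_root` call can be repeated on each sub-node with the remaining features to grow the rest of the tree.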
If you liked this explanation of the DECISION TREE as a Machine Learning algorithm, consider following me, liking this tweet and retweeting it as well.
Thank you :)
Most of us know that Logistic Regression is a Classification Algorithm! But why does it have the word 'regression' in it?
Let me explain
Let's start with the basics:-
Classification: It is a process of putting things/items into a category.
Say you have a fruit bowl. You can separate all the fruits by categorising them by name, type, or size.
Classification can be binary or multi-class.
Binary Classification is where there exist only two categories.
Multi-Class is where there exist more than two categories.
Some of us were looking for a place to start off our Machine Learning and Data Science Journey.
I have some resources that can help you!
Check them out
1. 100 Days of Code by Ayon Roy
This repository is a gold mine for someone who is looking to begin ML. It explains to you how to go about every little topic.
Link: https://bit .ly/3wtWFwx
2. Kaggle Courses
Another phenomenal source to get started with Data Science and Machine Learning. It has topic-wise courses that can up-skill you.
The topic of TRIGGERS is a level-up from creating queries and sub-queries when handling databases.
A super-easy guide to triggers in SQL
What is a trigger in layman's (everyday) language?
When someone pokes you or does something to agitate you, there is a high chance that you might get 'triggered' automatically! Right?
Triggers in SQL
So, when we say TRIGGERS in SQL, we simply mean a stored program that gets executed on its own when a triggering event occurs. Now, triggers are a part of PL/SQL.
PL/SQL is an extension of SQL that combines SQL queries with procedural statements.
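To see the "gets executed on its own" idea in action, here is a small sketch. It uses SQLite through Python's built-in `sqlite3` module rather than PL/SQL, so the trigger syntax is SQLite's and will differ from Oracle's. The tables `orders` and `audit_log` and the trigger `log_new_order` are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE audit_log (order_id INTEGER, note TEXT);

    -- The trigger: runs on its own after every INSERT into orders.
    CREATE TRIGGER log_new_order
    AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit_log (order_id, note)
        VALUES (NEW.id, 'order created: ' || NEW.item);
    END;
""")

# The triggering event: we only insert into orders ...
conn.execute("INSERT INTO orders (item) VALUES ('book')")

# ... yet audit_log has a row, written by the trigger.
log_rows = conn.execute("SELECT order_id, note FROM audit_log").fetchall()
print(log_rows)  # [(1, 'order created: book')]
conn.close()
```

We never write to `audit_log` ourselves; the database does it automatically when the INSERT on `orders` happens.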