I've been trying the new TensorFlow Decision Forests (TF-DF) library today and it's very good!
Not only is it easy to use, you also get lots of metadata, documentation and integrations!
Let me show you some of the cool things I've learned so far...
[5 min]
1/11🧵
TensorFlow Decision Forests implements 3 algorithms:
• CART
• Random Forest
• Gradient Boosted Trees
You can get this list with tfdf.keras.get_all_models()
All of them support classification, regression and ranking tasks
2/11🧵
CART (Classification and Regression Trees) is a simple decision tree algorithm. 🌳
The training process splits the dataset into two parts:
the first is used to grow the tree, the second to prune it
It's a good starting point to learn and understand decision trees
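To make the grow/prune idea concrete, here's a minimal pure-Python sketch. It is not TF-DF's implementation: it assumes numeric features, grows the tree with Gini impurity and midpoint thresholds, and prunes with reduced-error pruning on the held-out part. The helper names and the `holdout_ratio` parameter are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, higher when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between two values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def partition(rows, labels, f, t):
    """Split rows/labels by the test row[f] <= t."""
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    pick = lambda idx: ([rows[i] for i in idx], [labels[i] for i in idx])
    return pick(li), pick(ri)

def grow(rows, labels, depth=0, max_depth=5):
    """Phase 1: grow the tree greedily on the training part."""
    node = {"label": majority(labels)}
    split = best_split(rows, labels) if depth < max_depth else None
    if split is not None:
        f, t = split
        (l_rows, l_labels), (r_rows, r_labels) = partition(rows, labels, f, t)
        node.update(feature=f, threshold=t,
                    left=grow(l_rows, l_labels, depth + 1, max_depth),
                    right=grow(r_rows, r_labels, depth + 1, max_depth))
    return node

def predict(node, row):
    while "feature" in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

def prune(node, rows, labels):
    """Phase 2 (reduced-error pruning): replace a subtree by a leaf when the
    leaf is at least as accurate on the held-out rows that reach it."""
    if "feature" not in node or not rows:
        return node
    (l_rows, l_labels), (r_rows, r_labels) = partition(
        rows, labels, node["feature"], node["threshold"])
    node["left"] = prune(node["left"], l_rows, l_labels)
    node["right"] = prune(node["right"], r_rows, r_labels)
    subtree_hits = sum(predict(node, r) == y for r, y in zip(rows, labels))
    leaf_hits = sum(y == node["label"] for y in labels)
    return {"label": node["label"]} if leaf_hits >= subtree_hits else node

def train_cart(rows, labels, holdout_ratio=0.1, seed=0):
    """Split the data in two: grow on one part, prune on the other."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_hold = max(1, int(len(idx) * holdout_ratio))
    hold, train = idx[:n_hold], idx[n_hold:]
    tree = grow([rows[i] for i in train], [labels[i] for i in train])
    return prune(tree, [rows[i] for i in hold], [labels[i] for i in hold])
```

Pruning against data the tree never saw during growth is what keeps it from memorizing noise in the training part.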
3/11🧵
Random Forest is the most well-known Decision Forest algorithm. 🌳🌲🌴🎄
It is a collection of CART-style trees, each trained on a random sample of the original data drawn with replacement
+ It's quite robust to overfitting
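A minimal sketch of the two core ingredients, bootstrap sampling and majority voting, in pure Python. To keep it short, each "tree" here is a depth-1 stump rather than a full CART tree, and there is no per-node feature sampling; all names are hypothetical, not TF-DF's API.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Depth-1 tree: pick the (feature, threshold) with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # all sampled rows identical: constant prediction
        lab = Counter(labels).most_common(1)[0][0]
        return lambda row: lab
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn *with replacement*,
        # so each tree sees a slightly different dataset.
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees

def predict_forest(trees, row):
    """Each tree votes; the most common label wins."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]
```

Averaging many trees that each saw different data is where the robustness comes from: one tree's mistakes tend to be outvoted by the others.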
4/11🧵
Gradient Boosted Trees is a set of shallow decision trees trained sequentially. 🌱🌿🌳
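The idea is that each new tree is trained to predict the (negative) gradient of the loss with respect to the current model's predictions, correcting the errors of the trees before it.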
+ It usually outperforms Random Forests 👍
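Here's a toy sketch of the boosting loop for regression with squared loss, chosen here because its negative gradient is simply the residual y - F(x). Each shallow (depth-1) tree is fit to the negative gradient and added with a small learning rate. Names and hyperparameter values are illustrative, not TF-DF's.

```python
def fit_regression_stump(xs, targets):
    """Depth-1 regression tree on a single numeric feature (least squares)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [g for x, g in zip(xs, targets) if x <= t]
        right = [g for x, g in zip(xs, targets) if x > t]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((g - l_mean) ** 2 for g in left) + sum((g - r_mean) ** 2 for g in right)
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    if best is None:  # single distinct x value: predict the mean
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, l_mean, r_mean = best
    return lambda x: l_mean if x <= t else r_mean

def fit_gbt(xs, ys, n_trees=50, learning_rate=0.3):
    base = sum(ys) / len(ys)  # initial constant prediction F_0
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        # For L = (y - F)^2 / 2, the negative gradient w.r.t. F is y - F,
        # i.e. the residual of the current model.
        neg_grad = [y - p for y, p in zip(ys, preds)]
        tree = fit_regression_stump(xs, neg_grad)
        trees.append(tree)
        # Trees are added sequentially: each one nudges the current predictions.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return base, trees, learning_rate

def predict_gbt(model, x):
    base, trees, lr = model
    return base + lr * sum(tree(x) for tree in trees)
```

For other losses (e.g. log-loss for classification) the loop stays the same; only the negative-gradient line changes.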
5/11🧵
All the implemented algorithms can consume numerical and categorical features directly, with no preprocessing.
If you still want to preprocess, you can use Keras preprocessing layers or a Pandas DataFrame
The advantage of the Keras approach is that the preprocessing is saved as part of the final model
6/11🧵
TF-DF makes it easy to plot the trained tree.
As seen in the image:
• Colors show the label distribution
• Each node shows the "if" condition used for its decision
• The deeper the node, the purer it becomes
*The plot is even better in Colab, where hovering over a node shows more info!
7/11🧵
TF-DF also integrates with TensorBoard!
You'll need 3 lines of code to do so! 🤯
8/11🧵
TF-DF can be combined with other TF models/layers. For example, you can load a language embedding model from #TFHub and use it as a preprocessing step for your decision tree
9/11🧵
There are many other nice features for inspecting and debugging the model; you can learn more in this tutorial notebook
Machine learning goes beyond Deep Learning and Neural Networks
Sometimes a simpler technique might give you better results and be easier to understand
A very versatile algorithm is the Decision Forest
🌴🌲🌳?
What is it and how does it work?
Let me tell you...
[7 min]
1/10🧵
Before understanding a Forest, let's start with what a Tree is
Imagine you have a table of feline characteristics, with features like size, weight, color and habitat, and a label column with values like lion, tiger, house cat, lynx and so on.
2/10🧵
With some time, you could write code based on if/else statements that, for each row in the table, decides which feline it is
This is exactly what a Decision Tree does
During training, it learns those if/else rules
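For illustration, here's what such hand-written rules could look like. The features, thresholds, the "leopard" fallback and the table rows are all made up (not learned from data); a decision tree would learn rules of this shape automatically.

```python
def classify_feline(row):
    # Hypothetical hand-made tree: every if/else is one decision node,
    # every return is a leaf. Thresholds are invented for illustration.
    if row["weight_kg"] < 10:
        return "house cat"
    elif row["size_cm"] < 130:
        return "lynx"
    elif row["color"] == "striped":
        return "tiger"
    elif row["habitat"] == "savanna":
        return "lion"
    else:
        return "leopard"

# A tiny made-up version of the feline table.
table = [
    {"size_cm": 46, "weight_kg": 4, "color": "tabby", "habitat": "house"},
    {"size_cm": 100, "weight_kg": 20, "color": "spotted", "habitat": "forest"},
    {"size_cm": 250, "weight_kg": 200, "color": "striped", "habitat": "jungle"},
    {"size_cm": 200, "weight_kg": 180, "color": "tawny", "habitat": "savanna"},
]
predictions = [classify_feline(r) for r in table]
```

Writing these by hand gets painful fast; training a tree picks the questions and thresholds from the data instead.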
TensorFlow Hub has models for all the main ML domains: image, text, audio and video
For images, this tutorial can get you started with transfer learning. It does some cool tricks with the data, and the final model is ready for on-device deployment
Sometimes you need to build a Machine Learning model that cannot be expressed with the Sequential API
When you need a more complex model, with multiple inputs and outputs or with residual connections, that's when you need the Functional API!
[2.46 min]
1/8🧵
The Functional API is more flexible than the Sequential API.
The easiest way to understand it is to compare the same model built with the Sequential and the Functional API
2/8🧵
You can think of the Functional API as a way to create a Directed Acyclic Graph (DAG) of layers, while the Sequential API can only create a stack of layers.
The Functional API is also known as the Symbolic or Declarative API
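A toy analogy in plain Python (not Keras): a sequential model is just a loop over layers, while a functional model is a graph evaluated in topological order, which is what makes skip connections and multiple inputs/outputs possible. The "layers" are hypothetical scalar functions standing in for real Keras layers; `graphlib` needs Python 3.9+.

```python
from graphlib import TopologicalSorter

def run_sequential(layers, x):
    """A stack: each layer consumes exactly the previous layer's output."""
    for layer in layers:
        x = layer(x)
    return x

def run_dag(graph, feeds, outputs):
    """A DAG: graph maps node name -> (function, [parent names]).

    feeds maps input names to values; nodes are evaluated in topological
    order so every node runs after all the nodes it depends on.
    """
    deps = {name: parents for name, (_, parents) in graph.items()}
    values = dict(feeds)
    for name in TopologicalSorter(deps).static_order():
        if name in values:  # an input node supplied by the caller
            continue
        fn, parents = graph[name]
        values[name] = fn(*(values[p] for p in parents))
    return {name: values[name] for name in outputs}

# Toy stand-ins for layers, operating on plain numbers.
def double(x):
    return 2 * x

def add_one(x):
    return x + 1

def add(x, y):
    return x + y

def identity(x):
    return x

# Sequential: input -> double -> add_one
seq_out = run_sequential([double, add_one], 3)

# Functional: two inputs, a residual (skip) connection and two outputs.
graph = {
    "h1": (double, ["a"]),
    "h2": (add_one, ["h1"]),
    "residual": (add, ["h2", "a"]),        # skip connection from input "a"
    "main_out": (add, ["residual", "b"]),  # merges the second input "b"
    "aux_out": (identity, ["h1"]),         # second output branching off h1
}
dag_out = run_dag(graph, {"a": 3, "b": 10}, ["main_out", "aux_out"])
```

The stack can't express "residual" or the second input at all, because each step only ever sees the previous step's output.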