Sebastian Raschka (@rasbt) · Sep 7
Some techniques for optimizing inference speeds (without changing the model architecture):

(1) Parallelization
(2) Vectorization
(3) Loop tiling
(4) Operator fusion
(5) Quantization

Anything missing?

[1/6]
[2/6] (1) Parallelization (in an inference context) essentially means splitting the batches you want to predict on into chunks; the chunks are then processed in parallel. PyTorch has a nice tutorial on that here: pytorch.org/tutorials/inte…
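A minimal sketch of the chunking idea (my illustration, not the tutorial's code): since PyTorch ops release the GIL, even a plain thread pool can parallelize CPU inference across chunks.

```python
# Hedged sketch: split one large batch into chunks and run them through
# the model concurrently. Assumes CPU inference with a small model.
from concurrent.futures import ThreadPoolExecutor

import torch

@torch.no_grad()
def parallel_predict(model, inputs, num_chunks=4):
    chunks = torch.chunk(inputs, num_chunks)          # split the batch
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        results = list(pool.map(model, chunks))       # process chunks in parallel
    return torch.cat(results)

model = torch.nn.Linear(128, 10).eval()
x = torch.randn(10_000, 128)
preds = parallel_predict(model, x)  # same values as model(x)
```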
[3/6] (2) Vectorization is a classic that probably doesn't need much explanation. In a nutshell, it means replacing costly Python for-loops with ops that apply the same operation to multiple elements at once. You probably already do this automatically if you are using a linear algebra library or a DL framework.
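A toy example (mine): the same dot product via a Python loop vs. a single vectorized call that runs in optimized native code.

```python
# Hedged sketch: Python-loop dot product vs. one vectorized op.
import torch

a, b = torch.randn(1_000_000), torch.randn(1_000_000)

# Slow: element-by-element Python loop
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]

# Fast: the whole loop happens inside one optimized kernel
total_vec = torch.dot(a, b)
```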
[4/6] (3) Loop tiling. I actually only just learned about this recently (thx to #MLSystemsBook). Something that is still slightly above my head 🤯: essentially, you change the data access order in a loop to leverage the hardware's memory layout & cache: en.wikipedia.org/wiki/Loop_nest…
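To illustrate the access pattern (my sketch; in pure Python the loop overhead dominates, so the real payoff comes from compiled kernels doing this internally):

```python
# Hedged sketch of loop tiling: blocked matrix multiply. Each tile x tile
# block fits in cache and gets reused before being evicted.
import numpy as np

def matmul_tiled(A, B, tile=64):
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(matmul_tiled(A, B), A @ B)
```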
[5/6] (4) Operator fusion: here, if you have multiple loops (or ops) over the same data, you try to merge them into one. (A classic example is calculating the mean and standard deviation in one pass.)
There was another nice example in the DANets paper I recently posted about (arxiv.org/abs/2112.02962).
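Spelling out the classic mean/std example by hand (my sketch; note this is the numerically naive one-pass variant):

```python
# Hedged sketch of operator fusion: mean and std in a single loop instead
# of one pass for the mean and a second pass for the deviations.
import math

def mean_std_fused(xs):
    n, s, sq = 0, 0.0, 0.0
    for x in xs:               # one fused loop over the data
        n += 1
        s += x
        sq += x * x
    mean = s / n
    var = sq / n - mean ** 2   # population variance via E[x^2] - E[x]^2
    return mean, math.sqrt(var)

print(mean_std_fused([1.0, 2.0, 3.0, 4.0]))  # (2.5, ~1.118)
```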
[6/6] (5) Quantization essentially reduces the numerical precision (typically casting floats -> ints) to speed up computation & lower memory requirements, while ideally maintaining accuracy. I borderline-included it since it can reduce the accuracy of your model. Tutorial: pytorch.org/tutorials/reci…
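A minimal sketch along the lines of the linked recipe (check the tutorial for the full workflow):

```python
# Hedged sketch: post-training dynamic quantization. Linear weights are
# stored as int8; activations are quantized on the fly at inference time.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
preds = quantized(torch.randn(1, 128))
```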

More from @rasbt

Sep 7
Going down some deep rabbit holes here and learning new things ...
Seems like a successful Kaggle strategy is randomly swapping cols in a tabular dataset (~like mix-up, but w/o including the labels).
Anyone tried this for a serious project with a non-deep learning tabular algo?
Link to the code (here used in a competition-winning deep learning for tabular data method as part of a denoising autoencoder backbone): kaggle.com/code/danofer/s…
It's worth clarifying (since the original tweet above looks misleading) that it doesn't literally swap columns but row values within a column. E.g., given two batches, you exchange row data among the same columns. A rough sketch of the idea below:
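My attempt at the idea in code (not the linked Kaggle notebook; the function name and noise probability are made up):

```python
# Hedged sketch of "swap noise": for each cell, with probability p replace
# its value with the value from a random row of the SAME column.
import numpy as np

def swap_noise(X, p=0.15, rng=np.random.default_rng(0)):
    mask = rng.random(X.shape) < p                 # which cells to corrupt
    rand_rows = rng.integers(0, X.shape[0], size=X.shape)
    cols = np.arange(X.shape[1])[None, :]          # column index stays fixed
    return np.where(mask, X[rand_rows, cols], X)
```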

Sep 2
My top 5 basic checks when training deep learning models

1) Make sure training loss converged
2) Check for overfitting
3) Compare accuracy to a zero-rule baseline
4) Look at failure cases
5) Plot a confusion matrix

6) <fill in the blank; what's your fav I am missing?>

[1/7]
[2/7]

1) Making sure training loss converged
=> that's a classic. We typically want to see that the loss plateaus
(Left: bad; right: better)
[3/7]

2) Check for overfitting

Another classic. We typically don't want the gap between training and validation accuracy to be too large (left: bad, right: better)
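Check 3 from the list above is cheap to automate, e.g., with sklearn's DummyClassifier as the zero-rule baseline (my sketch, on toy data):

```python
# Hedged sketch: a zero-rule baseline that always predicts the majority
# class -- any real model should beat this accuracy.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.7], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
print("zero-rule accuracy:", baseline.score(X_te, y_te))  # ~0.7 here
```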
Read 7 tweets
Aug 31
Just added another paper to the "tabular deep learning" list -- I intend to keep it up to date, and I previously missed DANets! DANets are centered around finding and grouping correlated features [right] -- something we usually would do manually [left]. How does that work? [1/9]
[2/9] The main idea behind DANets is to introduce an Abstract Layer (ABSTLAY) building block; multiple such blocks are then stacked to form a DANet. What does ABSTLAY do? It performs two steps: 1) feature selection and 2) feature abstraction.
[3/9] The feature selection (step 1) groups correlated features using a sparse learnable mask (they use Entmax -- analogous to Softmax but sparsity-inducing). The feature abstraction (step 2) uses a fully connected layer with attention (not shown) on the selected features.
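A very rough sketch of how I read the two steps (softmax standing in for Entmax, and the attention part omitted -- see the paper for the real block):

```python
# Hedged sketch of the ABSTLAY idea: a learnable (ideally sparse) mask
# selects features, then a fully connected layer abstracts them.
import torch

class AbstractLayerSketch(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mask_logits = torch.nn.Parameter(torch.zeros(in_features))
        self.fc = torch.nn.Linear(in_features, out_features)

    def forward(self, x):
        mask = torch.softmax(self.mask_logits, dim=0)  # Entmax in the paper
        return torch.relu(self.fc(x * mask))           # select, then abstract
```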
Aug 28
In practice, a trained machine learning model is never final -- concept drift will inevitably cause a performance decline of a production model over time. [1/10]
[2/10] There are two main flavors of concept drift: feature drift and "real" concept drift.
There's an excellent article here that illustrates this in more detail: concept-drift.fastforwardlabs.com
[3/10] In a nutshell, feature drift describes the change in the input feature distribution over time. In rare cases, this is not harmful (subpanel to the right), but in most cases, it will require retraining the model (subpanel in the center).
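One simple way to monitor for feature drift (my sketch; the test choice and threshold are illustrative, not from the linked article):

```python
# Hedged sketch: flag feature drift with a two-sample Kolmogorov-Smirnov
# test comparing training data against recent production data.
import numpy as np
from scipy.stats import ks_2samp

train_feature = np.random.normal(0.0, 1.0, 1000)
prod_feature = np.random.normal(0.5, 1.0, 1000)   # distribution has shifted

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print("feature distribution changed -- consider retraining")
```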
Aug 23
Random Forest is my favorite baseline algorithm (alongside Logistic Regression). It’s great because it can handle nonlinear problems and has good out-of-the-box performance (RF usually requires little tuning). But …
[1/5]
[2/5] … even though RF performs an implicit feature selection (via the splitting criterion at each node), it's not immune to irrelevant features.
Here's a nice discussion & investigation by Gertjan Verhoeven: gsverhoeven.github.io/post/random-fo….
Performance decreases after adding 100 & 500 noise features.
[3/5] According to "Hyperparameters and Tuning Strategies for Random Forest" (Probst, Wright, Boulesteix 2019), the number of candidate features drawn at each split (called "mtry") is the most influential hyperparameter for random forests. Increasing it improves performance in the presence of noise features.
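In scikit-learn, "mtry" corresponds to max_features (my mapping); a quick way to probe its effect (my sketch, on toy data):

```python
# Hedged sketch: sweep max_features (sklearn's analog of "mtry") on a
# dataset padded with pure-noise features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_noisy = np.hstack([X, np.random.randn(1000, 100)])  # add 100 noise features

for max_features in ["sqrt", 0.3, 0.6]:   # float = fraction of features
    rf = RandomForestClassifier(max_features=max_features, random_state=0)
    print(max_features, cross_val_score(rf, X_noisy, y).mean())
```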
Jul 23
And the deep learning vs. conventional machine learning debate for tabular data continues!
A new paper looks at 45 mid-sized datasets (~10k examples each) and finds that tree-based models (XGBoost & random forests) still outperform deep neural networks on tabular datasets. [1/6]
[2/6] The plot above also nicely highlights one of my favorite points when talking to collaborators: If you use RF, you will often get good out-of-the-box performance! I am positively surprised that this is nowadays true for XGBoost as well!
[3/6] The authors also looked at both numerical and mixed numerical & categorical datasets (categorical features were one-hot encoded). The results hold: tree-based methods perform well.