Vadim Borisov
Ph.D. student at the University of Tübingen, Germany (@uni_tue). Research: Deep Neural Networks and Tabular Data, Explainable Machine Learning.

Oct 21, 2022, 8 tweets

Leave VAEs and GANs behind: LLMs are all you need for tabular data generation!
We introduce a new method, GReaT (Generation of Realistic Tabular data), with state-of-the-art generative abilities (see below). How did we do it? ↓ (1/n)
#tabulardata

(2/n) Tabular data frequently consists of a mix of categorical and numerical features. Furthermore, categorical values and feature names are typically words. Thus, it is possible to represent a tabular data sample as a meaningful sentence, e.g., "Age is 42, Education is HS-grad, ..."

(3/n) ... where the feature name and its value are used together. After this step, we can fine-tune a pre-trained large language model (LLM) on the obtained sentences and use the LLM to synthesise new data samples!
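As an illustration, a minimal sketch of this textual encoding step could look like the following (illustrative Python only, not the package's internal implementation; the toy column names are taken from the example above):

import pandas as pd

def row_to_sentence(row: pd.Series) -> str:
    # Build one "<feature name> is <value>" clause per column and join them.
    return ", ".join(f"{name} is {value}" for name, value in row.items())

# Toy data matching the example above.
df = pd.DataFrame({"Age": [42, 39], "Education": ["HS-grad", "Bachelors"]})
sentences = [row_to_sentence(row) for _, row in df.iterrows()]
print(sentences[0])  # -> "Age is 42, Education is HS-grad"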

(4/n) While most methods for tabular data generation are based on VAEs or GANs, our approach utilizes the pre-training power of Transformer-based language models. We also demonstrate that pre-training benefits the generative quality of the synthetic data.
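For intuition, here is a minimal sketch of such a fine-tuning step with the Hugging Face transformers and datasets libraries (a generic causal-LM fine-tune on the encoded sentences, not our exact training code; the model choice and hyperparameters are placeholders):

# Generic causal-LM fine-tuning sketch (assumes transformers + datasets are
# installed; this is NOT the exact GReaT training loop).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

sentences = ["Age is 42, Education is HS-grad",
             "Age is 39, Education is Bachelors"]  # from the encoding step

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

dataset = Dataset.from_dict({"text": sentences}).map(
    lambda batch: tokenizer(batch["text"], truncation=True),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="great-finetune", num_train_epochs=3),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()

# Sampling: generate new "sentences" that can be parsed back into table rows.
model = model.to("cpu")
prompt = tokenizer("Age is", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))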

(5/n) Try GReaT today! We have developed an easy-to-use Python package, be-great, which can be installed with pip:
> pip install be-great

Code is available on GitHub: github.com/kathrinse/be_g…

Here is an example of training and sampling on the California housing dataset:
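A sketch following the package's documented usage (exact parameter names and defaults may differ between be-great versions; see the GitHub README for the current API):

from be_great import GReaT
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing(as_frame=True).frame

model = GReaT(llm="distilgpt2", batch_size=32, epochs=50)
model.fit(data)                               # fine-tune the LLM on the encoded rows
synthetic_data = model.sample(n_samples=100)  # generate new tabular samples
print(synthetic_data.head())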

(6/n) For more details, please refer to our preprint:
arxiv.org/abs/2210.06280

(7/n) I would like to thank all contributors: @t_leemann, @MartinPawelczyk, @Gjergji_. I especially thank Kathrin Seßler; this project wouldn't exist without her.

(8/n) Lastly, if you want to know more about tabular data and deep neural networks, you should definitely check out our survey. Here is the link: arxiv.org/abs/2110.01889
