Vadim Borisov
Oct 21, 2022 (8 tweets, 4 min read)
Leave VAEs and GANs behind: LLMs are all you need for tabular data generation!
We introduce a new method, GReaT (Generation of Realistic Tabular data), with state-of-the-art generative abilities (see below). How did we do it? ↓ (1/n)
#tabulardata
(2/n) Tabular data frequently consists of categorical and numerical data. Furthermore, categorical values and feature names are typically words. Thus, it is possible to represent a tabular data sample as a meaningful sentence, e.g., "Age is 42, Education is HS-grad, ..."
(3/n) ... where feature name and value are used together. After this step, we can fine-tune a pre-trained large language model (LLM) on the resulting sentences and use the LLM to synthesise new data samples!
(4/n) While most methods for tabular data generation are based on VAEs or GANs, our approach utilizes the pre-training power of Transformer-based language models. We also demonstrate that pre-training benefits the generative quality of the synthetic data.
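A minimal sketch of the row-to-sentence idea from (2/n) and (3/n), assuming the "<feature> is <value>" pattern shown in the example sentence. The helper names `encode_row` and `decode_sentence` are hypothetical and not taken from the package, and the parser assumes that values contain neither ", " nor " is ":

```python
def encode_row(row: dict) -> str:
    """Turn one table row into a sentence of '<feature> is <value>' clauses."""
    return ", ".join(f"{name} is {value}" for name, value in row.items())


def decode_sentence(sentence: str) -> dict:
    """Parse a sentence back into a row; all values come back as strings."""
    row = {}
    for clause in sentence.split(", "):
        name, sep, value = clause.partition(" is ")
        if sep:  # skip malformed clauses, which a language model may emit
            row[name.strip()] = value.strip()
    return row


row = {"Age": 42, "Education": "HS-grad"}
sentence = encode_row(row)
print(sentence)  # Age is 42, Education is HS-grad
print(decode_sentence(sentence))
```

Sentences like these would serve as fine-tuning text, and the same parsing step could turn generated sentences back into table rows.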
(5/n) Try GReaT today! We have developed an easy-to-use Python package, be-great, which can be installed with pip:
> pip install be-great

Code is available on GitHub: github.com/kathrinse/be_g…

Here is an example of training and sampling on the California housing dataset: [image]
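A minimal sketch of what such a training-and-sampling script might look like, assuming the `GReaT` class with `fit` and `sample` methods as described in the package's README. The `distilgpt2` checkpoint, the epoch count, and the `train_and_sample` wrapper are illustrative assumptions and may differ from the original example. Running it requires `be-great` and `scikit-learn`, and it downloads the dataset and model weights on first use:

```python
def train_and_sample(n_samples: int = 100, epochs: int = 50):
    """Fine-tune a small language model on California housing and sample new rows."""
    # Imports live inside the function so that defining it needs no extra packages.
    from be_great import GReaT
    from sklearn.datasets import fetch_california_housing

    # Load the dataset as a pandas DataFrame (features plus target column).
    data = fetch_california_housing(as_frame=True).frame

    # Assumed arguments: a Hugging Face checkpoint name and the number of epochs.
    model = GReaT(llm="distilgpt2", epochs=epochs)
    model.fit(data)

    # Draw synthetic rows from the fine-tuned model.
    return model.sample(n_samples=n_samples)
```

Calling `train_and_sample()` starts fine-tuning, which can take a while without a GPU; a smaller `epochs` value is enough for a quick smoke test.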
(6/n) For more details, please refer to our preprint:
arxiv.org/abs/2210.06280
(7/n) I would like to thank all contributors: @t_leemann, @MartinPawelczyk, @Gjergji_. Special thanks go to Kathrin Seßler; this project wouldn't exist without her.
(8/n) Lastly, if you want to know more about tabular data and deep neural networks, you should definitely check out our survey. Here is the link: arxiv.org/abs/2110.01889
