Can your encoder-decoder model generate a database-like table? We set out to do it efficiently and ended up with the STable 🐴 framework, applicable to problems such as line item extraction or joint entity and relation extraction.

See arxiv.org/abs/2206.04045 and 🧵
#NLProc
From receipts and invoices, through paycheck stubs and insurance loss run reports, to scientific articles, real-world documents contain tabular data to be extracted, whether explicit or implicit. It is not necessarily represented as a table per se within the input document.
At the same time, encoder-decoder models unify a variety of NLP problems by casting them as QA with a plain-text answer. We argue that restricting the output type to raw text is sometimes suboptimal, and propose a framework able to infer a list of ordered tuples or a table.
Though transformers have previously been used to infer lists or even more complex structures, it was often achieved autoregressively: a model intended for generating *unstructured* natural-language text was used to infer an output with formal *structure*.
In contrast, we exploit regularities and relationships within the output data and propose a training objective that maximizes the expected log-likelihood of a table's content across random permutations of the factorization order.
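The idea can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `cell_logprob` is a hypothetical stand-in for the decoder's score of one cell given the cells generated so far, and the loss averages the table's negative log-likelihood over sampled generation orders.

```python
import math
import random

def cell_logprob(cell, generated):
    # Hypothetical scorer, fixed here for illustration: the first cell in
    # any order gets probability 0.25, later cells 0.5.
    return math.log(0.5) if generated else math.log(0.25)

def permutation_loss(cells, num_samples=4, rng=None):
    """Negative log-likelihood averaged over random factorization orders."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(num_samples):
        order = rng.sample(cells, len(cells))  # one random permutation
        generated = []
        for cell in order:
            total += cell_logprob(cell, generated)
            generated.append(cell)
    return -total / num_samples
```

With a real model, `cell_logprob` would condition on the document and on the already-decoded cells; sampling a few permutations per batch approximates the expectation over all orders.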
During content inference, we exploit the model's ability to generate cells in any order, searching over possible orderings to maximize the model's confidence and avoid the substantial error accumulation that other sequential models are prone to.
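One simple instance of such a search (an assumption for illustration, not the paper's exact algorithm) is greedy confidence-ordered decoding: at each step, fill whichever empty cell the model is most confident about instead of following a fixed left-to-right order. Here `predict(slot, filled)` is a hypothetical hook returning a (value, confidence) pair for one table slot given the cells filled so far.

```python
def best_cell(candidates, filled, predict):
    # Pick the empty slot with the highest model confidence.
    return max(candidates, key=lambda s: predict(s, filled)[1])

def decode_table(slots, predict):
    """Fill all table slots, most-confident-first."""
    filled = {}
    remaining = set(slots)
    while remaining:
        slot = best_cell(remaining, filled, predict)
        value, _ = predict(slot, filled)
        filled[slot] = value
        remaining.remove(slot)
    return filled
```

Because each newly filled cell conditions later predictions, resolving easy cells first reduces the chance that an early mistake propagates through the rest of the table.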
Remember our TILT model? With the STable framework, we could further improve its performance on public and real-world business datasets by 15%.
Curious about the details of the method developed by Michał Pietruszka, Michał Turski, Tomasz Dwojak, Gabriela Pałka, Karolina Szyndler, Dawid Jurkiewicz, Łukasz Garncarek and me? See the recent arXiv drop at arxiv.org/abs/2206.04045

Thread by Łukasz Borchmann