Yijia Shao
Feb 26 · 10 tweets
Can we teach LLMs to write long articles from scratch, grounded in trustworthy sources?

Do Wikipedia editors think this can assist them?

📣 Announcing STORM, a system that writes Wikipedia-like articles based on Internet search. I now use STORM in my daily research! 🧵
Generating long articles with citations is hard to do & hard to evaluate!

We break this problem down into two steps:

1️⃣ Pre-writing, in which the system collects references and generates an outline.
2️⃣ Writing, in which the system generates the final article with citations.
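To make the two-step split concrete, here is a minimal Python sketch of the control flow only. Everything in it is hypothetical. `pre_write`, `write`, and the toy `search`, `make_outline`, and `write_section` callables are illustrative stand-ins, not STORM's actual code or API. A real system would back them with web search and LM calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PreWritingResult:
    references: list[str]  # collected sources (plain strings in this sketch)
    outline: list[str]     # section headings, in order


@dataclass
class Article:
    topic: str
    sections: dict[str, str] = field(default_factory=dict)  # heading -> text


def pre_write(
    topic: str,
    search: Callable[[str], list[str]],
    make_outline: Callable[[str, list[str]], list[str]],
) -> PreWritingResult:
    """Step 1 (hypothetical): collect references, then draft an outline from them."""
    references = search(topic)
    outline = make_outline(topic, references)
    return PreWritingResult(references, outline)


def write(
    topic: str,
    pre: PreWritingResult,
    write_section: Callable[[str, str, list[str]], str],
) -> Article:
    """Step 2 (hypothetical): write the article one outline section at a time."""
    article = Article(topic)
    for heading in pre.outline:
        article.sections[heading] = write_section(topic, heading, pre.references)
    return article


# Toy stand-ins so the sketch runs without any LM or search backend.
def toy_search(topic: str) -> list[str]:
    return [f"{topic} source A", f"{topic} source B"]


def toy_outline(topic: str, references: list[str]) -> list[str]:
    return ["Background", "Impact"]


def toy_write_section(topic: str, heading: str, references: list[str]) -> str:
    # Cite every reference by its 1-based index, e.g. "[1][2]".
    citations = "".join(f"[{i}]" for i in range(1, len(references) + 1))
    return f"{heading} of {topic}. {citations}"


if __name__ == "__main__":
    pre = pre_write("STORM", toy_search, toy_outline)
    article = write("STORM", pre, toy_write_section)
    print(article.sections["Impact"])  # prints: Impact of STORM. [1][2]
```

Keeping the two steps as separate functions means the outline can be inspected, or evaluated on its own, before any article text is written.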
“Pre-writing” requires researching a topic from scratch.

That makes it hard even for expert humans. Directly prompting the LM to generate questions doesn’t work well either: the questions lack depth and have limited breadth.

STORM is designed to teach LMs to *ask good questions*.
STORM improves question asking by automatically discovering perspectives for researching the topic and adding each perspective to the question-asking prompt. It also simulates information-seeking conversations, which encourage follow-up questions that are usually more in-depth.
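Here is a rough Python sketch of a perspective-guided, multi-turn question-asking loop. It rests on stated assumptions. `discover_perspectives`, `ask`, and `answer` are hypothetical callables that would be backed by an LM and search in practice. The loop shape is an illustration, not STORM's implementation. It shows two ideas. First, each conversation is conditioned on one perspective. Second, the asker sees the history so far, so later questions can follow up on earlier answers.

```python
from __future__ import annotations

from typing import Callable

# A conversation is a list of (question, answer) turns.
Turn = tuple[str, str]


def simulate_conversations(
    topic: str,
    discover_perspectives: Callable[[str], list[str]],
    ask: Callable[[str, str, list[Turn]], str],
    answer: Callable[[str], str],
    max_turns: int = 3,
) -> dict[str, list[Turn]]:
    """Run one simulated Q&A conversation per perspective (hypothetical sketch).

    `ask` sees the perspective and the history so far, so it can ask
    follow-up questions; returning an empty string ends the conversation.
    `answer` stands in for a search-grounded answering step.
    """
    transcripts: dict[str, list[Turn]] = {}
    for perspective in discover_perspectives(topic):
        history: list[Turn] = []
        for _ in range(max_turns):
            question = ask(topic, perspective, history)
            if not question:
                break
            history.append((question, answer(question)))
        transcripts[perspective] = history
    return transcripts


# Toy stand-ins so the sketch runs without any LM or search backend.
def toy_perspectives(topic: str) -> list[str]:
    return ["historian", "engineer"]


def toy_ask(topic: str, perspective: str, history: list[Turn]) -> str:
    if not history:
        return f"From the {perspective} perspective, what is {topic}?"
    if len(history) == 1:
        # Follow up on the previous answer.
        return f"Can you expand on: {history[-1][1]}?"
    return ""  # nothing more to ask


def toy_answer(question: str) -> str:
    return f"grounded answer to '{question}'"


if __name__ == "__main__":
    results = simulate_conversations("STORM", toy_perspectives, toy_ask, toy_answer)
    for perspective, turns in results.items():
        print(perspective, len(turns))  # each toy conversation stops after 2 turns
```

Passing the history back into `ask` is what makes follow-up questions possible. A single prompt that lists questions up front has no answers to build on.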
For evaluation, we build FreshWiki, a dataset of recent high-quality Wikipedia articles, to mitigate leakage of evaluation data into LM training data.

To measure outline quality, we introduce heading soft recall and heading entity recall. Evaluating outlines makes it easier to prototype pre-writing methods.
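Below is a minimal Python sketch of both outline metrics. Its assumptions are flagged here and in the code. Entity extraction is assumed to happen upstream, with any NER tool. The soft recall uses a soft-cardinality formulation: each heading counts as 1 divided by the sum of its similarities to all headings in its set. A simple token-overlap (Jaccard) similarity serves as a placeholder. The paper's exact similarity function and normalization may differ.

```python
from __future__ import annotations

from typing import Callable


def heading_entity_recall(reference_entities: set[str], outline_entities: set[str]) -> float:
    """Share of entities in the human-written headings that the generated outline also mentions.

    Assumes entities were already extracted and normalized upstream (e.g. by an NER tool).
    """
    if not reference_entities:
        raise ValueError("reference headings contain no entities")
    return len(reference_entities & outline_entities) / len(reference_entities)


def jaccard(a: str, b: str) -> float:
    """Placeholder similarity: case-insensitive token-set overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def soft_cardinality(items: list[str], sim: Callable[[str, str], float]) -> float:
    """Near-duplicates share weight: each item counts 1 / (sum of its similarities to all items)."""
    return sum(1.0 / sum(sim(x, y) for y in items) for x in items)


def heading_soft_recall(
    reference: list[str],
    generated: list[str],
    sim: Callable[[str, str], float] = jaccard,
) -> float:
    """Soft recall of reference headings, using card(R ∩ G) = card(R) + card(G) - card(R ∪ G)."""
    if not reference:
        raise ValueError("need at least one reference heading")
    card_r = soft_cardinality(reference, sim)
    card_g = soft_cardinality(generated, sim)
    card_union = soft_cardinality(reference + generated, sim)
    return (card_r + card_g - card_union) / card_r


if __name__ == "__main__":
    print(heading_soft_recall(["History", "Design"], ["History", "Applications"]))  # 0.5
    print(heading_entity_recall({"wikipedia", "dspy", "arxiv"}, {"dspy", "arxiv"}))
```

Soft cardinality gives partial credit for near-matching headings instead of requiring exact string matches. Swapping `sim` for an embedding-based similarity is a natural variation.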

STORM outperforms well-designed RAG baselines!
In the final writing stage, STORM generates text with citations and writes the full article section by section.

Articles produced by STORM are favored by both automatic metrics *and* experienced Wikipedia editors!
Such expository writing should always be grounded.

We assess citation quality and ask Wikipedia editors to rate verifiability. We find that the major challenge stems from red herrings rather than the widely discussed problem of factual hallucination.

This calls for research beyond fact-checking!
We also ask Wikipedia editors how useful they find STORM. It’s exciting that all participants agree STORM is helpful in their pre-writing stage. I also use STORM myself to learn concepts in depth for my research 😎 (check out our demo video if you haven’t).
It’s worth mentioning that STORM is a carefully designed pipeline for knowledge curation rather than a single prompt or model.

We build STORM using DSPy, which provides very neat modularization; this lets us keep extending our work without getting lost in many prompt files.
We are working on making the demo public to let more people try out STORM. Stay tuned!

Read our arXiv paper to learn more: arxiv.org/abs/2402.14207

Thanks @_Yucheng_Jiang, Theo, Peter, @lateinteraction, and @MonicaSLam for the amazing collaboration!!
