Thread by @EchoShao8899 on Thread Reader App

Can we teach LLMs to write long articles from scratch, grounded in trustworthy sources?

Do Wikipedia editors think this can assist them?

📣Announcing STORM, a system that writes Wikipedia-like articles based on Internet search. I now use STORM in my daily research!🧵

Generating long articles with citations is hard to do & hard to evaluate!

We break this problem down into two steps:

1️⃣Pre-writing, in which the system collects references and generates an outline.
2️⃣Writing, in which the system generates the final article with citations.
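To make the two-step split concrete, here is a minimal sketch of how the stages could hand off to each other. Every name in it (`PreWriting`, `pre_write`, `write_article`, and the toy stand-in functions) is hypothetical and chosen for illustration; it is not STORM's actual code, and the retrieval and LM calls are replaced by placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PreWriting:
    """Output of the pre-writing stage (hypothetical container)."""
    topic: str
    references: List[str] = field(default_factory=list)
    outline: List[str] = field(default_factory=list)


def pre_write(
    topic: str,
    collect_references: Callable[[str], List[str]],
    draft_outline: Callable[[str, List[str]], List[str]],
) -> PreWriting:
    """Step 1: collect references, then generate an outline from them."""
    references = collect_references(topic)
    outline = draft_outline(topic, references)
    return PreWriting(topic, references, outline)


def write_article(
    plan: PreWriting,
    write_section: Callable[[str, List[str]], str],
) -> str:
    """Step 2: write one section per outline heading, citing the references."""
    sections = [write_section(heading, plan.references) for heading in plan.outline]
    return "\n\n".join(sections)


# Toy stand-ins for search and LM calls (assumptions, not real components).
def toy_references(topic: str) -> List[str]:
    return [f"source about {topic} #1", f"source about {topic} #2"]


def toy_outline(topic: str, references: List[str]) -> List[str]:
    return ["Background", "Method"]


def toy_section(heading: str, references: List[str]) -> str:
    # Cite every reference by its 1-based index, e.g. "[1][2]".
    markers = "".join(f"[{i}]" for i in range(1, len(references) + 1))
    return f"{heading}\nPlaceholder text {markers}."


plan = pre_write("STORM", toy_references, toy_outline)
article = write_article(plan, toy_section)
```

Keeping the stages as separate functions means the outline can be inspected or evaluated before any article text is generated.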

“Pre-writing” requires researching a topic from scratch.

That makes it hard even for expert humans. And directly prompting the LM to generate questions doesn’t work well! The questions lack depth and have limited breadth.

STORM is designed to teach LMs to *ask good questions*.

STORM improves question asking by automatically discovering perspectives for researching the topic and adding each perspective to the prompt. It also simulates information-seeking conversations to encourage follow-up questions, which are usually more in-depth.
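A rough sketch of the idea, assuming the LM and the grounded answerer are passed in as plain callables. The function names, the `max_turns` parameter, and the toy stand-ins are hypothetical, not STORM's API; the point is only that each question is conditioned on a perspective and on the conversation so far.

```python
from typing import Callable, Dict, List, Optional, Tuple

Turn = Tuple[str, str]  # (question, answer)


def simulate_conversation(
    topic: str,
    perspective: str,
    ask: Callable[[str, str, List[Turn]], Optional[str]],
    answer: Callable[[str], str],
    max_turns: int = 3,
) -> List[Turn]:
    """Ask questions from one perspective, feeding the history back each turn."""
    history: List[Turn] = []
    for _ in range(max_turns):
        question = ask(topic, perspective, history)
        if not question:  # the asker has nothing more to ask
            break
        history.append((question, answer(question)))
    return history


def research(
    topic: str,
    perspectives: List[str],
    ask: Callable[[str, str, List[Turn]], Optional[str]],
    answer: Callable[[str], str],
    max_turns: int = 3,
) -> Dict[str, List[Turn]]:
    """Run one simulated conversation per discovered perspective."""
    return {
        p: simulate_conversation(topic, p, ask, answer, max_turns)
        for p in perspectives
    }


# Toy stand-ins for the LM calls (assumptions for illustration only).
def toy_ask(topic: str, perspective: str, history: List[Turn]) -> Optional[str]:
    if len(history) >= 2:
        return None  # stop after one opening question and one follow-up
    if not history:
        return f"As a {perspective}, what should I know about {topic}?"
    return f"Can you go deeper on your last answer about {topic}?"


def toy_answer(question: str) -> str:
    return "answer drawn from retrieved sources"


notes = research("STORM", ["historian", "engineer"], toy_ask, toy_answer)
```

Because the history is passed back to `ask`, the second question can build on the first answer, which is where the follow-ups come from.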

For evaluation, we build the FreshWiki dataset to mitigate leakage of test articles into LM training data.

To measure outline quality, we introduce heading soft recall and heading entity recall. Evaluating outlines makes it easier to prototype methods for pre-writing.
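The exact definitions are in the paper; the sketch below only illustrates the flavor of the two metrics under stated assumptions. It assumes soft recall is built on a soft cardinality (similar headings count as less than two distinct items) and swaps in simple token overlap as the similarity function, where a real setup would more likely use sentence embeddings. It also assumes the entities from the human-written headings have already been extracted by some NER tool and are passed in as a set. All function names are hypothetical.

```python
from typing import Iterable, List


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (stand-in for embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def soft_cardinality(headings: List[str]) -> float:
    """Each heading counts 1 / (total similarity to all headings, itself included)."""
    return sum(
        1.0 / sum(token_similarity(h, other) for other in headings)
        for h in headings
    )


def heading_soft_recall(reference: List[str], generated: List[str]) -> float:
    """Soft size of the overlap divided by the soft size of the reference set."""
    ref = list(dict.fromkeys(reference))  # drop exact duplicates, keep order
    gen = list(dict.fromkeys(generated))
    if not ref:
        return 0.0
    union = list(dict.fromkeys(ref + gen))
    overlap = soft_cardinality(ref) + soft_cardinality(gen) - soft_cardinality(union)
    return overlap / soft_cardinality(ref)


def heading_entity_recall(reference_entities: Iterable[str], generated: List[str]) -> float:
    """Fraction of reference entities mentioned anywhere in the generated headings."""
    entities = {e.lower() for e in reference_entities}
    if not entities:
        return 0.0
    text = " ".join(generated).lower()
    return sum(e in text for e in entities) / len(entities)
```

With this stand-in similarity, an outline identical to the reference scores 1.0 on soft recall and an outline sharing no tokens with it scores 0.0.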

STORM outperforms well-designed RAG baselines!

In the final writing stage, STORM generates text with citations and writes the full article section by section.

Articles produced by STORM are favored by both automatic metrics *and* experienced Wikipedia editors!

Such expository writing should always be grounded.

We assess citation quality and ask Wikipedia editors to rate verifiability. We find that the major challenge stems from red herrings rather than the widely discussed factual hallucination.

This calls for research beyond fact-checking!

We also ask Wikipedia editors about the perceived usefulness of STORM. It’s exciting that all participants agree that STORM is helpful for their pre-writing stage. Also, I use STORM myself to learn concepts in depth in my research 😎 (check out our demo video if you haven’t).

It’s worth mentioning that STORM is a carefully designed pipeline for knowledge curation rather than a single prompt or model.

We build STORM using DSPy, which provides very neat modularization. This allows us to keep extending our work without getting lost in many prompt files.

We are working on making the demo public to let more people try out STORM. Stay tuned!

Read our arXiv paper to learn more: arxiv.org/abs/2402.14207

Thanks @_Yucheng_Jiang, Theo, Peter, @lateinteraction, and @MonicaSLam for the amazing collaboration!!
