You've probably seen results showing impressive few-shot performance of very large language models (LLMs). Do those results mean that LLMs can reason? Well, maybe, but maybe not. Few-shot performance is highly correlated with pretraining term frequency. arxiv.org/abs/2202.07206
We focus on numerical reasoning (addition, multiplication, and unit conversion). We use the same formats and tasks previously used to show impressive few-shot performance, but we systematically evaluate every number and correlate accuracy with each number's frequency in the pretraining data.
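The analysis above boils down to two steps: count how often each number appears in the pretraining corpus, then rank-correlate those counts with per-number few-shot accuracy. A minimal sketch of that idea (not the paper's actual pipeline; the tiny corpus, the number list, and the accuracy values are all fabricated for illustration):

```python
import re
from collections import Counter

def number_frequencies(corpus_docs, numbers):
    """Count how often each target number appears as a token in the corpus."""
    counts = Counter({n: 0 for n in numbers})
    for doc in corpus_docs:
        for tok in re.findall(r"\d+", doc):
            if tok in counts:
                counts[tok] += 1
    return counts

def _ranks(values):
    """1-based average ranks; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation, computed from scratch (no SciPy needed)."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical inputs: a toy "pretraining corpus" and made-up per-number accuracies.
corpus = ["2 plus 2 is 4", "24 times 2 is 48", "the year 1999", "2 and 24 again"]
numbers = ["2", "24", "1999"]
freqs = number_frequencies(corpus, numbers)
accuracy = {"2": 0.95, "24": 0.80, "1999": 0.40}  # fabricated for illustration

rho = spearman([freqs[n] for n in numbers], [accuracy[n] for n in numbers])
print(f"frequencies: {dict(freqs)}, spearman rho: {rho:.2f}")
```

A high rho here would mean the model does better exactly on the numbers it saw most often in pretraining, which is the pattern the thread is pointing at.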