Yasaman Razeghi
PhD student at UC Irvine, researching NLP/ML
Feb 19, 2022 · 9 tweets · 4 min read
You've probably seen results showing impressive few-shot performance of very large language models (LLMs). Do those results mean that LLMs can reason? Well, maybe, but maybe not. Few-shot performance is highly correlated with pretraining term frequency. arxiv.org/abs/2202.07206

We focus on numerical reasoning (addition, multiplication, and unit conversion). We use the same formats and tasks used previously to show impressive few-shot performance, but we systematically evaluate every number and correlate performance with pretraining term frequency.
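To make the analysis concrete, here is a minimal sketch of what such a correlation test could look like, assuming you already have (a) per-number few-shot accuracies and (b) counts of how often each number appears in the pretraining corpus. All names and values below are hypothetical illustrations, not the authors' actual code or data.

```python
# Hypothetical sketch: correlate per-number few-shot accuracy with
# pretraining term frequency. Both dicts below use made-up toy values.
from scipy.stats import spearmanr

# Accuracy on e.g. "x * 9 = ?" averaged over prompts, keyed by operand x.
accuracy_by_number = {2: 0.91, 17: 0.63, 24: 0.88, 98: 0.41}
# How often each operand appears in the pretraining corpus.
pretraining_frequency = {2: 5_400_000, 17: 310_000, 24: 2_100_000, 98: 95_000}

numbers = sorted(accuracy_by_number)
acc = [accuracy_by_number[n] for n in numbers]
freq = [pretraining_frequency[n] for n in numbers]

# Rank correlation between term frequency and few-shot accuracy;
# a strong positive rho is the effect the thread describes.
rho, p_value = spearmanr(freq, acc)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A rank correlation is a natural fit here because it asks only whether more frequent numbers tend to be answered more accurately, without assuming a linear relationship between raw counts and accuracy.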