Mehrdad Farajtabar
Research Scientist at @Apple, ex-@DeepMind, ex-@GeorgiaTech
Oct 10
1/ Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source models (Llama, Phi, Gemma, Mistral) and leading closed models, including OpenAI's recent GPT-4o and o1 series.
arxiv.org/pdf/2410.05229

Work done with @i_mirzadeh, @KeivanAlizadeh2, Hooman Shahrokhi, Samy Bengio, @OncelTuzel.

#LLM #Reasoning #Mathematics #AGI #Research #Apple

2/ When OpenAI released GSM8K ~3 years ago, GPT-3 (175B) scored 35% on the GSM8K test set. Today, models with ~3B parameters surpass 85%, and larger ones hit >95%. But has model 'reasoning' really improved? How much of this is genuine #logical/#symbolic reasoning vs. #pattern_recognition, inadvertent data #contamination, or #overfitting?
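To make the contamination/overfitting question concrete, here is a minimal sketch (a hypothetical illustration, not the paper's actual pipeline) of the idea behind probing with symbolic variants: take a GSM8K-style question, treat names and numbers as template slots, and sample fresh surface forms. A model that genuinely reasons should answer all variants equally well; a model that pattern-matches memorized test items may degrade.

```python
import random

# Hypothetical GSM8K-style template: names and numbers are slots, while the
# underlying reasoning (a + b) stays fixed across all sampled variants.
TEMPLATE = ("{name} has {a} apples. {name} buys {b} more apples. "
            "How many apples does {name} have now?")

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Sample one surface variant of the question and its ground-truth answer."""
    name = rng.choice(["Sophie", "Liam", "Ava", "Noah"])  # illustrative names
    a = rng.randint(2, 20)
    b = rng.randint(2, 20)
    return TEMPLATE.format(name=name, a=a, b=b), a + b

# Generate a small evaluation set of variants; scoring a model on these
# (instead of the fixed public test set) separates reasoning from recall.
rng = random.Random(0)
variants = [make_variant(rng) for _ in range(5)]
```

Each entry pairs a freshly worded question with a programmatically computed answer, so accuracy can be checked without any memorized ground truth.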