https://x.com/rao2z/status/1740692722099630237
These anthropomorphization tendencies include both viewing intermediate tokens as interpretable traces of LLM's "thinking" and confusing the length of the intermediate tokens as indicative of the "thinking effort" 2/
https://twitter.com/rao2z/status/1837163109246443918
Following OpenAI's own statements, as well as our own understanding of what o1 is doing 👇, we treat o1 as an LRM that is fundamentally different from all the LLMs that preceded it (e.g. the RL on CoT moves; and the costly inference stage) 2/ https://x.com/rao2z/status/1834354533931385203
Once you are an approximate reasoner, you might develop the "don't tell me how to solve the problem; I already have a way I use to solve the problem" complex.. 👇 https://x.com/rao2z/status/1835093955744350605

One paper, led by @kayastechly (w/ @mattdmarq), evaluated the claims over a suite of graph coloring problems. The setup allows for GPT4 guessing a valid coloring in standalone and self-critiquing modes. There is an external sound verifier outside the self-critiquing loop. 2/
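To make the "external sound verifier" idea concrete, here is a minimal Python sketch of what a sound checker for graph coloring could look like. It is an illustration only, not the paper's actual code: the function names, the integer color encoding, and the edge-list input format are all assumptions. "Sound" here just means it accepts a coloring only if every vertex has a legal color and no edge joins two same-colored vertices.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical types for illustration; the paper's own representation may differ.
Edge = Tuple[Hashable, Hashable]


def coloring_violations(
    edges: Sequence[Edge],
    coloring: Dict[Hashable, int],
    num_colors: int,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the coloring is valid."""
    problems: List[str] = []

    # Every vertex that appears in some edge must have a color in range.
    vertices = {v for edge in edges for v in edge}
    for v in sorted(vertices, key=str):
        if v not in coloring:
            problems.append(f"vertex {v} has no color")
        elif not 0 <= coloring[v] < num_colors:
            problems.append(f"vertex {v} uses illegal color {coloring[v]}")

    # No edge may join two vertices that share a color.
    for u, v in edges:
        if u in coloring and v in coloring and coloring[u] == coloring[v]:
            problems.append(f"edge ({u}, {v}) joins two vertices colored {coloring[u]}")

    return problems


def is_valid_coloring(
    edges: Sequence[Edge],
    coloring: Dict[Hashable, int],
    num_colors: int,
) -> bool:
    """Sound check: True only when no violation exists."""
    return not coloring_violations(edges, coloring, num_colors)


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(is_valid_coloring(triangle, {0: 0, 1: 1, 2: 2}, 3))
    print(coloring_violations(triangle, {0: 0, 1: 0, 2: 1}, 3))
```

A checker like this sits outside the model: the guessed coloring is passed in, and the verdict (or the list of violations) comes from code rather than from the model's own critique.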
https://twitter.com/rao2z/status/1659715298679832577?s=20
https://twitter.com/rao2z/status/1287387393130000385?s=20
https://twitter.com/rao2z/status/889509356084928518?s=20
One popular line claims that while LLMs may give wrong plans, they can improve with the right prompting (which, in one case, is claimed to even induce "inner monolog", all Westworld host-like).
https://twitter.com/rao2z/status/1643463201462579200?s=20) must be a clear outlier.. 🙄 Thank goodness I pay big bucks for my subscription.. 1/
Interestingly, I was just telling someone today how several of the papers on "LLMs for Task Planning by Prompting" are rife with the Clever Hans effect (cf. en.wikipedia.org/wiki/Clever_Ha… ). I guess I will have to do a thread.. 2/
https://twitter.com/rao2z/status/1539435614503768065?s=20
Our benchmark tasks (prompts) are posed in the context of common "toy domains" used in automated planning, and are small enough to not involve any huge combinatorics. In particular, they should be accessible to lay humans. 2/