Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
AI researcher & teacher @SCAI_ASU. Works on Human-Aware AI. Former President of @RealAAAI; Chair of @AAAS Sec T. Here to tweach #AI. YouTube Ch: https://t.co/4beUPOmMW6
Sep 23 10 tweets 5 min read
A research note describing our evaluation of the planning capabilities of o1 🍓 is now on @arxiv (thanks to @karthikv792 & @kayastechly). As promised, here is a summary (..although you should read the whole thing..) 🧵 1/ arxiv.org/abs/2409.13373

Following OpenAI's own statements, as well as our own understanding of what o1 is doing 👇, we treat o1 as an LRM that is fundamentally different from all the LLMs that preceded it (e.g. the RL on CoT moves; and the costly inference stage) 2/

Oct 21, 2023 14 tweets 5 min read
Can LLMs really self-critique (and iteratively improve) their solutions, as claimed in the literature?🤔

Two new papers from our group investigate (and call into question) these claims in reasoning (arxiv.org/abs/2310.12397) and planning (arxiv.org/abs/2310.08118) tasks.🧵 1/

One paper, led by @kayastechly (w/ @mattdmarq), evaluated the claims over a suite of graph coloring problems. The setup allows for GPT4 guessing a valid coloring in standalone and self-critiquing modes. There is an external sound verifier outside the self-critiquing loop. 2/
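The guess-then-verify setup can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's code: `propose_coloring` is a hypothetical stand-in for the LLM guesser (here it just guesses at random), and the function names, retry budget, and example graph are all made up for illustration. What it shows is the division of labor: the guesser may be arbitrarily unreliable, because a sound verifier outside the loop decides what counts as a solution.

```python
import random

def verify_coloring(edges, coloring, num_colors):
    """Sound external verifier: return the violated edges (an empty list means valid)."""
    violations = []
    for u, v in edges:
        cu, cv = coloring.get(u), coloring.get(v)
        # An edge is violated if an endpoint is uncolored or both endpoints share a color.
        if cu is None or cv is None or cu == cv:
            violations.append((u, v))
        # Colors outside the allowed palette are also rejected.
        elif not (0 <= cu < num_colors and 0 <= cv < num_colors):
            violations.append((u, v))
    return violations

def propose_coloring(vertices, num_colors, rng):
    """Hypothetical stand-in for the LLM guesser: a uniformly random coloring."""
    return {v: rng.randrange(num_colors) for v in vertices}

def guess_and_verify(vertices, edges, num_colors, max_tries=1000, seed=0):
    """Keep asking the guesser until the external verifier accepts a candidate."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        candidate = propose_coloring(vertices, num_colors, rng)
        if not verify_coloring(edges, candidate, num_colors):
            return candidate, attempt
    return None, max_tries

# Assumed example: a 4-cycle, which is 2-colorable.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
solution, tries = guess_and_verify(vertices, edges, num_colors=2)
```

Any correctness guarantee in this sketch comes from `verify_coloring`, not from the guesser.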
Jul 19, 2023 8 tweets 3 min read
It is hilarious that LLMs are making traditional symbolic #AI relevant, and in the process merrily exposing the ignorance of the post-Alexnet yung'uns who skipped their Intro #AI's to do MORE LAYERS, only to find themselves busy with ersatz natural science with LLMs. 🧵 1/

Without background in combinatorial search and logical inference, you are susceptible to conflating brute force search (or forest of jumbled thoughts prompting) as something to be proud of instead of seeing them for their "Rube Goldberg" silliness.. 2/

Jun 8, 2023 8 tweets 3 min read
[Paradoxes of Approximate Omniscience:] 🧵We all know, by now, that our intuitions *suck* at high dimensions.

We haven't yet come to grips with the fact that our intuitions about *approximate omniscience* suck too!

(And this explains some of our puzzlement at LLMs)
1/
Re: high-D, we all have been surprised by the core-less apples. George Dantzig famously explained the surprising efficiency of the (worst-case exponential) Simplex algorithm with a pithy "One's intuition in higher dimensional space is not worth a damn!" 2/
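A small numeric sketch of the "core-less apple" effect, assuming the phrase refers to the standard observation that almost all the volume of a high-dimensional ball sits in a thin skin near its surface. The volume of a d-dimensional ball scales as r^d, so the fraction lying within a relative thickness `skin` of the surface is 1 - (1 - skin)^d. The function name and the 5% skin are illustrative choices, not from the thread.

```python
def shell_fraction(dim, skin=0.05):
    """Fraction of a unit ball's volume lying within `skin` of its surface."""
    # Ball volume scales as r**dim, so the inner ball of radius (1 - skin)
    # holds (1 - skin)**dim of the total volume.
    return 1.0 - (1.0 - skin) ** dim

for dim in (2, 3, 10, 100, 1000):
    print(f"d={dim:>4}: {shell_fraction(dim):.4f} of the volume is in the outer 5% skin")
```

In 3 dimensions the skin holds about 14% of the apple. By 100 dimensions it holds over 99%, leaving essentially no core.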

May 15, 2023 10 tweets 3 min read
Planning & LLMs: A(nother) 🧵

Making plans in the world involves (1) discovering actions (and their precondition/effect causal dependencies), and (2) sequencing an appropriate subset of available/discovered actions to achieve the agent's goals. 1/

The former requires *broad knowledge* about actions available in the world and their individual effects, while the latter requires deep drilling-down over a given set of actions to ensure that all goals are supported (causal chaining) without any undesirable interactions. 2/
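The second part, checking that a sequence of actions supports every goal without harmful interactions, can be made concrete with a minimal plan validator. This is a sketch assuming a STRIPS-style action representation (preconditions, add effects, delete effects). The `Action` type, `validate_plan`, and the tea-making domain are hypothetical illustrations, not taken from the thread or from any benchmark.

```python
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]

def validate_plan(initial, goals, plan):
    """Simulate the plan step by step; return (ok, message)."""
    state = set(initial)
    for step, action in enumerate(plan, start=1):
        # Causal chaining: every precondition must be supported by the current state.
        missing = action.preconditions - state
        if missing:
            return False, f"step {step} ({action.name}): unsupported preconditions {sorted(missing)}"
        # Apply effects: deletes first, then adds.
        state = (state - action.del_effects) | action.add_effects
    unmet = set(goals) - state
    if unmet:
        return False, f"goals not achieved: {sorted(unmet)}"
    return True, "plan is valid"

# Hypothetical toy domain.
boil = Action("boil_water", frozenset({"have_water"}), frozenset({"hot_water"}), frozenset())
steep = Action("steep_tea", frozenset({"hot_water", "have_teabag"}),
               frozenset({"tea"}), frozenset({"hot_water", "have_teabag"}))

initial = {"have_water", "have_teabag"}
goals = {"tea"}

ok, msg = validate_plan(initial, goals, [boil, steep])          # right actions, right order
bad_ok, bad_msg = validate_plan(initial, goals, [steep, boil])  # same actions, wrong order
```

Both candidate plans use the right actions, so knowing which actions are relevant is not enough. Only the ordering that respects the precondition/effect dependencies passes.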
Apr 20, 2023 12 tweets 5 min read
🧵Been reading several recent arXiv entries claiming planning capabilities of #LLM's. This area is so full of anthropomorphisms--"Chain of Thought Prompting", "Inner Monolog"--that it cries out for a cleansing read of Drew's #AI meets Natural Stupidity 1/
One popular line claims that while LLM's may give wrong plans, they can improve with the right prompting (which, in one case, is claimed to even induce "inner monolog" all West World host-like).

The prompts in the appendices however seem to suggest a Clever Hans effect in action 2/
Apr 20, 2023 5 tweets 3 min read
So @TheEconomist tells me now that #LLMs can do planning and reasoning after all. Obviously our own dismal experience of their planning performance (c.f. the 🧵 at ) must be a clear outlier.. 🙄 Thank goodness I pay big bucks for my subscription.. 1/

Interestingly, I was just telling someone today how several of the papers on "LLMs for Task Planning by Prompting" are rife with the Clever Hans effect (c.f. en.wikipedia.org/wiki/Clever_Ha… ). I guess I will have to do a thread.. 2/
Apr 5, 2023 10 tweets 4 min read
Afraid of #GPT4 going rogue and killing y'all? Worry not. Planning has got your back. You can ask it to solve any simple few step classical planning problem and snuff that "AGI spark" well and good.

Let me explain.. 🧵 1/

Almost a year back, intrigued by the breathless "LLMs are Zero Shot reasoners" papers, we tested their ability to autonomously come up with simple plans given domain models. The results were *pretty bleak.*👇 2/
Dec 23, 2022 5 tweets 2 min read
In bemoaning how things are getting worse every day, we often tend to forget that the state of the world is becoming monotonically more observable. 1/

It may not be so much that there is monotonically increasing suffering in this world, but that it is monotonically more observable--we can be aware of it, if we choose to. 2/
Jul 29, 2022 10 tweets 4 min read
The impressive deep pattern recognition abilities of #DNN's such as #LLM's are sometimes confused for reasoning abilities

I can learn to guess, with high accuracy, whether a SAT instance is satisfiable or not, but this is not the same as knowing how to solve SAT. Let me explain. 1/

Suppose you train a learner with a large number of Boolean 3-SAT instances labeled with whether or not they are satisfiable. There is no reason to doubt that a modern #DNN-based learner will manage to learn deep features corresponding to the γ ratio-- #clauses/#variables .. 2/
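The gap between guessing and solving can be sketched directly. In the sketch below, `guess_satisfiable` looks only at the clause/variable ratio, while `solve` actually searches for a satisfying assignment. The threshold of about 4.26 is the commonly cited empirical phase-transition value for random 3-SAT and is not stated in the thread. All function names and the hand-built instance are illustrative assumptions.

```python
import itertools
import random

PHASE_TRANSITION = 4.26  # approximate empirical threshold for random 3-SAT (assumption)

def random_3sat(num_vars, num_clauses, rng):
    """Random instance: each clause has 3 distinct variables, each negated with probability 1/2."""
    clauses = []
    for _ in range(num_clauses):
        chosen = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in chosen))
    return clauses

def guess_satisfiable(num_vars, clauses):
    """Pattern-level guess that inspects only the clause/variable ratio."""
    return len(clauses) / num_vars < PHASE_TRANSITION

def solve(num_vars, clauses):
    """Actually solve by exhaustive search; return a satisfying assignment or None."""
    for bits in itertools.product([False, True], repeat=num_vars):
        # A literal l is true when variable |l| has the truth value matching its sign.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return bits
    return None

# Hand-built instance: all 8 sign patterns over variables 1..3 (unsatisfiable by construction).
contradiction = [tuple(s * v for s, v in zip(signs, (1, 2, 3)))
                 for signs in itertools.product((1, -1), repeat=3)]

guess = guess_satisfiable(3, contradiction)   # ratio 8/3 is below the threshold
model = solve(3, contradiction)               # exhaustive search finds no assignment

# A random instance for comparison; the guess is only a statistical bet here.
sample = random_3sat(10, 20, random.Random(0))
```

On random instances far from the threshold the ratio-based guess is right most of the time, which is why it can look like reasoning. On the hand-built instance it confidently says "satisfiable" while the solver finds no assignment. Even when the guess is correct, it never produces one.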
Jul 11, 2022 14 tweets 5 min read
There seems to be an almost willful confusion about the need and role for explainability of #AI systems on #AI twitter.

Contrary to the often polarizing positions, it is neither the case that we always need explanations nor is it the case that we never need explanations. 🧵1/

We look for explanations of high level decisions of (what for us are) explicit knowledge tasks; and where contestability and collaboration are important.

We rarely look for explanations of tacit knowledge/low level control decisions. 2/
Jun 22, 2022 7 tweets 4 min read
Intrigued by the profusion of 'em "#LLM's are Zero-shot <XXX>'s" papers, we set out to see how good LLMs are at planning and reasoning about change.

tldr; off-the-shelf #GPT3 is pretty bad at these..

👉arxiv.org/abs/2206.10498

(w/ @karthikv792 @sarath_ssreedh & @_aolmo_) 1/

Our benchmark tasks (prompts) are posed in the context of common "toy domains" used in automated planning, and are small enough to not involve any huge combinatorics. In particular, they should be accessible to lay humans. 2/