In bemoaning how things are getting worse every day, we often forget that the state of the world is becoming monotonically more observable. 1/
It may not be so much that there is monotonically increasing suffering in this world, but that it is monotonically more observable--we can be aware of it, if we choose to. 2/
Wars become forever stalemates because both parties have much better observability into the state of the adversary. As my son says, Normandy-like surprise attacks are much harder in this era of satellite/cell signal observability (Ukraine being a textbook case in point..) 3/
On the whole, this observability is a force for good, IMHO--in as much as it actually gives us a choice of using our knowledge to improve the state of affairs: realize the impossibility of surprise wars or make it harder to look away from the suffering in the far corners. 4/
So on this #Festivus day, let's celebrate the human ingenuity that has been the main driver of the increased observability of our world!

May all our POMDPs become MDPs! 5/
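(A toy sketch, not from the thread: in POMDP terms, "more observable" means the observation model gets sharper, and in the limit of a perfect sensor the belief state collapses onto the true state--at which point the POMDP *is* an MDP. Function and state names below are illustrative.)

```python
# Toy illustration: when the observation model becomes the identity -- i.e.,
# the world is fully observable -- a POMDP's belief state collapses to a
# single known state, and the problem reduces to an MDP.

def belief_update(belief, obs, obs_model):
    """Bayesian belief update: P(s | obs) is proportional to P(obs | s) * P(s)."""
    new_belief = {s: obs_model(obs, s) * p for s, p in belief.items()}
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}

# Partial observability: a noisy sensor leaves residual uncertainty.
noisy = lambda obs, s: 0.8 if obs == s else 0.2
b = belief_update({"A": 0.5, "B": 0.5}, "A", noisy)
assert 0 < b["B"] < 1  # still uncertain about the true state

# Full observability: the sensor reports the state exactly.
perfect = lambda obs, s: 1.0 if obs == s else 0.0
b = belief_update({"A": 0.5, "B": 0.5}, "A", perfect)
assert b == {"A": 1.0, "B": 0.0}  # belief collapses: the POMDP is now an MDP
```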

Thread by Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

More from @rao2z

Sep 23
A research note describing our evaluation of the planning capabilities of o1 🍓 is now on @arxiv (thanks to @karthikv792 & @kayastechly). As promised, here is a summary (..although you should read the whole thing..) 🧵 1/ arxiv.org/abs/2409.13373

Following OpenAI's own statements, as well as our own understanding of what o1 is doing, we treat o1 as an LRM (Large Reasoning Model) that is fundamentally different from all the LLMs that preceded it (e.g. the RL on CoT moves, and the costly inference stage) 2/

We start by looking at the SOTA LLM performance on PlanBench. LLaMA 3.1 405B has the best performance on the blocks world at 62.6%. None of the LLMs do better than 5% on the mystery BW. 3/
Oct 21, 2023
Can LLMs really self-critique (and iteratively improve) their solutions, as claimed in the literature?🤔

Two new papers from our group investigate (and call into question) these claims in reasoning (arxiv.org/abs/2310.12397) and planning (arxiv.org/abs/2310.08118) tasks.🧵 1/
One paper, led by @kayastechly (w/ @mattdmarq), evaluated the claims over a suite of graph coloring problems. The setup allows GPT-4 to guess a valid coloring in standalone and self-critiquing modes, with an external sound verifier outside the self-critiquing loop. 2/
GPT-4 has <20% accuracy in guessing colorings, as should perhaps be expected given its reasoning (in)capabilities. What is perhaps more surprising is that the accuracy *FALLS* in the self-critiquing mode--running counter to all the claims of self-improvement! 3/
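For concreteness, here is a minimal sketch of what a *sound* external verifier for graph coloring looks like (my own illustrative code, not the paper's actual implementation): it accepts a coloring only when no edge joins two same-colored vertices, so--unlike an LLM critiquing itself--it can never wave through an invalid answer.

```python
# Sketch of a sound graph-coloring verifier, in the spirit of the external
# verifier mentioned above. Names and I/O conventions are mine. "Sound" here
# means it accepts a coloring only if no edge connects same-colored vertices.

def verify_coloring(edges, coloring, num_colors):
    """Return (ok, violated_edges) for a proposed vertex coloring."""
    violations = [(u, v) for (u, v) in edges
                  if coloring.get(u) == coloring.get(v)]
    uses_valid_colors = all(0 <= c < num_colors for c in coloring.values())
    return (uses_valid_colors and not violations), violations

# A triangle needs 3 colors, so any attempt reusing a color must be rejected.
triangle = [(0, 1), (1, 2), (0, 2)]
ok, bad = verify_coloring(triangle, {0: 0, 1: 1, 2: 0}, 3)
assert not ok and bad == [(0, 2)]   # caught the monochromatic edge
ok, bad = verify_coloring(triangle, {0: 0, 1: 1, 2: 2}, 3)
assert ok and bad == []             # valid coloring accepted
```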
Jul 19, 2023
It is hilarious that LLMs are making traditional symbolic #AI relevant, and in the process merrily exposing the ignorance of the post-Alexnet yung'uns who skipped their Intro #AI's to do MORE LAYERS, only to find themselves busy with ersatz natural science with LLMs. 🧵 1/
Without background in combinatorial search and logical inference, you are susceptible to passing off brute force search (or forest-of-jumbled-thoughts prompting) as something to be proud of, instead of seeing it for its "Rube Goldberg" silliness.. 2/

Without background in logical reasoning and planning, you are more than likely to confuse retrieval for reasoning.. 3/

Jun 8, 2023
[Paradoxes of Approximate Omniscience:] 🧵We all know, by now, that our intuitions *suck* at high dimensions.

We haven't yet come to grips with the fact that our intuitions about *approximate omniscience* suck too!

(And this explains some of our puzzlement at LLMs)
1/
Re: high-D, we have all been surprised by the core-less apples. George Dantzig famously explained the surprising efficiency of the (worst-case exponential) Simplex algorithm with a pithy "One's intuition in higher dimensional space is not worth a damn!" 2/
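The "core-less apple" has a one-line quantitative form: the fraction of a d-dimensional ball's volume lying within a thin outer shell of relative thickness eps is 1 - (1 - eps)^d, which rushes toward 1 as d grows. A quick illustrative sketch (mine, not from the thread):

```python
# The "core-less apple": almost all of a high-dimensional ball's volume sits
# in a thin outer shell. Volume scales as r^d, so the fraction in a shell of
# relative thickness eps is 1 - (1 - eps)^d.

def shell_fraction(d, eps=0.05):
    return 1 - (1 - eps) ** d

assert shell_fraction(3) < 0.15      # in 3-D, a 5% peel is a sliver of the apple...
assert shell_fraction(1000) > 0.999  # ...in 1000-D, the peel is essentially everything
```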

LLMs have laid bare our lack of intuitions about approximate omniscience as acquired from training on web-scale text. (Something certainly demonstrated by some of the responses to my recent thread 😋) 3/

May 15, 2023
Planning & LLMs: A(nother) 🧵

Making plans in the world involves (1) discovering actions (and their precondition/effect causal dependencies), and (2) sequencing an appropriate subset of available/discovered actions to achieve the agent's goals. 1/
The former requires *broad knowledge* about actions available in the world and their individual effects, while the latter requires deep drilling-down over a given set of actions to ensure that all goals are supported (causal chaining) without any undesirable interactions. 2/
LLMs have an edge on the former--they do indeed have web-scale broad knowledge! They are however *lousy* at the latter--they can't do any real combinatorial search to save their lives.. 3/
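The "causal chaining" requirement in 2/ can be made concrete with a minimal STRIPS-style plan validator (an illustrative sketch with made-up action names, not code from the thread): each action's preconditions must be supported by the initial state or by earlier effects, and the goals must hold at the end.

```python
# Minimal sketch of causal-chain checking: simulate a plan action by action,
# rejecting it the moment an action's preconditions are unsupported.
# Actions are STRIPS-style triples (preconditions, add effects, delete effects).

def validate_plan(init, goals, plan, actions):
    state = set(init)
    for name in plan:
        pre, add, delete = actions[name]
        if not pre <= state:            # unsupported precondition breaks the causal chain
            return False
        state = (state - delete) | add  # apply the action's effects
    return goals <= state               # all goals must be supported at the end

# Illustrative blocks-world-flavored domain (names are mine).
actions = {
    "unstack_A_B": ({"on_A_B", "clear_A"}, {"holding_A", "clear_B"}, {"on_A_B", "clear_A"}),
    "putdown_A":   ({"holding_A"}, {"ontable_A", "clear_A"}, {"holding_A"}),
}
init = {"on_A_B", "clear_A", "ontable_B"}
assert validate_plan(init, {"ontable_A"}, ["unstack_A_B", "putdown_A"], actions)
assert not validate_plan(init, {"ontable_A"}, ["putdown_A"], actions)  # precondition unmet
```

The point of the contrast in 3/: checking a causal chain like this is easy; *finding* a sequence that passes the check is the combinatorial search that LLMs struggle with.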

Apr 20, 2023
🧵Been reading several recent arXiv entries claiming planning capabilities of #LLMs. This area is so full of anthropomorphisms--"Chain of Thought Prompting", "Inner Monologue"--that it cries out for a cleansing read of Drew McDermott's "#AI meets Natural Stupidity" 1/
One popular line claims that while LLMs may give wrong plans, they can improve with the right prompting (which, in one case, is claimed to even induce an "inner monologue", all Westworld-host-like).

The prompts in the appendices, however, seem to suggest a Clever Hans effect in action 2/
Clever Hans, btw, was the ChatHorseGPT of its time--a horse that showed very un-horselike arithmetic prowess by stomping its foot the right number of times, as long as the questioner was in front of it and knew the answers. Crucially, no fraudulent intent was needed. 3/
en.wikipedia.org/wiki/Clever_Ha…
