So @TheEconomist tells me now that #LLMs can do planning and reasoning after all. Obviously our own dismal experience of their planning performance (c.f. the 🧵 at ) must be a clear outlier.. 🙄 Thank goodness I pay big bucks for my subscription.. 1/
Interestingly, I was just telling someone today how several of the papers on "LLMs for Task Planning by Prompting" are rife with the Clever Hans effect (c.f. en.wikipedia.org/wiki/Clever_Ha… ). I guess I will have to do a thread.. 2/
(While we should all be used to #LLM hype-expertise in the press by now, this particular case was prickly as it is my cocky son who airily pointed this article out to me at dinner with barely concealed delight.. 😡 ) 3/
On a related note, why does @TheEconomist insist on publishing their columns without listing the name of the journalist(s) who wrote them? Unless ChatGPT is producing the column, I don't see the advantage of not listing authorship..🤔

(Next will be Clever Hans Thread..) 4/
..and here is the promised Clever Hans thread ;-)

Keep Current with Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
More from @rao2z

Apr 20
🧵Been reading several recent arXiv entries claiming planning capabilities of #LLM's. This area is so full of anthropomorphisms--"Chain of Thought Prompting", "Inner Monolog"--that it cries out for a cleansing read of Drew McDermott's "#AI Meets Natural Stupidity" 1/
One popular line claims that while LLM's may give wrong plans, they can improve with the right prompting (which, in one case, is claimed to even induce "inner monolog", all Westworld host-like).

The prompts in the appendices however seem to suggest a Clever Hans effect in action 2/
Clever Hans, btw, is the ChatHorseGPT of its time--that showed very un-horselike arithmetic prowess by stomping its foot the right number of times, as long as the questioner was in front and knew the answers. Crucially, no fraudulent intent was needed. 3/
en.wikipedia.org/wiki/Clever_Ha…
Read 12 tweets
Dec 23, 2022
In bemoaning how things are getting worse everyday, we often tend to forget that the state of the world is becoming monotonically more observable. 1/
It may not be so much that there is monotonically increasing suffering in this world, but that it is monotonically more observable--we can be aware of it, if we choose to. 2/
Wars become forever stalemates because both parties have much better observability into the state of the adversary. As my son says, Normandy-like surprise attacks are much harder in this era of satellite/cell signal observability (Ukraine being a textbook case in point..) 3/
Read 5 tweets
Jul 29, 2022
The impressive deep pattern recognition abilities of #DNN's such as #LLM's are sometimes confused for reasoning abilities

I can learn to guess, with high accuracy, whether a SAT instance is satisfiable or not, but this is not the same as knowing how to solve SAT. Let me explain. 1/
Suppose you train a learner with a large number of Boolean 3-SAT instances labeled with whether or not they are satisfiable. There is no reason to doubt that a modern #DNN-based learner will manage to learn deep features corresponding to the γ ratio-- #clauses/#variables .. 2/
..and armed with γ, it can also essentially figure out the sharp-threshold phenomenon w.r.t. γ, and should be able to predict with high certainty that instances with γ < 4.3 are satisfiable and those with γ > 4.3 are unsatisfiable. 3/
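
(A minimal sketch of the distinction drawn above, not anything from the thread itself: `guess_satisfiable` answers from the γ ratio alone, while `solve_by_search` actually looks for a satisfying assignment. All function names are hypothetical, a hand-coded threshold stands in for a trained learner, and the brute-force search is only practical for a handful of variables. The 4.3 threshold is the figure quoted above and describes large random instances, so on tiny or hand-crafted instances the guess can be plain wrong.)

```python
import itertools
import random


def random_3sat(num_vars, num_clauses, rng):
    """Random 3-SAT instance: each clause has 3 distinct variables, random signs."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def gamma(clauses, num_vars):
    """The clause-to-variable ratio."""
    return len(clauses) / num_vars


def guess_satisfiable(clauses, num_vars, threshold=4.3):
    """Pattern-based guess: looks only at the ratio, never at an assignment."""
    return gamma(clauses, num_vars) < threshold


def solve_by_search(clauses, num_vars):
    """Exhaustive search: returns a satisfying assignment (dict) or None."""
    for bits in itertools.product([False, True], repeat=num_vars):
        # A literal l is true when variable |l| has the value (l > 0).
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return dict(enumerate(bits, start=1))
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    instance = random_3sat(num_vars=10, num_clauses=30, rng=rng)
    print("gamma:", gamma(instance, 10))
    print("guess:", guess_satisfiable(instance, 10))
    print("assignment found:", solve_by_search(instance, 10) is not None)
```

Note that the guesser never produces an assignment, and an instance made of all eight sign patterns over the same three variables has γ ≈ 2.67 yet is unsatisfiable, so the ratio-based guess gets it wrong while the search does not.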
Read 10 tweets
Jul 11, 2022
There seems to be an almost willful confusion about the need and role for explainability of #AI systems on #AI twitter.

Contrary to the often polarizing positions, it is neither the case that we always need explanations nor is it the case that we never need explanations. 🧵1/
We look for explanations of high level decisions of (what for us are) explicit knowledge tasks; and where contestability and collaboration are important.

We rarely look for explanations of tacit knowledge/low level control decisions. 2/
I don't need an explanation of why you see a dog in a picture; why you put your left foot 3 mm ahead of your right, or why facebook recommends me yet another page.

I do want one if I am denied a loan, or I need a better model of you so I can coordinate with you. 3/
Read 14 tweets
Jun 22, 2022
Intrigued by the profusion of "#LLM's are Zero-shot <XXX>'s" papers, we set out to see how good LLMs are at planning and reasoning about change.

tldr; off-the-shelf #GPT3 is pretty bad at these..

👉arxiv.org/abs/2206.10498

(w/ @karthikv792 @sarath_ssreedh & @_aolmo_) 1/
Our benchmark tasks (prompts) are posed in the context of common "toy domains" used in automated planning, and are small enough to not involve any huge combinatorics. In particular, they should be accessible to lay humans. 2/
If these results seem contrary to the optimism surrounding #LLM's emergent reasoning abilities (e.g. logical & ethical judgements), we think it may be because those benchmarks correspond to very shallow reasoning that can more easily be mimicked from previous patterns. 3/
Read 7 tweets