Bindu Reddy
Sep 10, 2023
LLMs and Their Human-like Reasoning Capabilities - Fact or Fiction?

Research into enabling Large Language Models (LLMs) to display human-like reasoning has been a hot topic. However, others have argued that LLMs aren't capable of reasoning at all, given that they are glorified next-word predictors.

Here is where we are today -

Chain of thought (CoT) reasoning - Prompting a "chain of thought"—a series of intermediate reasoning steps—can significantly improve the performance of LLMs in complex reasoning tasks.

The human brain typically decomposes a math problem into intermediate steps and solves each step before giving the final answer: “After Jane gives 2 flowers to her mom she has 10 . . . then after she gives 3 to her dad she will have 7 . . . so the answer is 7.”

The basic idea behind CoT is to prompt the model with some examples of these types of problems and the step-by-step reasoning involved in solving them. By prompting in this way, the model can decompose the problem much like the human brain does and solve it step by step.
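As an illustration, a few-shot CoT prompt can be assembled by prepending worked exemplars to the new question. This is a minimal sketch; the exemplar text and helper names are illustrative, not taken from any particular paper.

```python
# One worked exemplar showing intermediate reasoning steps before the answer.
COT_EXEMPLAR = (
    "Q: Jane has 12 flowers. She gives 2 to her mom and 3 to her dad. "
    "How many are left?\n"
    "A: After Jane gives 2 flowers to her mom she has 12 - 2 = 10. "
    "After she gives 3 to her dad she has 10 - 3 = 7. The answer is 7.\n"
)

def build_cot_prompt(question: str, exemplars: list[str]) -> str:
    """Prepend worked step-by-step exemplars to the new question."""
    return "".join(exemplars) + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "Tom has 5 apples and buys 8 more. He eats 3. How many are left?",
    [COT_EXEMPLAR],
)
print(prompt)
```

The model completes the text after the final "A:", and the exemplar nudges it to emit intermediate steps before the answer.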

LLMs prompted with just eight chain-of-thought examples have achieved state-of-the-art accuracy, surpassing even fine-tuned versions of GPT-3.

CoT prompting improves the arithmetic, commonsense, symbolic, and multi-step reasoning capabilities of an LLM.

That said, CoT prompting is fairly limited: it depends heavily on the quality of the prompt, the model has no memory across prompts, and the prompt size is bounded by the model's context length.

Recently, researchers at Princeton and Google DeepMind released a paper outlining the Tree of Thoughts (ToT) framework. It addresses a shortcoming of existing approaches, which do not explore different continuations within a thought process or incorporate any planning, lookahead, or backtracking to evaluate different options.

ToT frames any problem as a search over a tree, where each node represents a partial solution with the input and the sequence of thoughts so far. It allows LMs to explore multiple reasoning paths over thoughts, each thought being a coherent language sequence that serves as an intermediate step toward problem solving.

The ToT process involves four key steps:

- Decomposing the intermediate process into thought steps.
- Generating potential thoughts from each state.
- Heuristically evaluating states.
- Deciding what search algorithm to use.
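The four steps above can be sketched as a generic beam search over partial solutions. This is a toy illustration, not the paper's implementation: the proposer, the scorer, and the task below are placeholder assumptions.

```python
def tot_bfs(root, propose, score, beam_width=3, depth=3):
    """Breadth-first Tree-of-Thoughts sketch.

    propose(state) -> candidate next thoughts        (step 2)
    score(state)   -> heuristic value of a state     (step 3)
    Beam search over a fixed number of thought steps (steps 1 and 4).
    """
    frontier = [root]
    for _ in range(depth):
        # Expand every kept state with every proposed thought.
        candidates = [s + [t] for s in frontier for t in propose(s)]
        # Keep only the best partial solutions (the beam).
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)

# Toy task: pick three digits whose sum is as close to 24 as possible.
best = tot_bfs(
    root=[],
    propose=lambda s: range(1, 10),
    score=lambda s: -abs(24 - sum(s)),
    beam_width=3,
    depth=3,
)
print(best, sum(best))
```

Swapping the beam for depth-first search with backtracking, or replacing the hand-written scorer with the LM's own self-evaluation of a state, recovers the variants discussed in the paper.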

The ToT framework allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action. It also enables looking ahead or backtracking when necessary to make global choices.

The framework is versatile and can handle challenging tasks. It also improves the interpretability of model decisions and the opportunity for human alignment, as the resulting representations are readable, high-level language reasoning instead of implicit, low-level token values.

In practice, the ToT framework has significantly enhanced language models' problem-solving abilities on tasks requiring non-trivial planning or search.

In the Game of 24, while GPT-4 with chain-of-thought prompting solved only 4% of tasks, the ToT method achieved a success rate of 74%.
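For context, the Game of 24 asks the player to combine four numbers with +, -, *, and / to reach 24. A plain exhaustive search over the same state space (each node is the multiset of values still available) looks like the sketch below; this is a brute-force baseline for illustration, not the ToT method itself.

```python
from itertools import permutations

def solve24(nums, target=24, eps=1e-6):
    """Exhaustive tree search; nums is a list of (value, expression) pairs.
    Returns an expression string reaching the target, or None."""
    if len(nums) == 1:
        return nums[0][1] if abs(nums[0][0] - target) < eps else None
    # Each branch combines two remaining numbers with one operation.
    for (a, ea), (b, eb) in permutations(nums, 2):
        rest = list(nums)
        rest.remove((a, ea))
        rest.remove((b, eb))
        ops = [(a + b, f"({ea}+{eb})"), (a - b, f"({ea}-{eb})"),
               (a * b, f"({ea}*{eb})")]
        if abs(b) > eps:
            ops.append((a / b, f"({ea}/{eb})"))
        for val, expr in ops:
            found = solve24(rest + [(val, expr)], target, eps)
            if found:
                return found
    return None

solution = solve24([(4, "4"), (9, "9"), (10, "10"), (13, "13")])
print(solution)
```

ToT explores the same tree, but uses the LM to propose promising combinations and to heuristically prune states, rather than enumerating every branch.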

ToT is also a prompting framework and shares the same fundamental limitations as CoT. It requires careful thought decomposition and generation, and it relies on heuristics for evaluating states and choosing the search algorithm.

However, it's important to remember that while LLMs can mimic certain types of reasoning to an extent, this is not the same as human reasoning.

They don't have the ability to understand, self-correct, or make judgments based on a deep understanding of the world. They're more like really good actors who can deliver their lines convincingly but don't actually understand the plot of the play.

In summary, AI is still extremely early when it comes to reasoning, and we may need another significant breakthrough before LLMs can really measure up to humans.