I focus mainly on exploring situations where the LLM is used as an agent that has access to some tools and needs to answer a question using the tools it has available.
These tasks involve the LLM figuring out when it needs to use a tool, observing the result of using that tool, and then taking another action based on that observation.
IMO these are all good proxies for intelligence
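To make that loop concrete, here's a minimal sketch of an agent that decides to call a tool, observes the result, and then answers. Everything here (the `fake_llm`, the `Action:`/`Observation:` format, the calculator tool) is a hypothetical stand-in, not any library's actual implementation:

```python
# Minimal sketch of a tool-using agent loop with a stubbed LLM.
# The "LLM" is hardcoded; a real agent would call a model here.

def fake_llm(prompt: str) -> str:
    # Pretend the model first decides to use the calculator,
    # then answers once it has seen an observation.
    if "Observation" not in prompt:
        return "Action: calculator[2 + 2]"
    return "Final Answer: 4"

def calculator(expression: str) -> str:
    return str(eval(expression))  # toy tool; never eval untrusted input

TOOLS = {"calculator": calculator}

def run_agent(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        output = fake_llm(prompt)
        if output.startswith("Final Answer:"):
            return output.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[input]" and run that tool
        name, arg = output.removeprefix("Action: ").rstrip("]").split("[", 1)
        observation = TOOLS[name](arg)
        prompt += f"{output}\nObservation: {observation}\n"
    return "No answer found"

print(run_agent("What is 2 + 2?"))  # -> 4
```

The key property is the loop: the model's own output chooses the tool, and the tool's output is fed back in before the next decision.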
What did I evaluate?
1⃣ Self-Ask with Search on @OfirPress's Compositional Celebrities dataset
2⃣ Anecdotal failure modes @momusbah found
3⃣ ReAct on the HotPotQA dataset
Note that these are definitely NOT comprehensive, and if anyone is interested I'd love to collaborate on more
1⃣ First evaluation: Self-Ask (with Search) on the Compositional Celebrities Dataset
This is a dataset @OfirPress created to judge an LLM's reasoning ability on multi-hop question answering
The fact that Self-Ask with Search showed a large improvement (0.34 -> 0.41) while normal Self-Ask did not suggests to me that one of the big improvements in -003 is its ability to use and interact with external tools (rather than its chain-of-thought-like reasoning)
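For intuition, here's a rough sketch of how Self-Ask with Search composes a multi-hop answer. In the real method the LLM itself emits the "Follow up:" questions and a search API answers them; here both the decomposition and the search results are hardcoded stand-ins:

```python
# Sketch of Self-Ask with Search on a multi-hop question.
# fake_search stands in for a real search API; the follow-up
# decomposition stands in for what the LLM would generate.

def fake_search(query: str) -> str:
    facts = {
        "Who is the director of Jaws?": "Steven Spielberg",
        "Where was Steven Spielberg born?": "Cincinnati, Ohio",
    }
    return facts[query]

def self_ask_with_search(question: str) -> str:
    # A real implementation prompts the LLM to emit these;
    # they are fixed here for illustration.
    follow_ups = [
        "Who is the director of Jaws?",
        "Where was Steven Spielberg born?",
    ]
    transcript = f"Question: {question}\n"
    for fq in follow_ups:
        answer = fake_search(fq)  # the "Search" in Self-Ask with Search
        transcript += f"Follow up: {fq}\nIntermediate answer: {answer}\n"
    final = answer  # the last hop resolves the original question
    transcript += f"So the final answer is: {final}\n"
    return final

print(self_ask_with_search("Where was the director of Jaws born?"))
```

The point is that each intermediate answer is grounded by the external tool rather than recalled by the model, which is exactly where -003 seems to have improved.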
2⃣ The second method I used to evaluate was a set of anecdotal failure modes @momusbah provided
We had actually been talking the day before `text-davinci-003` came out about these, so it was great timing!
A general framework for interacting with an API in natural language
🧵See below for a more in depth explanation + examples
At a high level, the flow is:
1⃣ Format a prompt with API docs + a question
2⃣ Have an LLM generate API query to run to get an answer
3⃣ Run said API query
4⃣ Have LLM interpret API response and answer original question in natural language
Note that the LLM is doing in-context learning (via the API docs) to figure out how to call the API
For popular APIs, the LLM may(?) be able to generate the correct API call without that context... but this methodology allows it to work on smaller, newer, or private APIs
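The four steps above can be sketched end to end. The docs, the LLM, and the API below are all toy stand-ins (a real chain would hit a live model and endpoint), but the flow is the same:

```python
# Sketch of the docs -> query -> call -> interpret flow.
# fake_llm and fake_api are mocks; the step numbers match the list above.

API_DOCS = "GET /temperature?city=<name> returns the temperature in that city."

def fake_llm(prompt: str) -> str:
    if "API response" in prompt:
        return "It is 18 degrees in Paris."   # step 4: interpret response
    return "/temperature?city=Paris"          # step 2: generate the query

def fake_api(query: str) -> str:
    return '{"city": "Paris", "temp_c": 18}'  # step 3: "run" the query

def answer(question: str) -> str:
    # Step 1: format a prompt with the API docs + the question
    prompt = f"{API_DOCS}\nQuestion: {question}\nAPI query:"
    query = fake_llm(prompt)
    response = fake_api(query)
    # Step 4: ask the LLM to turn the raw response into natural language
    interpret = f"{prompt} {query}\nAPI response: {response}\nAnswer:"
    return fake_llm(interpret)

print(answer("How warm is it in Paris right now?"))
```

Because the docs are supplied in the prompt rather than assumed to be in the model's weights, swapping in a smaller or private API only requires swapping the docs string.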
LLM understanding of legal reasoning / legal language.
Given that legal documents are long and complicated, they are developing approaches for LLMs to recursively analyze them in sequential chains of LLM interactions.
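One common shape for that kind of recursive analysis is: split the document into chunks, analyze each chunk with one LLM call, then feed the partial analyses into a final combining call. A minimal sketch, with `fake_analyze` standing in for a real LLM call (this is an illustrative pattern, not their actual approach):

```python
# Sketch of analyzing a long document via a sequential chain of LLM calls:
# chunk it, analyze each chunk, then combine the partial analyses.

def fake_analyze(text: str) -> str:
    # Stand-in for an LLM call that summarizes/analyzes a piece of text.
    return f"summary({len(text)} chars)"

def chunk(document: str, size: int = 100) -> list[str]:
    return [document[i:i + size] for i in range(0, len(document), size)]

def analyze_document(document: str) -> str:
    partials = [fake_analyze(c) for c in chunk(document)]
    # Final pass: combine the per-chunk analyses into one answer
    return fake_analyze("\n".join(partials))

doc = "x" * 250  # stands in for a long legal document
print(analyze_document(doc))
```

The chain keeps each individual call small enough to fit a context window, at the cost of more calls and some information loss at chunk boundaries.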
💥 We've added a LOT of stuff to @LangChainAI recently
I've gotten asked a few times by users & contributors what LangChain helps with and what the main value props are (the sheer amount of stuff in there doesn't make that clear)
Here's my answer:
🦜🔗 LangChain is aimed at making it easy to develop applications with LLMs. There are 3 main areas it helps with (with a bonus sneak peek of a 4th). In increasing order of complexity:
🦜 LLMs and Prompts
🔗 Chains
🤖 Agents
🧠 ****** (you have to read to end to find out)
I'll go over all of these in this thread, but for more information please see: