ChemCrow is out today in @NatMachIntell! ChemCrow is an agent that uses chem tools and a cloud-based robotic lab for open-ended chem tasks. It’s been a journey to get to publication and I’d like to share some history about it. It started back in 2022. 1/8
I was working as a red teamer for GPT-4 and kept getting hallucinated molecules when trying to stir up trouble in chemistry. Then I tried the ReAct agent (from @ShunyuYao12) and quickly saw real molecules. This work eventually became public in the GPT-4 technical report. 2/8
The problem with LLM agents in science is that they must be judged in the lab. So I called @pschwllr – one of the best chemists I know, and the inventor of the Molecular Transformer. We teamed up and worked out a plan to improve and test the agent. 3/8
We then brought on the extremely talented @drecmb and @SamCox822 – the co-first authors who developed many of the tools, evaluation ideas, and guardrails to ensure safety, and did the majority of the difficult work. 4/8
We knew that the exciting next step was a cloud lab to automatically execute and test the molecular designs. We teamed up with @OSchilter and @CarloBalda97 – and got to experimental validation in a cloud lab, including having ChemCrow design a novel dye. 5/8
Near the end of the ChemCrow project, I joined with @SGRodriques to found @FutureHouseSF around scientific agents and automated laboratories. @SamCox822 joined shortly after, and we followed up with WikiCrow – an agent that does scientific literature research. 6/8
So what’s up with the crow? Crows can talk – like a parrot – but their intelligence lies in tool use. We're continuing the journey at FutureHouse on building scientific crows and can't wait to share more :) 7/8
Finishing 2024 with one more research result! We’ve trained small language agents to do hard scientific tasks – engineering proteins, manipulating DNA, and working with the scientific literature – in a new library called Aviary. We beat humans and frontier LLMs on these tasks!
Aviary is a gymnasium of new scientific environments. Using behavior cloning, expert iteration, and consensus sampling, we’ve trained Llama-3.1 8B agents to very high accuracy on challenging multi-step tasks. And at low cost! futurehouse.org/research-annou…
A lot of effort in this work went into framing the learning problem for agents. We settled on defining agents as stochastic compute graphs and splitting the environment and agent according to what we want to train. Here are some components of well-known agents as compute graphs:
We’ve just finished writing the missing 15,616 Wikipedia articles to get complete coverage of all 19,255 human genes. We used PaperQA2, which has higher accuracy than existing human-written Wikipedia articles, as judged by blinded biology PhD students and postdocs. 1/5
These articles drew on almost 1M research papers. With our current infra, we could rewrite all 19.3k articles every week so they are always up to date. We could rewrite every research article on Wikipedia every three weeks. 2/5 wikicrow.ai
A lot of effort in this work went into finding relevant sources and evaluating their quality, including checking for retractions, predatory publishers, and citations. As the number of scientific papers grows, evaluating source quality is a major challenge. 3/5
How can you learn to predict peptide properties without negative examples? This comes up often when analyzing screening results. We explore several approaches in this new paper from @MehradAnsari. 1/4
Peptide screening usually yields only positive examples, which makes it difficult to train a classifier. There is previous work on this problem – including one-class SVMs. We evaluate these methods and propose a modified algorithm built on "spies". 2/4
A spy is a positive example that you relabel as a negative to help identify the decision boundary. Using a basic convolutional NN, we evaluate our method across multiple tasks where we happen to have both negative and positive data. 3/4
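The spy idea can be sketched in a few lines. This is a hedged illustration only: it uses logistic regression on synthetic Gaussian features instead of the paper's convolutional NN on peptide sequences, and the 15% spy fraction and 5% score quantile are assumed choices, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for peptide features: a positive set and an
# unlabeled set that secretly contains both positives and negatives.
pos = rng.normal(loc=+2.0, size=(200, 5))
unlabeled = np.vstack([rng.normal(loc=+2.0, size=(50, 5)),    # hidden positives
                       rng.normal(loc=-2.0, size=(250, 5))])  # hidden negatives

# Step 1: relabel a fraction of positives as "spies" and mix them
# into the unlabeled set, which is treated as negative for now.
n_spies = int(0.15 * len(pos))
spy_idx = rng.choice(len(pos), n_spies, replace=False)
spies, pos_rest = pos[spy_idx], np.delete(pos, spy_idx, axis=0)

X = np.vstack([pos_rest, unlabeled, spies])
y = np.concatenate([np.ones(len(pos_rest)),
                    np.zeros(len(unlabeled) + n_spies)])
clf = LogisticRegression().fit(X, y)

# Step 2: the spies calibrate the boundary. Unlabeled points that
# score below (almost) every spy are taken as reliable negatives.
threshold = np.quantile(clf.predict_proba(spies)[:, 1], 0.05)
reliable_neg = unlabeled[clf.predict_proba(unlabeled)[:, 1] < threshold]

# Step 3: retrain an ordinary classifier on positives vs reliable negatives.
X2 = np.vstack([pos, reliable_neg])
y2 = np.concatenate([np.ones(len(pos)), np.zeros(len(reliable_neg))])
final = LogisticRegression().fit(X2, y2)
```

Because spies are known positives, any unlabeled example scoring well below them is very unlikely to be a hidden positive – that is what turns a positive-only problem into an ordinary binary one.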
How can you check whether a molecule is present in a >10B-compound dataset in 0.2 ms? With bloom filters! Check out our preprint by @4everstudent95. 1/4
Bloom filters are fast and can store ultra-large chemical libraries in RAM, at the cost of a false-positive rate of 0.005 (which you can tune!). 2/4
In the paper, we compared fingerprints vs. SMILES and found that bloom filters on SMILES are faster and perform better. In fact, SMILES matches the theoretical false-positive rate exactly. 3/4
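For readers new to bloom filters, here is a minimal sketch of SMILES membership checking. This is illustrative only, not the preprint's implementation: the bit-array size, hash count, and the trick of slicing one SHA-256 digest into several hash positions are toy choices.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter for SMILES membership checks."""

    def __init__(self, n_bits=1 << 20, n_hashes=7):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)  # the whole library fits in RAM

    def _positions(self, smiles):
        # Slice one SHA-256 digest into n_hashes 4-byte bit positions
        digest = hashlib.sha256(smiles.encode()).digest()
        for i in range(self.n_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.n_bits

    def add(self, smiles):
        for pos in self._positions(smiles):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, smiles):
        # No false negatives; false positives at a rate set by
        # n_bits and n_hashes relative to the number of items stored.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(smiles))

bf = BloomFilter()
bf.add("CC(=O)Oc1ccccc1C(=O)O")          # aspirin
print("CC(=O)Oc1ccccc1C(=O)O" in bf)     # True: stored items always hit
print("c1ccccc1" in bf)                  # almost certainly False
```

A lookup is just a handful of bit tests, which is why membership queries stay sub-millisecond even for billions of molecules; the false-positive rate is tuned by choosing the bit-array size and hash count.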
Our preprint on using GPT-4 as an agent with tools for chemistry is out! We call it ChemCrow. Working with @SamCox822, @drecmb, and @pschwllr, we developed a set of tools for synthesis/conditions, safety, commercial availability, patents, and paper-qa.
We, unsurprisingly, found that GPT-4 with tools is much better than GPT-4 alone. Here it outlines a synthesis for atorvastatin complete with steps, an ingredient list, cost, and suppliers. We implement this with @LangChainAI (great library!)
One of the biggest surprises for us was that GPT-4 has trouble evaluating the completions! Comparing the two answers, GPT-4 as an evaluator ranks ChemCrow and GPT-4 alone about the same, even though GPT-4 alone often fails. 3/5
First – "mutate": create similar molecules from a given molecule. This is interesting for modifying compounds in design or XAI – building out local chemical spaces. 2/4
Second – "add": combine two molecules. This is interesting for joining fragments in drug discovery, or just for modifying a scaffold. GPT-4 does pretty well here! 3/4