ChemCrow is out today in @NatMachIntell! ChemCrow is an agent that uses chem tools and a cloud-based robotic lab for open-ended chem tasks. It’s been a journey to get to publication and I’d like to share some history about it. It started back in 2022. 1/8
I was working as a red teamer for GPT-4 and kept getting hallucinated molecules when trying to stir up trouble in chemistry. Then I tried the ReAct agent (from @ShunyuYao12) and quickly saw real molecules. This work eventually became public in the GPT-4 technical report. 2/8
The problem with LLM agents in science is that they must be judged in the lab. So I called @pschwllr – one of the best chemists I know, and the inventor of the Molecular Transformer. We teamed up and worked out a plan to improve and test the agent. 3/8
We then brought on the extremely talented @drecmb and @SamCox822 – the co-first authors who developed many of the tools, evaluation ideas, and guardrails to ensure safety, and did the majority of the difficult work. 4/8
We knew that the exciting next step was a cloud lab to automatically execute and test the molecular designs. We teamed up with @OSchilter and @CarloBalda97 – and got to experimental validation in a cloud lab, including having ChemCrow design a novel dye. 5/8
Near the end of the ChemCrow project, I joined with @SGRodriques to found @FutureHouseSF around scientific agents and automated laboratories. @SamCox822 joined shortly after, and we followed up with WikiCrow – an agent that does scientific literature research. 6/8
So what’s up with the crow? Crows can talk – like a parrot – but their intelligence lies in tool use. We're continuing the journey at FutureHouse on building scientific crows and can't wait to share more :) 7/8
Finishing 2024 with one more research result! We’ve trained small language agents to do hard scientific tasks – engineering proteins, manipulating DNA, and working with the scientific literature – in a new library called Aviary. We beat humans and frontier LLMs on these tasks!
Aviary is a gymnasium of new scientific environments. Using behavior cloning, expert iteration, and consensus sampling, we’ve trained Llama-3.1 8B agents to very high accuracy on challenging multi-step tasks. And at low cost! futurehouse.org/research-annou…
A lot of effort in this work went into framing the learning problem for agents. We settled on defining agents as stochastic compute graphs and splitting the environment and agent according to what we want to train. Here are some components of well-known agents as compute graphs:
We’ve just finished writing the missing 15,616 Wikipedia articles to get complete coverage of all 19,255 human genes. We used PaperQA2, which has higher accuracy than existing human-written Wikipedia articles, as judged by blinded biology PhD students and postdocs. 1/5
These articles drew on almost 1M research papers. With our current infra, we could rewrite all 19.3k articles every week so they are always up to date. We could rewrite every research article on Wikipedia every three weeks. 2/5 wikicrow.ai
A lot of effort in this work went into finding relevant sources and evaluating their quality, including checking for retractions, predatory publishers, and citations. As the number of scientific papers grows, evaluating source quality is a major challenge. 3/5
How can you learn to predict peptide properties without negative examples? This comes up often when analyzing screening results. We explore several approaches in this new paper from @MehradAnsari. 1/4
Peptide screening usually yields only positive examples, which makes it difficult to train a classifier. There is previous work on this problem – including one-class SVMs. We evaluate these methods and propose a modified algorithm built on "spies". 2/4
A spy is a positive example that you relabel as a negative to help identify the decision boundary. Using a basic convolutional NN, we evaluate our method across multiple tasks where we happen to have both negative and positive data. 3/4
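The spy idea can be sketched in a few lines. This is a hedged illustration only: it uses logistic regression on synthetic Gaussian features instead of the paper's convolutional NN on peptide sequences, and the 15% spy fraction and 5% score quantile are assumed choices, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for peptide features: a positive set and an
# unlabeled set that secretly contains both positives and negatives.
pos = rng.normal(loc=+2.0, size=(200, 5))
unlabeled = np.vstack([rng.normal(loc=+2.0, size=(50, 5)),    # hidden positives
                       rng.normal(loc=-2.0, size=(250, 5))])  # hidden negatives

# Step 1: relabel a fraction of positives as "spies" and mix them
# into the unlabeled set, which is treated as negative for now.
n_spies = int(0.15 * len(pos))
spy_idx = rng.choice(len(pos), n_spies, replace=False)
spies, pos_rest = pos[spy_idx], np.delete(pos, spy_idx, axis=0)

X = np.vstack([pos_rest, unlabeled, spies])
y = np.concatenate([np.ones(len(pos_rest)),
                    np.zeros(len(unlabeled) + n_spies)])
clf = LogisticRegression().fit(X, y)

# Step 2: the spies calibrate the boundary. Unlabeled points that
# score below (almost) every spy are taken as reliable negatives.
threshold = np.quantile(clf.predict_proba(spies)[:, 1], 0.05)
reliable_neg = unlabeled[clf.predict_proba(unlabeled)[:, 1] < threshold]

# Step 3: retrain an ordinary classifier on positives vs reliable negatives.
X2 = np.vstack([pos, reliable_neg])
y2 = np.concatenate([np.ones(len(pos)), np.zeros(len(reliable_neg))])
final = LogisticRegression().fit(X2, y2)
```

Because spies are known positives, any unlabeled example scoring well below them is very unlikely to be a hidden positive – that is what turns a positive-only problem into an ordinary binary one.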
How can you check whether a molecule is present in a >10B-compound dataset in 0.2 ms? With bloom filters! Check out our preprint by @4everstudent95. 1/4
Bloom filters are fast and can store ultra-large chemical libraries in RAM, at the cost of a false-positive rate of 0.005 (which you can tune!). 2/4
In the paper, we compared fingerprints vs. SMILES and found that bloom filters on SMILES are faster and perform better. In fact, SMILES matches the theoretical false-positive rate exactly. 3/4
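For readers new to bloom filters, here is a minimal sketch of SMILES membership checking. This is illustrative only, not the preprint's implementation: the bit-array size, hash count, and the trick of slicing one SHA-256 digest into several hash positions are toy choices.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter for SMILES membership checks."""

    def __init__(self, n_bits=1 << 20, n_hashes=7):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)  # the whole library fits in RAM

    def _positions(self, smiles):
        # Slice one SHA-256 digest into n_hashes 4-byte bit positions
        digest = hashlib.sha256(smiles.encode()).digest()
        for i in range(self.n_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.n_bits

    def add(self, smiles):
        for pos in self._positions(smiles):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, smiles):
        # No false negatives; false positives at a rate set by
        # n_bits and n_hashes relative to the number of items stored.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(smiles))

bf = BloomFilter()
bf.add("CC(=O)Oc1ccccc1C(=O)O")          # aspirin
print("CC(=O)Oc1ccccc1C(=O)O" in bf)     # True: stored items always hit
print("c1ccccc1" in bf)                  # almost certainly False
```

A lookup is just a handful of bit tests, which is why membership queries stay sub-millisecond even for billions of molecules; the false-positive rate is tuned by choosing the bit-array size and hash count.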
Our preprint on using GPT-4 as an agent with tools for chemistry is out! We call it ChemCrow. Working with @SamCox822, @drecmb, and @pschwllr, we developed a set of tools for synthesis/conditions, safety, commercial availability, patents, and paper-qa.
We, unsurprisingly, found that GPT-4 with tools is much better than GPT-4 alone. Here it outlines a synthesis for atorvastatin complete with steps, an ingredient list, cost, and suppliers. We implement this with @LangChainAI (great library!)
One of the biggest surprises for us was that GPT-4 has trouble evaluating the completions! Comparing the two answers, GPT-4 as an evaluator ranks ChemCrow and GPT-4 alone about the same, even though GPT-4 alone often fails. 3/5
First – "mutate": create similar molecules from a given molecule. This is interesting for modifying compounds in design or XAI – building out local chemical spaces. 2/4
Second – "add": combine two molecules. This is interesting for joining fragments in drug discovery, or just for modifying a scaffold. GPT-4 does pretty well here! 3/4