Andrew White 🐦‍⬛

@andrewwhite01

Mar 16, 2023 • 10 tweets • 4 min read • Read on X

Can GPT-4 do drug discovery? No, but it can help. Let's walk through GPT-4 proposing new drugs. This is called knowledge-based screening. We're trying to fill a list of plausible compounds that could lead to new drugs based on research papers. 1/n

This is one small step in drug discovery. There are many others! The compounds GPT-4 proposes have to be made and tested, and then they just start a path towards a new drug. Let's do a new example for psoriasis by targeting a known protein TYK2. Here is the prompt. 2/n

I made tools for GPT-4 to use - it will hallucinate when working with molecules directly. I instruct it to rely on these tools. First it does literature searches using one of these tools on the target. 3/n

Next it parses the literature review (itself constructed from gpt-3.5-turbo) and identifies drugs from it. Sometimes it doesn't know which are small molecule and which are antibodies, so it uses a tool for this. 4/n

It determines these are patented. Now it uses another tool to propose modifications of the compounds it identified. This part is simplistic - these small reaction changes are not a real escape from known patents nor what a real medchemist might do. 5/n

It does this for all compounds and then checks if the modified compounds are novel. Some are patented, some aren't. Note that "novel" here only means the compound is not present in the SureChEMBL database, which is an approximation of a real patent search. 6/n
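As a rough illustration of that novelty check, here is a minimal sketch that treats "novel" as "not in a set of known structures". The function name `split_by_novelty` and the example SMILES are placeholders, not the tool used in the thread, and the sketch assumes every string has already been canonicalized (for example with a cheminformatics toolkit such as RDKit) so that exact string comparison is meaningful.

```python
from typing import Iterable, List, Tuple


def split_by_novelty(
    candidates: Iterable[str], known: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split candidate SMILES into (novel, already_known).

    Assumption: all SMILES strings are canonicalized upstream, so two
    strings are equal exactly when they describe the same molecule.
    """
    known_set = set(known)
    novel: List[str] = []
    already_known: List[str] = []
    for smiles in candidates:
        if smiles in known_set:
            already_known.append(smiles)
        else:
            novel.append(smiles)
    return novel, already_known


# Placeholder molecules (ethanol, benzene, acetic acid), not TYK2 compounds.
database = ["CCO", "c1ccccc1"]
proposed = ["CCO", "CC(=O)O"]
novel, already_known = split_by_novelty(proposed, database)
```

A compound that passes this kind of check is only absent from the database that was queried; it says nothing about whether a patent claim would cover it.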

Finally, it determines most of the compounds it proposed are not purchasable and must be synthesized. So it proposes an email for synthesis 7/n

Compound #2 (shown in pic at the top) is very very similar to a Chinese pharma patent compound. So I believe it would be a TYK2 inhibitor. However - it would certainly not be considered novel relative to this patent. 8/n

This is a walkthrough of the same approach shown in the GPT-4 system with more details. So how much was GPT-4 doing chemistry? Not much - it's mostly used for reasoning, selecting tools, and identifying compound names. 9/n
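To make the division of labor concrete, here is a minimal sketch of a tool-selection loop. Everything in it is an assumption for illustration: the tool names, the stub return strings, and `scripted_policy`, which stands in for the language model by replaying a fixed plan. It is not the code behind this thread, and the SMILES string is a placeholder (ethanol), not a TYK2 compound.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, tool input, observation)


# Hypothetical stub tools. Real versions would call literature search,
# cheminformatics code, or a patent database.
def literature_search(query: str) -> str:
    return f"(stub) literature summary for: {query}"


def propose_modifications(smiles: str) -> str:
    return f"(stub) modified analogs of: {smiles}"


def check_novelty(smiles: str) -> str:
    return f"(stub) {smiles} not found in the patent database"


TOOLS: Dict[str, Callable[[str], str]] = {
    "literature_search": literature_search,
    "propose_modifications": propose_modifications,
    "check_novelty": check_novelty,
}


def scripted_policy(history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: choose the next (tool, input) given the history."""
    plan = [
        ("literature_search", "TYK2 inhibitors for psoriasis"),
        ("propose_modifications", "CCO"),
        ("check_novelty", "CCO"),
    ]
    if len(history) < len(plan):
        return plan[len(history)]
    return ("finish", "")


def run_agent(
    policy: Callable[[List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[Step]:
    """Ask the policy for a tool, run it, and record the observation."""
    history: List[Step] = []
    for _ in range(max_steps):
        tool_name, tool_input = policy(history)
        if tool_name == "finish":
            break
        if tool_name in tools:
            observation = tools[tool_name](tool_input)
        else:
            observation = f"unknown tool: {tool_name}"
        history.append((tool_name, tool_input, observation))
    return history


history = run_agent(scripted_policy, TOOLS)
```

In this framing the model never manipulates a molecule itself. It only decides which tool to call next and reads back the text each tool returns.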

What will the impact be on drug discovery? Unknown. It definitely opens the door to automating more things. And this example shows some hints, but this example will not dramatically change drug discovery. 10/n

More from @andrewwhite01

Andrew White 🐦‍⬛

@andrewwhite01

Dec 31, 2024

Finishing 2024 with one more research result! We’ve trained small language agents to do hard sci tasks: engineering proteins, manipulating DNA, and working with sci literature in a new library called Aviary. We beat humans and frontier LLMs on these tasks!

Aviary is a gymnasium of new scientific environments. Using behavior cloning, expert iteration, and consensus sampling we’ve trained Llama-3.1 8B agents to very high accuracy on challenging multi-step tasks. And at low cost!
futurehouse.org/research-annou…

A lot of effort in this work was framing the learning problem of agents. We settled on defining agents using stochastic compute graphs and splitting the environment and agent according to what we want to train. Here are some components of well-known agents as compute graphs:

Andrew White 🐦‍⬛

@andrewwhite01

Oct 23, 2024

We’ve just finished writing the missing 15,616 Wikipedia articles to get complete coverage of all 19,255 human genes. We used PaperQA2, which has higher accuracy than existing human-written Wikipedia articles, as judged by blinded biology PhD students and postdocs. 1/5

link: wikicrow.ai

These articles considered almost 1M research papers. At our current infra, we could rewrite all 19.3k articles every week so they are always up to date. We could rewrite all articles about research on Wikipedia every three weeks. 2/5

A lot of effort in this work is finding relevant sources and evaluating their quality. This includes checking for retractions, predatory publishers, and citations. As the number of scientific papers grows, evaluating source quality is a major challenge 3/5

Andrew White 🐦‍⬛

@andrewwhite01

May 8, 2024

ChemCrow is out today in @NatMachIntell! ChemCrow is an agent that uses chem tools and a cloud-based robotic lab for open-ended chem tasks. It’s been a journey to get to publication and I’d like to share some history about it. It started back in 2022. 1/8

I was working as a red teamer for GPT-4 and kept getting hallucinated molecules when trying to get up to trouble in chemistry. Then I tried the ReAct agent (from @ShunyuYao12) and quickly saw real molecules. This work was eventually made public in the GPT-4 technical report 2/8

The problem with LLM agents in science is that they must be judged in the lab. So I called @pschwllr, one of the best chemists I know and the inventor of molecular transformers. We teamed up and worked together on a plan to improve and test the agent. 3/8

Andrew White 🐦‍⬛

@andrewwhite01

Jun 6, 2023

How can you learn to predict peptide properties without negative examples? This happens often when trying to analyze outputs from screening results. We explore various approaches in this new paper from @MehradAnsari. 1/4

biorxiv.org/content/10.110…

Peptide screening usually gives only positive examples, which makes it difficult to train a classifier. Previous work has been done on this, including one-class SVM. We evaluate these and propose a modified algorithm built on "spies" 2/4

A spy is a positive example, which you relabel as a negative to help identify a decision boundary. Using a basic convolutional NN, we evaluate our method across multiple tasks where we happen to have negative and positive data. 3/4
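For readers unfamiliar with spies, here is a minimal sketch of the classic spy step from positive-unlabeled learning. It is not the modified algorithm from the paper. The convolutional NN is replaced by a toy one-dimensional scorer, and the function name, the data, and the "lowest spy score" threshold are all assumptions chosen for illustration.

```python
import random
from statistics import mean
from typing import List, Tuple


def spy_reliable_negatives(
    positives: List[float],
    unlabeled: List[float],
    spy_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Return (reliable negatives from the unlabeled set, score threshold)."""
    if len(positives) < 2:
        raise ValueError("need at least two positives to hold out a spy")
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    # Hold out some positives as spies, always leaving at least one behind.
    n_spies = min(len(shuffled) - 1, max(1, int(len(shuffled) * spy_fraction)))
    spies, kept_positives = shuffled[:n_spies], shuffled[n_spies:]

    # "Train" a toy scorer: spies are relabeled as negatives and mixed
    # into the unlabeled set, which is provisionally treated as negative.
    pos_center = mean(kept_positives)
    neg_center = mean(unlabeled + spies)

    def score(x: float) -> float:
        # Higher means closer to the positive center than the negative one.
        return (x - neg_center) ** 2 - (x - pos_center) ** 2

    # Spies are true positives, so anything scoring below every spy is
    # unlikely to be positive and can be treated as a reliable negative.
    threshold = min(score(s) for s in spies)
    reliable = [x for x in unlabeled if score(x) < threshold]
    return reliable, threshold


labeled_positives = [0.9, 1.0, 1.1, 0.95, 1.05]
unlabeled_pool = [1.2, 1.3, -1.0, -1.1, -0.9]  # two hidden positives, three negatives
reliable, threshold = spy_reliable_negatives(labeled_positives, unlabeled_pool)
```

The reliable negatives found this way can then be paired with the positives to train an ordinary binary classifier. In practice the threshold is often a low percentile of the spy scores rather than the minimum, so that one odd spy does not dominate.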

Andrew White 🐦‍⬛

@andrewwhite01

Apr 12, 2023

How can you check if a molecule is present in a >10B dataset in 0.2 ms? With bloom filters! Check out our preprint on bloom filters by @4everstudent95 1/4

Code: github.com/whitead/molblo…
Paper: arxiv.org/abs/2304.05386

Bloom filters are fast and can store ultra large chemical libraries in RAM, at the cost of a false positive rate of 0.005 (can tune this!) 2/4
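Here is a minimal sketch of a Bloom filter over SMILES strings, sized with the standard formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. It is not the implementation from the linked repository. The class name, the SHA-256 double-hashing scheme, and the example molecules are assumptions for illustration.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Set membership with no false negatives and a tunable false positive rate."""

    def __init__(self, capacity: int, error_rate: float = 0.005) -> None:
        # Standard sizing for `capacity` items at the target error rate.
        self.num_bits = max(
            1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


library = BloomFilter(capacity=1000, error_rate=0.005)
for smiles in ["CCO", "c1ccccc1"]:  # placeholder molecules
    library.add(smiles)
```

With these formulas an error rate of 0.005 costs about 11 bits per molecule, regardless of how long each SMILES string is. A lookup touches only `num_hashes` bits, which is why membership checks stay fast as the library grows.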

In the paper, we compared fingerprints vs SMILES and found that bloom filters on SMILES are faster and have better performance. In fact, SMILES follows theoretical performance exactly 3/4

Andrew White 🐦‍⬛

@andrewwhite01

Apr 12, 2023

Our preprint on using GPT-4 as an agent with tools for chemistry is out! We call it ChemCrow. Working with @SamCox822, @drecmb, and @pschwllr, we developed a set of tools for synthesis/cond, safety, commercial availability, patents, paper-qa

arxiv.org/abs/2304.05376 1/5

We, unsurprisingly, found that GPT-4 with tools is much better than GPT-4 alone. Here it outlines a synthesis for atorvastatin complete with steps, an ingredient list, cost, and suppliers. We implement this with @LangChainAI (great library!)

One of the biggest surprises for us was that GPT-4 has trouble evaluating the completions! Comparing the two answers, GPT-4 as an evaluator ranks ChemCrow to be about the same, even though GPT-4 alone fails often. 3/5
