Michael Hla (bio + cs | prev @harvard @shv)
Apr 2
I trained an LLM from scratch on pre-1900 text to see if it could come up with quantum mechanics and relativity.

While the model is too small to do meaningful reasoning, it has glimpses of intuition.

When given observations from past landmark experiments, the model can declare that “light is made up of definite quantities of energy” and even suggest that gravity and acceleration are locally equivalent.

I’m releasing the dataset + models and leaving this as an open problem for the research community.

I’ve also written up what this project taught me about intelligence in a mini essay, linked below.

🧵(1/n) A few weeks ago, Demis Hassabis proposed a straightforward experiment to test whether our current methods are sufficient to achieve AGI:

Pretrain an LLM on all text before a cutoff date and see if it can come up with general relativity.

If that model could come to the same conclusions that the great scientists of the past did, it would be strong evidence that our models can do meaningful out-of-distribution reasoning.

I decided to take on this “Einstein test for AGI”, testing the model for conceptual understanding of quantum mechanics and relativity given surprising observations from landmark experiments.
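The data setup behind this test is essentially a temporal cutoff over the pretraining corpus. A minimal sketch, assuming each document carries a publication-year field (the schema and function name here are my assumptions, not the released pipeline):

```python
# Hypothetical corpus filter for a pre-1900 pretraining cutoff.
# The `year` metadata field is an assumed schema, not the released one.
def filter_by_cutoff(documents, cutoff_year=1900):
    """Keep only documents published strictly before the cutoff year."""
    return [doc for doc in documents if doc["year"] < cutoff_year]

corpus = [
    {"year": 1687, "text": "Philosophiae Naturalis Principia Mathematica"},
    {"year": 1905, "text": "On the Electrodynamics of Moving Bodies"},
]
pre_1900 = filter_by_cutoff(corpus)  # keeps only the 1687 document
```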
Mar 7, 2025
I taught an LLM to optimize proteins. It proposed a better carbon capture enzyme.

Introducing Pro-1, an 8B-parameter reasoning model trained with GRPO against a physics-based reward function for protein stability.

It takes in a protein sequence + text description + previous experimental results, reasons over the information given in natural language, and proposes modifications to improve the stability of the given sequence.
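As a rough illustration, the inputs described above might be assembled into a single prompt like this; the field labels, helper name, and toy data are my assumptions, not Pro-1’s released format:

```python
# Hypothetical prompt assembly for a Pro-1-style model.
# Field labels and function name are illustrative assumptions.
def build_stability_prompt(sequence, description, prior_results):
    results = "\n".join(
        f"- {mutation}: dTm {delta_tm:+.1f} C"
        for mutation, delta_tm in prior_results
    )
    return (
        f"Protein description: {description}\n"
        f"Sequence: {sequence}\n"
        f"Previous experimental results:\n{results}\n"
        "Propose mutations to improve stability, with reasoning."
    )

prompt = build_stability_prompt(
    sequence="MKTAYIAKQR",  # truncated toy sequence
    description="carbonic anhydrase variant for carbon capture",
    prior_results=[("A3V", +1.2), ("K9E", -0.4)],
)
```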

🧵(1/n) Protein instability is a common problem in biology: proteins unfold in harsh environments, such as high temperatures or acidic conditions. Instability can hinder applications in drug development, biomanufacturing, and synthetic biology, where stable proteins are crucial for therapeutic efficacy, industrial enzyme performance, and engineered protein designs.

Wet-lab directed evolution works well, but it is incredibly resource-intensive and can only ever sample a vanishingly small fraction of sequence space.

For a typical 100-amino-acid protein, there are 20^100 ≈ 10^130 possible sequences, roughly 10^48 times as many as there are atoms in the observable universe (~10^82).
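The arithmetic behind that comparison is easy to check (the ~10^82 atom count is a common order-of-magnitude estimate):

```python
import math

# 20 standard amino acids at each of 100 positions
n_sequences = 20 ** 100
print(round(math.log10(n_sequences)))  # 130 -> ~10^130 possible sequences

# Common order-of-magnitude estimate for atoms in the observable universe
atoms_exponent = 82
ratio_exponent = math.log10(n_sequences) - atoms_exponent
print(round(ratio_exponent))  # 48 -> ~10^48 times more sequences than atoms
```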

Fortunately, many researchers build intuition about a variety of properties when designing their protein sequences and can instinctively optimize a sequence towards some desired property. For example, after the discovery of CRISPR, scientists struggled to get the CRISPR complex into the nuclei of mammalian cells. It wasn’t until Feng Zhang’s lab had the idea to attach a nuclear localization sequence that CRISPR could be successfully delivered into human cells.

This type of intuition is built over years of understanding scientific literature and experimentation.

But what if we could build this intuition in a language model by training it towards a physics-based reward function?
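For readers unfamiliar with GRPO, its core move is to score a group of sampled completions for the same prompt and normalize each reward against its group. A minimal sketch of that step, with made-up reward values rather than Pro-1’s actual physics-based scores:

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std,
    so each completion is judged relative to its peers."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard: all-equal rewards
    return [(r - mu) / sigma for r in rewards]

# e.g. stability rewards for four sampled mutant proposals (made up)
advantages = group_relative_advantages([0.2, 0.5, 0.1, 0.6])
# advantages sum to ~0; the best-scoring proposal gets the largest advantage
```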