phd candidate @CMU_Robotics. ms @berkeley_ai. summers @GoogleAI, @msftresearch, @aurora_inno, @nvidia, @spacex. no model is an island. also @gokul.dev on bsky.
Jul 15 • 7 tweets • 2 min read
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find that this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…
Led by @owenoertell & @zhan_wenhao, joint w/ @zstevenwu, @xkianteb, @WenSun1, @jasondeanlee.
If a project has got Wen, Owen, Wenhao, and Qwen on it, you know it's gotta be good😛.
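The thread doesn't name the heuristic, but a PPO/GRPO-style clipped surrogate is one common candidate. A minimal sketch of how such a clip enters the objective — purely illustrative, not the authors' implementation:

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO/GRPO-style clipped objective term for a single action/token.

    ratio:     pi_new(a|s) / pi_old(a|s), the importance weight
    advantage: the (possibly random) reward signal under discussion
    eps:       the clip range -- the kind of heuristic the thread refers to
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Taking the min makes the clip asymmetric: once the ratio leaves
    # [1 - eps, 1 + eps], that sample stops pushing the policy further.
    return min(unclipped, clipped)

# Large ratios are capped near (1 + eps) * advantage rather than growing freely
capped = clipped_surrogate(2.0, 1.0)
```

The point of the sketch is just that the clip treats samples asymmetrically, so even a reward signal with no information can interact with the update rule in a non-trivial way.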
Jun 19 • 9 tweets • 4 min read
It was a dream come true to teach the course I wish existed at the start of my PhD. We built up the algorithmic foundations of modern-day RL, imitation learning, and RLHF, going deeper than the usual "grab bag of tricks". All 25 lectures + 150 pages of notes are now public! 🧵
You can access all the content here:
Course Website: interactive-learning-algos.github.io
Lecture Playlist: youtube.com/playlist?list=…
Scribe Notes "Book": interactive-learning-algos.github.io/assets/pdfs/af….
Homeworks / class competition material are also public!
Mar 4 • 17 tweets • 4 min read
1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to🤿!
If you'd like to avoid the bends, the TL;DR is that RL lets us filter down our search space to only those policies that are optimal for relatively simple verifiers.
📰: arxiv.org/abs/2503.01067
Joint w/ the all-star cast of @sanjibac, @WenSun1, @zstevenwu, and Drew Bagnell. [2/n]
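A toy illustration of a generation-verification gap (the task and function names are hypothetical, not from the paper): checking an answer is far cheaper than producing one, so even a very simple verifier can filter a noisy generator down to correct outputs.

```python
import random

def propose(rng):
    # Hypothetical noisy generator: guesses the value of 17 * 23
    return rng.choice([381, 391, 401, 413])

def verify(answer):
    # Verification is easy here: just multiply and compare
    return answer == 17 * 23

def filtered_sample(n, seed=0):
    # Keep only the proposals the simple verifier accepts
    rng = random.Random(seed)
    return [a for a in (propose(rng) for _ in range(n)) if verify(a)]

samples = filtered_sample(100)
# Every surviving sample is correct, even though the generator is mostly wrong
assert all(verify(a) for a in samples)
```

This is only a caricature of the thread's claim — that RL's value in RLHF comes from restricting the search to policies that simple verifiers rate as optimal — but it shows why verification being easier than generation is the load-bearing assumption.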
Jul 8, 2023 • 10 tweets • 3 min read
I'm rarely as excited about a paper as our #ICML2023 paper: we develop an algorithm for doing inverse reinforcement learning w/o an expensive RL inner loop, providing an *exponential* speedup. Works *extremely* well in practice. Joint work w/ @sanjibac, @zstevenwu, and Drew Bagnell. [1/n]
Check out gokul.dev/filter/ for a video summary and https://t.co/VcriYEnN2u for the paper, code, and key insights. [2/n]
Dec 7, 2021 • 7 tweets • 4 min read
One of my favorite parts of grad school is learning about all the awesome work my friends are doing. I thought I'd make a thread of some of it (most of them the first paper of a PhD!) that's coming out this week at #NeurIPS2021. Apologies in advance if I forgot some:
First up: An elegant regularization technique for stabilizing Q-functions by @alexlioralexli: proceedings.neurips.cc/paper/2021/fil…. I really like the idea of Fourier features and it was neat to see them applied to RL. The NTK-based analysis taught me a bunch as well.
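For readers unfamiliar with the idea, here's a minimal generic sketch of Fourier features (not the paper's exact construction): a scalar input is mapped through sinusoids at chosen frequencies before being fed to a downstream Q-network.

```python
import math

def fourier_features(x, freqs):
    # Encode scalar x as [sin(f*x), cos(f*x)] for each frequency f
    feats = []
    for f in freqs:
        feats.append(math.sin(f * x))
        feats.append(math.cos(f * x))
    return feats

# The frequency set controls how fast a downstream function of these
# features can vary in x; restricting to low frequencies biases the
# learned Q-function toward smoother, more stable value estimates.
encoded = fourier_features(0.5, [1.0, 2.0, 4.0])
```

The regularization intuition is that choosing the frequencies constrains the function class, which is what the NTK-style analysis in the paper makes precise.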