Brendan Hogan
Aug 24
just pushed my first multi-turn RL environment to @PrimeIntellect

the setup: the model gets the story title + question from QuALITY (long stories, multiple-choice questions).

its only tool: agentic RAG search over the story.
this is an idea I have been toying with for a while but never got around to until now. I had a paper last year on a twist on a RAG method, which primarily experimented on this dataset.
i really like this dataset; it’s sort of harder-to-read short stories, and the questions really require (imo) a good and subtle understanding of the story.
so I liked the idea of building an agentic RAG system over this dataset - each story gets chunked up and embedded using OpenAI’s embeddings - then the agent gets to choose the query to embed and search.
the chunks are very small, so it’s a pretty difficult task. But I think learning here would require really reasoning about the question and the structure of this kind of writing.
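roughly, the search tool could look like the sketch below. this is my own sketch, not the environment's actual code: the chunk size, embedding model name, and helper names are all assumptions.

```python
# Sketch of an agentic RAG search tool over a QuALITY story:
# chunk the story, embed chunks with OpenAI embeddings, and let the
# agent pick the query string. Chunk size and model are assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk_story(text: str, size: int = 300) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def build_index(story: str):
    chunks = chunk_story(story)
    vecs = embed(chunks)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    return chunks, vecs

def search(query: str, chunks: list[str], vecs: np.ndarray, k: int = 3) -> list[str]:
    """The agent's tool: return the k chunks most similar to its query."""
    q = embed([query])[0]
    q /= np.linalg.norm(q)
    top = np.argsort(vecs @ q)[::-1][:k]  # highest cosine similarity first
    return [chunks[i] for i in top]
```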
thanks to @PrimeIntellect for building this and @willccbb for the invite! I think this is such an incredible initiative that can go in so many exciting directions. Looking forward to publishing a lot more RL environments and building agi together :) !
Original QuALITY paper: arxiv.org/abs/2112.08608

My earlier RAG paper: arxiv.org/abs/2409.15566

More from @brendanh0gan

Aug 13
introducing qqWen: our fully open-sourced project (code + weights + data + detailed technical report) for full-stack finetuning (pretrain + SFT + RL) of a series of models (1.5b, 3b, 7b, 14b & 32b) for Q, a niche financial programming language

All details below!
Links:

Technical Report: arxiv.org/abs/2508.06813

Models + Data on Hugging Face: huggingface.co/collections/mo…

Full Code: github.com/morganstanley/…
Note for Q Practitioners:

our SFT dataset/benchmark is made from leetcode problems, which might not reflect how Q is really used.

for general Q purposes, the pretrained models might be better than the fully fine-tuned ones
Jul 11
doing this now for my debate framework: gpt4.1 vs gpt4.1 advised by qwen 3B

gpt4.1 with qwen's advice debates itself in elo/tournament style to get an advantage

the advantage is used to grpo qwen into giving better advice

you can fine-tune api models with rl'd context
Again I really like this idea - for most practical agentic work I have done, you almost always just want to use a big API model - it works best and is the quickest way to a good prototype

and training a big model is often infeasible
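as a sketch of the loop as I read it (every callable here - sample_advice, run_debate, judge - is a hypothetical stand-in, not the actual framework code):

```python
# Schematic GRPO loop for training the small advisor model: sample a
# group of advice strings, have the advised API model debate the plain
# one, and turn win rates into group-relative advantages.
import numpy as np

def grpo_advantages(question, sample_advice, run_debate, judge,
                    n_samples=8, n_debates=4):
    # sample a group of advice strings from the small (trainable) model
    advices = [sample_advice(question) for _ in range(n_samples)]
    wins = np.zeros(n_samples)
    for i, advice in enumerate(advices):
        for _ in range(n_debates):
            # gpt4.1 armed with the advice debates plain gpt4.1;
            # an LLM judge decides who won the transcript
            transcript = run_debate(question, advice)
            wins[i] += (judge(transcript) == "advised")
    win_rate = wins / n_debates
    # group-relative advantage: each advice sample is scored by how much
    # it beat the group mean (the usual GRPO normalization); this signal
    # updates qwen, never the API model
    return (win_rate - win_rate.mean()) / (win_rate.std() + 1e-6)
```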
Jul 3
other idea - if you assume it’s an open-weights model, can you learn an embedding-space context/prompt that improves performance?

I use/train a simple 3-layer network: it predicts, from the last embedding of the prompt, a new embedding which is then fed into the frozen LLM
the predicted context embedding is fed into the frozen network, which (with sampling) generates reasoning chains as normal, which then get scored, and the gradient is computed in the normal way
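a minimal sketch of what I understand the wiring to be - layer sizes and the exact placement of the predicted embedding are my assumptions, not the thread's actual code:

```python
# A 3-layer MLP maps the prompt's last input embedding to one new
# "context" embedding, which is appended and fed to the frozen LLM.
import torch
import torch.nn as nn

class ContextPredictor(nn.Module):
    """3-layer MLP: last prompt embedding -> one new context embedding."""
    def __init__(self, d_model: int, hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d_model),
        )

    def forward(self, last_prompt_emb: torch.Tensor) -> torch.Tensor:
        return self.net(last_prompt_emb)

def forward_with_context(frozen_llm, input_ids, predictor):
    embeds = frozen_llm.get_input_embeddings()(input_ids)   # (B, T, D)
    ctx = predictor(embeds[:, -1, :]).unsqueeze(1)          # (B, 1, D)
    # feed [prompt; predicted context] into the frozen model, then sample
    # reasoning chains as usual; only the predictor receives gradients
    return frozen_llm(inputs_embeds=torch.cat([embeds, ctx], dim=1))
```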
May 23
introducing: picoDeepResearch

multi-turn tool use + soft rewards + self-play + GRPO

You define the arena (report prompts + judging principles)

the model generates reports, uses tools (web search), then competes in round-robin battles judged by an LLM

winner gets the gradient
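the round-robin scoring could look something like this sketch - judge_prefers is a hypothetical callable (an LLM judge that picks a winner per pair), and the repo may structure this differently:

```python
# Every report battles every other report; a report's reward is its
# win rate across all pairings, as decided by the LLM judge.
from itertools import combinations

def round_robin_rewards(reports, judge_prefers, principles):
    wins = [0] * len(reports)
    for i, j in combinations(range(len(reports)), 2):
        # judge returns 0 or 1: which of the two reports better
        # satisfies the user-defined judging principles
        winner = judge_prefers(reports[i], reports[j], principles)
        wins[i if winner == 0 else j] += 1
    n_opponents = len(reports) - 1
    return [w / n_opponents for w in wins]
```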
Code: github.com/brendanhogan/p…

all still just pytorch, no vLLM/TRL/etc

inspired by OpenAI’s Deep Research, but made “pico”: just enough to run real experiments, fine-tune real models, and build intuition

these results were using Qwen3-14B
im particularly excited about this project - the other ones felt fun but exploratory - this one feels like pulling everything together into a single framework to produce an end model that could be really useful.
May 15
new project - training a VLM to solve CAPTCHAs with rl (grpo on F1 score).

introduced a “tool” for click_screen(x, y).

dataset is from cityscapes, F1 goes from 0.11 to ~0.78. details below
I built the dataset by taking Cityscapes + segmentation masks, gridding each image, and labeling any square as positive if >10% of its pixels were cars or motorcycles.
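the labeling rule sketched out - grid size is an assumption, and the class ids are Cityscapes conventions (26 = car, 32 = motorcycle); the F1 reward would then compare the model's clicked cells against these labels:

```python
# Turn a Cityscapes segmentation mask into CAPTCHA-style grid labels:
# a cell is positive if >10% of its pixels are car or motorcycle.
import numpy as np

CAR, MOTORCYCLE = 26, 32  # Cityscapes label ids

def grid_labels(mask: np.ndarray, grid: int = 8) -> np.ndarray:
    h, w = mask.shape
    ch, cw = h // grid, w // grid
    labels = np.zeros((grid, grid), dtype=bool)
    for r in range(grid):
        for c in range(grid):
            cell = mask[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            frac = np.isin(cell, [CAR, MOTORCYCLE]).mean()
            labels[r, c] = frac > 0.10
    return labels
```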
Apr 27
i added a basic implementation of deepseek’s grm/spct paper to the debate framework - just many rounds of principles/critiques for the scoring

similar early win rate vs gpt-4o-mini. and anecdotally, the arguments read much better and feel less reward-hacky to me. gh below
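a loose sketch of the "many rounds of principles/critiques" scoring as I read it - the prompts and the llm() callable are hypothetical placeholders, not the repo's code:

```python
# GRM/SPCT-style scoring: each round generates fresh judging
# principles, critiques the transcript against them, then emits a
# numeric score; averaging rounds gives a lower-variance reward.
def grm_score(llm, transcript: str, n_rounds: int = 4) -> float:
    scores = []
    for _ in range(n_rounds):
        principles = llm(f"Write judging principles for this debate:\n{transcript}")
        critique = llm(f"Principles:\n{principles}\n\nCritique this debate:\n{transcript}")
        # assumes llm() returns just a number for this prompt
        score = llm(f"Given the critique:\n{critique}\n\nScore the debate 0-10. Reply with only a number.")
        scores.append(float(score))
    return sum(scores) / len(scores)
```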
Github: github.com/brendanhogan/D…

this code is very much a work in progress - it's pretty hard coded for the debate framework rn
also a lot left to do experimentally - including just letting the first run play out to 300+ steps to see what happens