Our community has already shipped 100+ environments to the Environment Hub
Help us accelerate: we’ll back you with compute, a stipend, and support from our research team
The RL Residency gives you:
— Compute for experiments
— A stipend
— Hands-on support from our internal research team
Who should apply?
— Grad students with research ideas
— Independent builders & hackers
— Part-time researchers exploring novel RL environments and evals
If you’ve wanted to build environments but lacked compute or support, this is for you
Some moonshot environments and evals we’d be especially excited about:
— Robust code-quality evaluations for agentic software engineering
— Evaluating usage of filesystems and memory for long-running tasks
— Adaptive coherent instruction-following for realistic multi-turn interactions
— Generative generalist reward models with process critiques
— Harness and task design for machine learning, such as:
—— Environments for NanoGPT speedrun optimizations
—— Terminal-friendly data visualization
—— Research plan generation, with recent notable papers as golden targets
Some highlights already live on the hub:
— KernelBench for GPU kernel generation
— DeepCoder coding problems with executable verification in a sandbox
— StepFun-Prover for formal theorem proving in Lean4
— BrowseComp for agentic web research
— HUD 2048 variants, GPQA, AIME 2025, ARC-AGI with tools, and more
Each one expands the frontier of what open models can learn and be evaluated on
RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down
We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI
Environments are where agents learn.
They define the world, rules, and feedback loop of state → action → reward. Everything from coding/math tasks to games and multi-turn dialogue evals can be thought of as environments. Without them, RL is just math with nothing to interact with.
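To make that loop concrete, here is a minimal sketch of a single-turn environment in Python (class and field names are illustrative, not the Hub's actual API):

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str  # the new state shown to the agent
    reward: float     # scalar feedback for the last action
    done: bool        # whether the episode has ended

class ToyMathEnv:
    """Illustrative single-turn environment: pose a question, reward the right answer."""

    def reset(self) -> str:
        self.answer = "4"
        return "What is 2 + 2?"  # initial state / prompt

    def step(self, action: str) -> StepResult:
        # The feedback loop in miniature: state -> action -> reward
        reward = 1.0 if action.strip() == self.answer else 0.0
        return StepResult(observation="", reward=reward, done=True)

env = ToyMathEnv()
prompt = env.reset()
print(env.step("4").reward)  # 1.0
```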
This is why environments are pivotal to the next wave of AI progress.
But while big labs are spending millions buying and privatizing RL environments, open-source has no comparable way to crowdsource them at scale.
We’re building the platform and infrastructure to change that.
Launching SYNTHETIC-2: our next-gen open reasoning dataset and planetary-scale synthetic data generation run.
Powered by our P2P inference stack and DeepSeek-R1-0528, it generates verified reasoning traces for the hardest RL tasks.
Contribute towards AGI via open, permissionless compute.
Planetary-Scale Inference
Our peer-to-peer decentralized inference stack moves into production, enabling hardware at every scale, from consumer GPUs to hyperscale clusters, to contribute meaningfully towards open-source AI progress.
Pipeline Parallelism
No single GPU holds the full model: each handles a stage, streaming activations forward. This lets smaller GPUs run large models like DeepSeek-R1. Hidden states pass stage to stage; the final GPU decodes a token, sends it back to the first stage, and the cycle continues.
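A toy, single-process sketch of that cycle, where each "stage" is just a matrix standing in for the model shard a real GPU would hold (illustrative only, with no KV cache or networking):

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB, N_STAGES = 16, 32, 4

# Each "GPU" holds only its own stage's weights, never the full model.
stages = [rng.normal(size=(HIDDEN, HIDDEN)) for _ in range(N_STAGES)]
embed = rng.normal(size=(VOCAB, HIDDEN))    # lives on the first stage
lm_head = rng.normal(size=(HIDDEN, VOCAB))  # lives on the last stage

token = 0
for _ in range(5):  # generate 5 tokens
    hidden = embed[token]                 # first stage embeds the incoming token
    for w in stages:                      # activations stream forward, stage to stage
        hidden = np.tanh(hidden @ w)
    token = int(np.argmax(hidden @ lm_head))  # last stage decodes the next token...
    print(token)                              # ...and sends it back to restart the cycle
```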
To train a model with reinforcement learning in a fully decentralized setting using community-contributed GPUs, we open-source several novel infrastructure components.
PRIME-RL: A fully asynchronous reinforcement learning framework designed for decentralized training. By decoupling rollout generation, model training, and weight broadcasting, it enables training across heterogeneous, unreliable networks.
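As a rough illustration of that decoupling (not PRIME-RL's actual API), rollout workers can keep generating with whatever weights they last received, while the trainer consumes finished rollouts and broadcasts new versions:

```python
import queue
import threading
import time

rollout_q: queue.Queue = queue.Queue()  # finished rollouts flow trainer-ward
weights_version = 0                     # stand-in for broadcast model weights
stop = threading.Event()

def rollout_worker() -> None:
    # Generates rollouts with possibly stale weights; never blocks on the trainer.
    while not stop.is_set():
        version = weights_version        # use whatever was last broadcast
        reward = 1.0                     # placeholder for real environment interaction
        rollout_q.put((version, reward))
        time.sleep(0.01)

def trainer() -> None:
    global weights_version
    for step in range(10):
        version, reward = rollout_q.get()  # consume rollouts asynchronously
        # ...a gradient update on (version, reward) would happen here...
        weights_version = step + 1         # "broadcast" fresh weights to workers
        print(f"step {step}: trained on rollout from weights v{version}")
    stop.set()

threading.Thread(target=rollout_worker, daemon=True).start()
trainer()
```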
The first decentralized 32B-parameter RL training run, open for anyone with compute to join — fully permissionless.
Scaling towards frontier reasoning across coding, math and science.
INTELLECT-2 brings decentralized training into the inference-time compute era:
• Fully async, decentralized reinforcement learning
• No blocking communication overhead: rollouts overlap with training and weight broadcasts
• Scalable across heterogeneous GPUs worldwide
Over the past months, we’ve built the full open-source stack to enable INTELLECT-2:
• PRIME-RL: fully async decentralized RL
• GENESYS & SYNTHETIC-1: crowdsourced tasks & verifiers for RL
• TOPLOC validation: verifiable inference with low overhead
• Protocol Testnet: global AI coordination infrastructure
Introducing SYNTHETIC-1: Collaboratively generating the largest synthetic dataset of verified reasoning traces for math, coding and science using DeepSeek-R1.
Join us to contribute compute towards state-of-the-art open reasoning models.
Today, we release:
- SYNTHETIC-1: 1.4 million high-quality tasks & verifiers
- A public synthetic data run, allowing anyone to contribute compute
- GENESYS: an open, extendable synthetic data generation framework, plus a call to crowdsource tasks & verifiers
Our open reproduction & scaling of R1 will proceed in two steps, mirroring the DeepSeek-R1 approach:
1. Generate verified reasoning data & train an SFT model on this cold-start data
2. Run globally distributed reinforcement learning with verifiable rewards
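For a concrete sense of the task-and-verifier pattern, here is a toy verifier in the spirit of GENESYS (illustrative only; the real framework defines its own task schema):

```python
def verify_math_answer(task: dict, completion: str) -> float:
    """Score 1.0 if the completion's final line matches the gold answer, else 0.0."""
    lines = completion.strip().splitlines()
    if not lines:
        return 0.0
    return 1.0 if lines[-1].strip() == task["gold_answer"] else 0.0

task = {"prompt": "Compute 17 * 23. End your answer with only the number.",
        "gold_answer": "391"}
print(verify_math_answer(task, "17 * 23 = 391\n391"))  # 1.0 -> verified trace kept
print(verify_math_answer(task, "It is 400"))           # 0.0 -> trace discarded
```

Programmatic checks like this are what make rewards verifiable at scale: only traces that pass their verifier enter the dataset.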
Today, we release TOPLOC: A Locality Sensitive Hashing Scheme for Verifiable Inference
- Detects modifications to models, prompts, or precision
- Robust across GPU types, tensor parallel configurations and attention kernels
- Up to 100× faster validation than generation
- Reduces memory overhead of proofs by 1000×
Building the foundation for decentralized, verifiable compute protocols.
The Problem: Trust in LLM Inference
In a peer-to-peer setting, ensuring honest behavior among providers requires detecting and penalizing dishonest ones. Providers can quietly alter the inference setup, for example by:
- Lowering precision
- Compressing the KV cache
- Altering model weights or prompts
TOPLOC encodes key features of the last hidden states into a compact, verifiable proof.
- Providers commit the top-k values of the last hidden states
- Verifiers use prefill to process commits, enabling much faster validation than the original generation
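A simplified sketch of that commit-and-check flow (toy vectors stand in for real hidden states; the actual TOPLOC encoding is a locality-sensitive scheme, not the raw top-k comparison shown here):

```python
import numpy as np

K, TOL = 8, 1e-2

def commit(last_hidden: np.ndarray) -> dict:
    # Provider commits the top-k (largest-magnitude) entries of the last hidden states.
    idx = np.argsort(np.abs(last_hidden))[-K:]
    return {"indices": idx, "values": last_hidden[idx]}

def verify(proof: dict, recomputed: np.ndarray) -> bool:
    # Verifier recomputes the hidden states in a single prefill pass (far cheaper
    # than autoregressive generation) and checks the committed entries.
    return bool(np.allclose(recomputed[proof["indices"]], proof["values"], atol=TOL))

rng = np.random.default_rng(0)
h = rng.normal(size=4096)        # honest provider's last hidden state
proof = commit(h)
print(verify(proof, h + 1e-4))   # True: small numerical noise is tolerated
print(verify(proof, h * 0.5))    # False: altered model/precision is detected
```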