Have been talking with people deploying foundation models/VLAs, and reading papers that try to increase real-world success. Real-world RL methods give the highest robust success rates of any approach I've come across.
It’s pretty simple: the practical bottlenecks aren’t in pre-training anymore, they’re in post-training, mostly with real-world RL.
When you try simulation, you’ll suffer from the sim2real gap. Even with domain randomisation (DR) you’ll find yourself tuning the model again with real-world data.
But tuning with real-world data alone is not enough for 100% success rates. Recorded teleoperation data doesn’t contain enough OOD rollouts, so the policy fails during deployment: it isn’t robust and doesn’t generalise enough. But you can enforce robustness with RL.
So logically we try this cheaply:
SFT on a mixture of sim and real rollouts from expert demos, then RL in simulation, but with real-world rollouts included in every batch gradient update to prevent the catastrophic forgetting you get from sim-only training (RLinf-Co, Sim-and-Real Co)
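A rough sketch of the "real-world rollouts in every batch" part, just to make it concrete. This is not the RLinf-Co or Sim-and-Real Co implementation: the function name, the `real_fraction` knob and its default are assumptions, and how the real samples enter the loss (RL term, SFT term, or both) is left out on purpose.

```python
import random

def sample_cotraining_batch(sim_rollouts, real_rollouts, batch_size,
                            real_fraction=0.25, rng=None):
    """Draw one update batch that always contains some real-world samples.

    `real_fraction` is a hypothetical hyperparameter, not a value from any paper.
    """
    rng = rng or random.Random()
    # Force at least one real sample so no gradient update is sim-only.
    n_real = min(batch_size, max(1, round(batch_size * real_fraction)))
    n_sim = batch_size - n_real
    # Sample with replacement: the real set is usually far smaller than the sim set.
    real_part = rng.choices(real_rollouts, k=n_real)
    sim_part = rng.choices(sim_rollouts, k=n_sim)
    batch = real_part + sim_part
    rng.shuffle(batch)
    return batch
```

The point of fixing the real share per batch, instead of pooling everything into one buffer, is that the much larger sim set can't drown out the real data.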
But success rates are still nowhere close to 100%, because the real world changes during deployment. For robustness, real-world RL is required. The most promising approaches currently are:
1. Residual / adapter / token-based RL while freezing most of the VLA: RL Token (from PI), PLD and iRe-VLA
2. Human-in-the-loop real-world online RL, where humans intervene in, reset or correct rollouts and the policy is updated from them: RECAP (from pi0.6), ConRFT, VLAC, DAFT, Hi-ORS
3. Digital-twin-guided real-world RL, which executes the idea of real-to-sim-to-real: explore in the digital-twin sim and run online RL training in parallel: TwinRL-VLA
4. The most promising approach seems to be offline-to-online RL with human interventions and corrections across a fleet of robots instead of one. This feels like running vectorised Isaac Sim environments, except it’s real instead of sim, and it works: LWD (Learning while Deploying)
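To make the fleet idea in point 4 concrete, here is a minimal sketch of a shared buffer: seeded with offline data, filled by several robots, with human-corrected steps sampled more often. This is not how LWD does it. `Transition`, `FleetBuffer` and `intervention_weight` are hypothetical names, and upweighting interventions is an assumption made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Transition:
    robot_id: str       # which robot produced it ("offline" for the seed data)
    obs: object
    action: object
    reward: float
    human_action: bool  # True if a human operator supplied or corrected the action

class FleetBuffer:
    """One replay buffer shared by every robot in the fleet (illustrative only)."""

    def __init__(self, offline_data, intervention_weight=3.0, seed=None):
        self.data = list(offline_data)  # offline-to-online: start from logged data
        self.intervention_weight = intervention_weight  # assumed knob
        self.rng = random.Random(seed)

    def add(self, transition):
        # Any robot appends here, much like parallel envs feeding one learner.
        self.data.append(transition)

    def sample(self, batch_size):
        # Human-corrected steps are drawn more often than autonomous ones.
        weights = [self.intervention_weight if t.human_action else 1.0
                   for t in self.data]
        return self.rng.choices(self.data, weights=weights, k=batch_size)
```

Usage would look like: each robot calls `add` after every step, and a single learner calls `sample` to build update batches and pushes new weights back to the fleet.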