Bradford Ferguson
Oct 25 · 31 tweets · 10 min read
Tesla's self-driving AI solves an insane problem: compress 2 BILLION input tokens (7-8 cameras × 5MP × 30 seconds of video) down to just 2 output tokens: steering and acceleration. Here's how they actually made it work 🧵
The system uses a single large end-to-end neural network. Pixels and sensor data go in → steering and acceleration come out. No explicit perception modules. Raw video streams directly to actions. This approach has been powering Tesla's FSD for years now.
Why end-to-end? Because codifying human values in code is nearly impossible. When should you brake? How hard? It depends on speed, comfort, and situation; there is no "one objective value." You can't write these alignment preferences as explicit rules.
Example: puddle on a bidirectional road. Do you drive through it or enter the oncoming lane? How do you code "pain of puddle" vs "risk of oncoming traffic"? You can't. But looking at the scene end-to-end, the right choice becomes obvious.
Watch this: chickens crossing the road. The car waits patiently for the last one to cross. No collision risk, but it understands their INTENT to cross, and waiting is just the right thing to do. This is intelligence beyond obstacle detection.
Same system, different scenario: geese standing still on the road (not crossing). The car changes its mind, backs up, and goes around them. The engineer's reaction: "This is crazy." Different context → different behavior. Hard to write in explicit code.
With modular systems, the perception→planning interface is "ill-defined" and "very lossy": critical info gets lost in translation. End-to-end neural networks flow gradients from pixels directly to actions, so nothing gets lost at a hand-designed interface.
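A toy sketch of the "lossy interface" point, not Tesla's architecture: every function and number here is made up for illustration, and a finite difference stands in for backpropagation. When the perception score is collapsed to a 0/1 flag before planning, a small change to the perception weight produces no change in the action, so there is no learning signal. When the continuous score passes through, the signal survives.

```python
def perceive(pixel: float, w: float) -> float:
    """Hypothetical perception stage: a continuous obstacle score."""
    return w * pixel

def plan(signal: float) -> float:
    """Hypothetical planner: brake harder as the signal grows."""
    return 2.0 * signal

def modular_brake(pixel: float, w: float) -> float:
    # Lossy interface: the score is collapsed to a 0/1 "obstacle" flag.
    detected = 1.0 if perceive(pixel, w) > 0.5 else 0.0
    return plan(detected)

def end_to_end_brake(pixel: float, w: float) -> float:
    # The continuous score reaches the planner untouched.
    return plan(perceive(pixel, w))

def grad_wrt_w(fn, pixel: float, w: float, eps: float = 1e-4) -> float:
    """Central finite difference, a stand-in for backpropagation."""
    return (fn(pixel, w + eps) - fn(pixel, w - eps)) / (2 * eps)

pixel, w = 0.8, 0.3  # score is 0.24, well below the 0.5 threshold
modular_grad = grad_wrt_w(modular_brake, pixel, w)
e2e_grad = grad_wrt_w(end_to_end_brake, pixel, w)
print("modular gradient:", modular_grad)
print("end-to-end gradient:", round(e2e_grad, 6))
```

The modular gradient is exactly zero here because both perturbed scores land on the same side of the threshold, while the end-to-end gradient is about 1.6 (the planner gain times the pixel value).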
Tesla's core belief: "This is the right path to solving robotics as opposed to modular brittle systems." They're making a fundamental bet that end-to-end neural networks are how you build intelligent machines at scale.
But this approach has massive challenges. Challenge #1: the curse of dimensionality.

7-8 cameras × 5 megapixels × 30 seconds of video = over 2 BILLION input tokens.

The model must compress that down to just 2 output tokens: steering and acceleration.
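A back-of-the-envelope check on those numbers. The thread gives camera count, resolution, and window length but no frame rate, so the 36 fps below is an assumed value for illustration only. Under that assumption, the sketch works out how many raw pixels sit behind each of the quoted 2 billion input tokens, and the input-to-output ratio.

```python
def raw_pixels(cameras: int, megapixels: float, fps: float, seconds: float) -> float:
    """Total pixel count across all cameras over the context window."""
    return cameras * megapixels * 1e6 * fps * seconds

ASSUMED_FPS = 36     # assumption: the thread does not state a frame rate
INPUT_TOKENS = 2e9   # figure quoted in the thread
OUTPUT_TOKENS = 2    # steering and acceleration

for cameras in (7, 8):
    pixels = raw_pixels(cameras, 5, ASSUMED_FPS, 30)
    print(f"{cameras} cameras: {pixels:.2e} pixels, "
          f"~{pixels / INPUT_TOKENS:.0f} pixels per input token")

compression_ratio = INPUT_TOKENS / OUTPUT_TOKENS
print(f"input-to-output ratio: {compression_ratio:.0e} to 1")
```

Whatever the real frame rate is, the ratio of input tokens to output tokens is a billion to one, which is why the thread calls this the curse of dimensionality.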
Tesla's advantage: access to the "Niagara Falls of data" from their massive fleet. You don't want spurious correlations; you want the RIGHT correlations explaining why those 2 output tokens are correct. Data is their competitive moat.
The fleet collects 500 YEARS of driving data every day, more than they can even store. But they don't use it all. They refine it down to the essential scenarios that cover the full spectrum of driving. Quality over quantity.
How? Triggers catch rare scenarios when drivers encounter them naturally: weird intersections, animals, construction zones. "You can't stage this easily" because it requires real-world state space. The fleet gives them unique access to corner cases.
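One way to picture a trigger is as a named predicate that runs over each clip, with only the clips that fire at least one trigger being kept. This is a minimal sketch of that idea, not Tesla's pipeline: the `Clip` fields, tag names, and trigger names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    clip_id: str
    tags: set = field(default_factory=set)  # hypothetical scene labels
    driver_intervened: bool = False

# Hypothetical trigger definitions: name -> predicate over a clip.
TRIGGERS = {
    "animal_on_road": lambda c: "animal" in c.tags,
    "construction": lambda c: "construction" in c.tags,
    "intervention": lambda c: c.driver_intervened,
}

def fired_triggers(clip: Clip) -> list:
    """Names of every trigger whose predicate matches this clip."""
    return [name for name, pred in TRIGGERS.items() if pred(clip)]

def select_for_upload(clips) -> dict:
    """Keep only clips that fire at least one trigger."""
    return {c.clip_id: fired_triggers(c) for c in clips if fired_triggers(c)}

clips = [
    Clip("a", {"highway"}),
    Clip("b", {"animal", "rural"}),
    Clip("c", {"construction"}, driver_intervened=True),
]
selected = select_for_upload(clips)
print(selected)
```

The plain highway clip is dropped, and the two rare clips are kept along with the reason each was caught.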
Result: the car ahead spins out. Tesla's system predicts it will hit the barrier and bounce back, a SECOND-ORDER physics effect. It starts braking at 4 m/s² during the initial spin, before the collision. "It did not wait." Proactively safe.
Challenge #2: how do you debug an end-to-end system?

The same model can be prompted to predict 3D occupancy, objects, traffic lights, road boundaries, even plain language explanations. Just because it's end-to-end doesn't mean it's a black box.
Traditional Gaussian splatting breaks down on novel views and takes tens of minutes. Tesla's variant is "ridiculously fast," works better with limited camera views, and maintains structure. No COLMAP needed. It's used to debug "is it safely avoiding obstacles?"
The same model can produce reasoning tokens when needed, explaining decisions in plain language. It doesn't reason for every action (latency cost), but "wherever it's needed, it could reason longer" to produce the right answer.
Challenge #3: evaluation—"the most difficult of these three problems."

Open-loop performance can look amazing but not translate to real driving. Random fleet samples are mostly boring highway driving. You need balanced eval sets. "Extremely important but very tedious work."
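To make the sampling problem concrete, here is a small sketch with a made-up fleet sample that is 95% highway. A uniform random draw inherits that skew, while a per-scenario draw gives each scenario equal weight. The scenario labels, counts, and `balanced_eval_set` helper are all illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

def balanced_eval_set(clips, per_scenario, seed=0):
    """Draw up to `per_scenario` clips from each scenario bucket."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for clip_id, scenario in clips:
        buckets[scenario].append(clip_id)
    chosen = []
    for scenario, ids in sorted(buckets.items()):
        k = min(per_scenario, len(ids))
        chosen += [(cid, scenario) for cid in rng.sample(ids, k)]
    return chosen

# Made-up fleet sample: mostly highway, a few rare scenarios.
fleet = ([(f"hw{i}", "highway") for i in range(950)]
         + [(f"ped{i}", "pedestrian") for i in range(30)]
         + [(f"cz{i}", "construction") for i in range(20)])

random_counts = Counter(s for _, s in random.Random(0).sample(fleet, 60))
balanced_counts = Counter(s for _, s in balanced_eval_set(fleet, 20))
print("random:  ", dict(random_counts))
print("balanced:", dict(balanced_counts))
```

Both sets contain 60 clips, but only the balanced one says much about pedestrians or construction zones. The tedious part in practice is labeling the scenarios in the first place, which this sketch takes as given.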
The solution: a world model simulator. It generates EIGHT simultaneous 5-megapixel video streams (front, sides, rear) for over a minute, all action-conditioned, all consistent. Vehicle rims, traffic lights, everything coherent. From one neural network.
How do you train it? "You don't need optimal driving... any kind of trash driving is good enough." State-action pairs collected for free from the fleet. It needs to simulate edge cases, not perfect driving. Brilliant inversion of the problem.
With lower test-time compute, it runs in REAL-TIME. You can steer, brake, accelerate through a fully generated 8-camera world at 5MP. It responds like reality. Videos run 6+ minutes with consistent generation. "Quite powerful" for evaluation + RL training.
Take a year-old failure, replay it with your latest network, see if it's fixed. Example: was too close to a pedestrian → new model offsets way earlier. You don't need new real-world miles to verify improvements on known issues.
Or inject adversarial events: condition one vehicle to cut across your path while keeping the rest of the scene consistent. Systematically test corner cases without real-world danger. Synthetic safety testing at scale.
The world model enables closed-loop reinforcement learning: "let the car drive and verify that it doesn't collide with anything for a very long time." Train and improve the policy in simulation, deploy to reality. No real-world risk during training.
Key insight from Q&A: "The main premise of end-to-end is that gradients must flow end-to-end." You can have auxiliary outputs, different architectures, various output spaces—everything else is empirical. Gradient flow is the ONE non-negotiable rule.
Tesla does use sensor-specific tokenization for efficiency. End-to-end doesn't mean no modularity—it means gradients flow through everything. "Some level of modularity still" exists. The architecture is flexible as long as learning is end-to-end.
Another key distinction: perception can use open-loop eval (your prediction doesn't affect the scenario). But action needs closed-loop eval (your prediction affects the environment). Use the right evaluation tool for what you're measuring.
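A toy 1D lane-keeping example shows why the distinction matters. The `expert`, `policy`, and `step` functions and all constants are hypothetical. In open-loop eval the policy is scored on logged states and never moves the car, so only its small per-step error shows up. In closed-loop eval its own actions feed the next state, and the same small error settles into a much larger lane offset.

```python
def expert(offset: float) -> float:
    """Hypothetical logged driver: steers back toward lane center."""
    return -0.5 * offset

def policy(offset: float) -> float:
    """Hypothetical learned policy: weaker correction plus a small bias."""
    return -0.1 * offset + 0.02

def step(offset: float, action: float) -> float:
    """Toy dynamics: the action shifts the lateral offset directly."""
    return offset + action

def open_loop_error(logged_offsets) -> float:
    """Mean action error on logged states; the policy never moves the car."""
    errs = [abs(policy(x) - expert(x)) for x in logged_offsets]
    return sum(errs) / len(errs)

def closed_loop_offset(steps: int, start: float = 0.0) -> float:
    """Final lane offset when the policy's own actions drive the state."""
    x = start
    for _ in range(steps):
        x = step(x, policy(x))
    return x

# Logged trajectory: the expert starts centered and stays centered.
logged = [0.0]
for _ in range(99):
    logged.append(step(logged[-1], expert(logged[-1])))

open_err = open_loop_error(logged)
closed = closed_loop_offset(100)
print(f"open-loop mean action error: {open_err:.3f}")
print(f"closed-loop final offset:    {closed:.3f}")
```

The open-loop error is 0.02 per step, while the closed-loop rollout settles near an offset of 0.2, roughly ten times larger. The open-loop score alone would not reveal that.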
Where this is all heading: Tesla launched their robotaxi service in Austin and the Bay Area (June-July). In Austin below 40 mph, cars operate with no one in the driver's seat. Not a demo, but a service you can hail. Cameras, neural networks, real-world deployment.
Next: Cybercab. Purpose-built robotaxi, 2 seats. It will have "the lowest cost of transportation across even public transportation." All powered by these same neural networks. The approach scales across vehicle platforms, locations, and weather.
And it's not just cars. "The same technology we developed for self-driving transfers most seamlessly to other forms of robots too." The world model works for Optimus humanoid robots. Same neural network, just add Optimus data, and it generalizes across form factors.
Tesla is all in on robotics. "The entire company is focused on producing intelligent useful large scale robots for helping everyone in the world."

This isn't just about self-driving cars. It's about solving robotics, end-to-end. 🤖
I created this thread because most people will read something short but not watch a 28-minute video. Here's the source info:

Video: Tesla ICCV 2025 Foundational Model for FSD - Ashok Elluswamy
Date: October 22, 2025
