Tesla's self-driving AI solves an insane problem: compress 2 BILLION input tokens (7-8 cameras × 5MP × 30 seconds) down to just 2 output tokens—steering and acceleration. Here's how they actually made it work 🧵
The system uses a single large end-to-end neural network. Pixels and sensor data go in → steering and acceleration come out. No explicit perception modules. Raw video streams directly to actions. This approach has been powering Tesla's FSD for years now.
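To make the shape of that concrete, here's a toy PyTorch sketch of the pixels-in, controls-out idea. This is not Tesla's architecture or scale, just the structural pattern the thread describes:

```python
# Minimal sketch of the end-to-end idea (not Tesla's code):
# raw camera frames in, two control values out, one differentiable module.
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Toy vision backbone standing in for a much larger video encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head maps the compressed scene representation to 2 control outputs.
        self.head = nn.Linear(64, 2)  # [steering, acceleration]

    def forward(self, frames):          # frames: (batch, 3, H, W)
        return self.head(self.encoder(frames))

policy = EndToEndPolicy()
controls = policy(torch.randn(1, 3, 256, 256))  # -> tensor of shape (1, 2)
```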
Why end-to-end? Because codifying human values in code is nearly impossible. When should you brake? How hard? It depends on speed, comfort, situation—no "one objective value." You can't write these alignment preferences in explicit rules.
Example: puddle on a bidirectional road. Do you drive through it or enter the oncoming lane? How do you code "pain of puddle" vs "risk of oncoming traffic"? You can't. But looking at the scene end-to-end, the right choice becomes obvious.
Watch this: chickens crossing the road. The car waits patiently for the last one to cross. No collision risk, but it understands their INTENT to cross—it's just the right thing to do. This is intelligence beyond obstacle detection.
Same system, different scenario: geese standing still on the road (not crossing). The car changes its mind, backs up, and goes around them. The engineer's reaction: "This is crazy." Different context → different behavior. Hard to write in explicit code.
With modular systems, the perception→planning interface is "ill-defined" and "very lossy"—critical info gets lost in translation. End-to-end neural networks flow gradients from pixels directly to actions. Nothing gets lost.
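Here's what "gradients flow from pixels to actions" means mechanically, in a toy example with made-up shapes: a single loss on the final controls updates even the earliest perception weights, so no hand-designed interface sits in the way.

```python
# Sketch: with one loss on the final action, gradients reach every layer,
# including the earliest "perception" weights. Toy model, invented shapes.
import torch
import torch.nn as nn

perception = nn.Conv2d(3, 8, 3)           # stands in for the vision stack
control = nn.Linear(8, 2)                  # stands in for the planner/head
frames = torch.randn(4, 3, 64, 64)
target = torch.randn(4, 2)                 # logged human steering/accel

features = perception(frames).mean(dim=(2, 3))        # (4, 8) pooled features
loss = nn.functional.mse_loss(control(features), target)
loss.backward()

# The perception weights receive gradient signal from the action loss alone:
print(perception.weight.grad.abs().sum() > 0)          # True
```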
Tesla's core belief: "This is the right path to solving robotics as opposed to modular brittle systems." They're making a fundamental bet that end-to-end neural networks are how you build intelligent machines at scale.
But this approach has massive challenges. Challenge #1: the curse of dimensionality.
The input is roughly 2 billion tokens of raw multi-camera video; the model must compress all of that down to just 2 output tokens: steering and acceleration.
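Back-of-envelope on that compression, taking the thread's round numbers at face value (the exact tokenization isn't public):

```python
# ~2 billion input tokens (7-8 cameras x 5 MP x 30 s of video)
# distilled into 2 control values.
input_tokens = 2_000_000_000   # figure quoted in the thread
output_tokens = 2              # steering + acceleration

print(f"~{input_tokens // output_tokens:,}-to-1 compression")  # ~1,000,000,000-to-1
```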
Tesla's advantage: access to the "Niagara Falls of data" from their massive fleet. You don't want spurious correlations—you want the RIGHT correlations explaining why those 2 output tokens are correct. Data is their competitive moat.
The fleet collects 500 YEARS of driving data every day—more than they can even store. But they don't use it all. They refine it down to the essential scenarios that cover the full spectrum of driving. Quality over quantity.
How? Triggers catch rare scenarios when drivers encounter them naturally—weird intersections, animals, construction zones. "You can't stage this easily" because it requires real-world state space. The fleet gives them unique access to corner cases.
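A hypothetical sketch of what a fleet trigger could look like: a cheap on-vehicle check that flags a clip for upload when something rare happens. Every field and name here is invented for illustration, not Tesla's telemetry schema.

```python
# Hypothetical "trigger": flag a clip for upload if it contains a rare event.
from dataclasses import dataclass

@dataclass
class FrameSummary:
    hard_brake: bool          # deceleration above a threshold
    rare_object: bool         # e.g. an animal or debris class fired
    driver_takeover: bool     # human overrode the system
    map_mismatch: bool        # scene disagrees with expectations

def should_upload(clip: list[FrameSummary]) -> bool:
    """Flag clips containing any rare event worth adding to the dataset."""
    return any(
        f.hard_brake or f.rare_object or f.driver_takeover or f.map_mismatch
        for f in clip
    )

# A clip with a single rare-object frame gets flagged:
clip = [FrameSummary(False, False, False, False),
        FrameSummary(False, True, False, False)]
print(should_upload(clip))  # True
```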
Result: the car ahead spins out. Tesla's system predicts it will hit the barrier and bounce back—a SECOND-ORDER physics effect. It starts braking at 4 m/s² during the initial spin, before the collision. "It did not wait." Proactively safe.
Challenge #2: how do you debug an end-to-end system?
The same model can be prompted to predict 3D occupancy, objects, traffic lights, road boundaries, even plain language explanations. Just because it's end-to-end doesn't mean it's a black box.
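A rough sketch of the "one model, many promptable outputs" idea: the same shared features get decoded into different outputs depending on the query. Toy code with invented sizes, not Tesla's implementation.

```python
# One trunk, many promptable heads: control is always produced,
# the debugging outputs only on demand.
import torch
import torch.nn as nn

class PromptableDecoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.trunk = nn.Linear(128, dim)          # stands in for the big video model
        self.heads = nn.ModuleDict({
            "occupancy": nn.Linear(dim, 16),      # toy 3D-occupancy logits
            "objects": nn.Linear(dim, 8),         # toy box/attribute outputs
            "language": nn.Linear(dim, 32),       # toy token logits for explanations
            "control": nn.Linear(dim, 2),         # steering + acceleration
        })

    def forward(self, features, query: str):
        return self.heads[query](torch.relu(self.trunk(features)))

model = PromptableDecoder()
feats = torch.randn(1, 128)
control = model(feats, "control")      # always needed for driving
occupancy = model(feats, "occupancy")  # queried on demand for debugging
```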
Traditional Gaussian splatting breaks down on novel views and takes tens of minutes. Tesla's variant is "ridiculously fast," works better with limited camera views, and maintains structure. No COLMAP needed. It's used to debug "is it safely avoiding obstacles?"
The same model can produce reasoning tokens when needed—explaining decisions in plain language. It doesn't reason for every action (latency cost), but "wherever it's needed, it could reason longer" to produce the right answer.
Challenge #3: evaluation—"the most difficult of these three problems."
Open-loop performance can look amazing but not translate to real driving. Random fleet samples = boring highway driving. You need balanced eval sets. "Extremely important but very tedious work."
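A small sketch of what building a balanced eval set can look like under the hood: sample per scenario bucket instead of uniformly over the fleet. Bucket names and counts are made up.

```python
# Stratified sampling so rare scenarios aren't drowned out by highway miles.
import random

def balanced_eval_set(clips, per_bucket=100, seed=0):
    """clips: list of (scenario_tag, clip_id). Returns an evenly mixed subset."""
    rng = random.Random(seed)
    buckets = {}
    for tag, clip_id in clips:
        buckets.setdefault(tag, []).append(clip_id)
    chosen = []
    for tag, ids in buckets.items():
        rng.shuffle(ids)
        chosen.extend((tag, i) for i in ids[:per_bucket])
    return chosen

clips = [("highway_cruise", i) for i in range(10_000)] + \
        [("unprotected_left", i) for i in range(40)] + \
        [("animal_on_road", i) for i in range(12)]
subset = balanced_eval_set(clips, per_bucket=50)
```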
The solution: a world model simulator. It generates EIGHT simultaneous 5-megapixel video streams (front, sides, rear) for over a minute—all action-conditioned, all consistent. Vehicle rims, traffic lights, everything coherent. From one neural network.
How do you train it? "You don't need optimal driving... any kind of trash driving is good enough." State-action pairs collected for free from the fleet. It needs to simulate edge cases, not perfect driving. Brilliant inversion of the problem.
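A toy sketch of why sub-par driving still trains a world model: the objective is "predict the next frame given whatever action was taken," so any logged (state, action, next state) triple is a training example. Shapes and sizes here are invented.

```python
# Next-state prediction conditioned on the action: the data doesn't need
# to show good driving, only what happened after each action.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, frame_dim=256, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, frame_dim),
        )

    def forward(self, frame_latent, action):
        return self.net(torch.cat([frame_latent, action], dim=-1))

world = TinyWorldModel()
opt = torch.optim.Adam(world.parameters(), lr=1e-3)

# (state, action, next_state) triples come "for free" from fleet logs,
# regardless of whether the human drove well.
state, action, next_state = torch.randn(8, 256), torch.randn(8, 2), torch.randn(8, 256)
loss = nn.functional.mse_loss(world(state, action), next_state)
opt.zero_grad(); loss.backward(); opt.step()
```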
With lower test-time compute, it runs in REAL-TIME. You can steer, brake, accelerate through a fully generated 8-camera world at 5MP. It responds like reality. Videos run 6+ minutes with consistent generation. "Quite powerful" for evaluation + RL training.
Take a year-old failure, replay it with your latest network, see if it's fixed. Example: was too close to a pedestrian → new model offsets way earlier. You don't need new real-world miles to verify improvements on known issues.
Or inject adversarial events: condition one vehicle to cut across your path while keeping the rest of the scene consistent. Systematically test corner cases without real-world danger. Synthetic safety testing at scale.
The world model enables closed-loop reinforcement learning: "let the car drive and verify that it doesn't collide with anything for a very long time." Train and improve the policy in simulation, deploy to reality. No real-world risk during training.
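A minimal sketch of that closed loop: the policy drives, the learned simulator generates the consequences, and the score is simply how long it survives without contact. Every piece here is a placeholder, and the actual policy-update step is omitted.

```python
# Closed-loop rollout inside a learned simulator with a "don't collide" reward.
import torch

def closed_loop_return(policy, world_model, collided, start_state, horizon=600):
    """Roll the policy out in the learned simulator; reward collision-free steps."""
    state, total_reward = start_state, 0.0
    with torch.no_grad():                       # evaluation-style rollout
        for _ in range(horizon):
            action = policy(state)              # steering + acceleration
            state = world_model(state, action)  # the world model reacts to the action
            if collided(state):                 # any contact ends the episode
                return total_reward - 100.0
            total_reward += 1.0
    return total_reward

# Toy stand-ins so the loop runs end to end:
policy = torch.nn.Linear(256, 2)
world_model = lambda s, a: s + 0.05 * a.mean() + 0.01 * torch.randn_like(s)
collided = lambda s: bool(s.abs().max() > 10)
print(closed_loop_return(policy, world_model, collided, torch.randn(256)))
```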
Key insight from Q&A: "The main premise of end-to-end is that gradients must flow end-to-end." You can have auxiliary outputs, different architectures, various output spaces—everything else is empirical. Gradient flow is the ONE non-negotiable rule.
Tesla does use sensor-specific tokenization for efficiency. End-to-end doesn't mean no modularity—it means gradients flow through everything. "Some level of modularity still" exists. The architecture is flexible as long as learning is end-to-end.
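A sketch of "modular architecture, end-to-end learning": separate tokenizers per sensor, one shared trunk, one loss. Module names and sizes are invented for illustration.

```python
# Each sensor gets its own tokenizer, but everything feeds one trunk and
# trains under one loss, so gradients still flow through every module.
import torch
import torch.nn as nn

class MultiSensorPolicy(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.camera_tok = nn.Linear(300, d)     # stands in for a video tokenizer
        self.kinematics_tok = nn.Linear(6, d)   # speed, accel, steering angle, ...
        self.nav_tok = nn.Linear(10, d)         # route / navigation features
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.control = nn.Linear(d, 2)

    def forward(self, cam, kin, nav):
        tokens = torch.stack(
            [self.camera_tok(cam), self.kinematics_tok(kin), self.nav_tok(nav)], dim=1
        )
        return self.control(self.trunk(tokens).mean(dim=1))

model = MultiSensorPolicy()
out = model(torch.randn(1, 300), torch.randn(1, 6), torch.randn(1, 10))
# One loss on `out` sends gradients into every tokenizer: modular, but end-to-end.
```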
Another key distinction: perception can use open-loop eval (your prediction doesn't affect the scenario). But action needs closed-loop eval (your prediction affects the environment). Use the right evaluation tool for what you're measuring.
Where this is all heading: Tesla launched their robotaxi service in Austin and the Bay Area (June-July 2025). In Austin, cars operating below 40 mph run with no one in the driver's seat. Not a demo—a service you can hail. Cameras, neural networks, real-world deployment.
Next: Cybercab. Purpose-built robotaxi, 2 seats. It will have "the lowest cost of transportation across even public transportation." All powered by these same neural networks. The approach scales across vehicle platforms, locations, and weather.
And it's not just cars. "The same technology we developed for self-driving transfers most seamlessly to other forms of robots too." The world model works for Optimus humanoid robots. Same neural network, just add Optimus data—it generalizes across form factors.
Tesla is all in on robotics. "The entire company is focused on producing intelligent useful large scale robots for helping everyone in the world."
This isn't just about self-driving cars. It's about solving robotics, end-to-end. 🤖
I created this thread because most people will read something short but won't watch a 28-minute video. Here's the source info:
Video: Tesla ICCV 2025 Foundational Model for FSD - Ashok Elluswamy
Date: October 22, 2025