Tristan · 12 Apr · 15 tweets · 3 min read
The recent firmware updates give us some insight into how Tesla is going to replace radar, plus some nifty ML model techniques

⬇️ Thread
From the binaries we can see that they've added velocity and acceleration outputs. These predictions, in addition to the existing xyz outputs, give much of the same information that radar traditionally provides (distance + velocity + acceleration).
For autosteer on city streets, you need to know the velocity and acceleration of cars in all directions, but radar only points forward. If vision is accurate enough to make a left turn, radar is probably unnecessary for the most part.
How can a neural network figure out velocity and acceleration from static images, you ask?

They can't!

They've recently switched to something that appears to be styled on a recurrent neural network (RNN).
The net structure is unknown (LSTM?) but they're providing the net with a queue of the 15 most recent hidden states. This seems quite a bit easier to train than normal RNNs, which need to learn to encode historical data and can have issues like vanishing gradients over longer time windows.
The velocity and acceleration predictions are new. Given the last 15 frames (~1s) of data, I'd expect you can train a highly accurate net to predict velocity + acceleration from the learned time series.
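As a rough illustration of what such a head might look like (purely my speculation — the layer sizes, queue length, and names below are assumptions, not anything recovered from the binaries), here's a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class QueueHead(nn.Module):
    """Hypothetical head predicting velocity + acceleration from a
    queue of the 15 most recent per-frame feature vectors."""

    def __init__(self, feat_dim: int = 256, queue_len: int = 15):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim * queue_len, 512),
            nn.ReLU(),
            nn.Linear(512, 6),  # vx, vy, vz, ax, ay, az per object
        )

    def forward(self, queue: torch.Tensor) -> torch.Tensor:
        # queue: (batch, queue_len, feat_dim) -> flatten the time axis
        return self.mlp(queue.flatten(start_dim=1))

head = QueueHead()
frames = torch.randn(1, 15, 256)  # ~1s of per-frame features
vel_acc = head(frames)            # shape (1, 6)
```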
They've already been using these queue-based RNNs with the normal position nets for a few months, presumably to improve the stability of the predictions.

This matches with the recent public statements from Tesla about new models training on video instead of static images.
To evaluate the performance compared to radar, I bet Tesla has run some feature importance techniques on the models, and radar importance has probably dropped quite a bit with the new nets. See tools like captum.ai for more info.
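For the curious, a feature-importance check with Captum looks roughly like this (the model and inputs here are stand-ins I made up; only the Captum calls are real API):

```python
import torch
from captum.attr import IntegratedGradients

# Stand-in model: 256 vision feature dims + 1 radar dim -> distance estimate.
model = torch.nn.Sequential(torch.nn.Linear(257, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
model.eval()

inputs = torch.randn(8, 257, requires_grad=True)

ig = IntegratedGradients(model)
attributions = ig.attribute(inputs, target=0)

# Compare how much the radar column contributes vs. the vision columns.
radar_importance = attributions[:, -1].abs().mean()
vision_importance = attributions[:, :-1].abs().mean()
print(float(radar_importance), float(vision_importance))
```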
I still think that radar is going to stick around for quite a while for highway usage since the current camera performance in rain and snow isn't great.

NoA often disables in mild rain. City streets might behave better since the relative rain speed is lower.
One other nifty trick they've recently added is a task to rectify the images before feeding them into the neural nets.

This is common in classical CV applications, so I'm surprised it only popped up in the last couple of months.

docs.opencv.org/master/dc/dbb/…
This makes a lot of sense since it means that the nets don't need to learn the lens distortion. It also likely makes it a lot easier for the nets to correlate objects across multiple cameras since the movement is now much more linear.
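For reference, undistorting a frame with OpenCV looks something like this (the intrinsics and distortion coefficients below are placeholders; real values would come from a calibration step such as cv2.calibrateCamera):

```python
import cv2
import numpy as np

# Placeholder camera intrinsics + distortion coefficients.
camera_matrix = np.array([[1000.0, 0.0, 640.0],
                          [0.0, 1000.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread("frame.png")  # placeholder input frame
undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)
cv2.imwrite("frame_rectified.png", undistorted)
```

In a real per-frame pipeline you'd precompute the remap tables once with cv2.initUndistortRectifyMap and apply cv2.remap to each frame, since the lens distortion doesn't change.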
For more background on LSTMs (Long Short-Term Memory) see towardsdatascience.com/illustrated-gu…

They're tricky to train because they need to encode history which is fed into future runs. The more times you pass the state, the more the earlier frames are diluted, hence "vanishing gradients".
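For contrast, here's how a vanilla LSTM threads a single hidden state forward step by step (a minimal PyTorch sketch with made-up sizes):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=256, hidden_size=128, batch_first=True)

frames = torch.randn(1, 15, 256)  # 15 frames of per-frame features
state = None                      # (h, c) starts at zeros
for t in range(frames.shape[1]):
    # All history must survive inside `state`; gradients flow back
    # through every step, which is where vanishing gradients bite.
    out, state = lstm(frames[:, t:t+1, :], state)
```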
Using a queue like this is clever since it splits the "learning info about each frame" from "remembering it".

Each net just needs to learn how to encode all the relevant information from its own frame, and then the queue handles providing the history.
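At inference time the split is easy to picture: a per-frame encoder runs once per frame and a plain queue supplies the history (again a sketch, with made-up names and sizes):

```python
from collections import deque
import torch
import torch.nn as nn

encoder = nn.Linear(3 * 64 * 64, 256)  # stand-in per-frame encoder
head = nn.Linear(15 * 256, 6)          # stand-in velocity/accel head

queue = deque(maxlen=15)               # the queue does the "remembering"
for _ in range(30):                    # pretend stream of camera frames
    frame = torch.randn(1, 3 * 64 * 64)
    queue.append(encoder(frame))       # each net only encodes its own frame
    if len(queue) == 15:
        history = torch.cat(list(queue), dim=1)  # (1, 15*256)
        vel_acc = head(history)                  # (1, 6)
```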
Smart summon appears to still use a traditional LSTM with a single hidden state (and no queue). I wonder if this new technique will make its way into smart summon to improve its performance.

More from @rice_fry · 10 Apr
Got a sample of the Tesla Insurance telemetry data. The insurance records are on a per-drive basis. Here are the fields (rough schema sketch after the list):

* Unique Drive ID
* Record Version
* Car Firmware Version
* Driver Profile Name
* Start / End Time
* Drive Duration
* Start / End Odometer

(1/2)
* # of Autopilot Strikeouts
* # of Forward Collision Warnings
* # of Lane Departure Warnings
* # of ABS activations (All & User)
* Time spent within 1s of car in front
* Time spent within 3s of car in front
* Acceleration Variance
* Service Mode
* Delivered

(2/2)
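Laid out as a record type, it would look something like this (field names and types are my guesses from the list above, not Tesla's actual schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class InsuranceDriveRecord:
    # Names/types guessed from the observed fields; not Tesla's schema.
    drive_id: str
    record_version: int
    firmware_version: str
    driver_profile_name: str
    start_time: datetime
    end_time: datetime
    drive_duration_s: float
    start_odometer_km: float
    end_odometer_km: float
    autopilot_strikeouts: int
    forward_collision_warnings: int
    lane_departure_warnings: int
    abs_activations_all: int
    abs_activations_user: int
    time_within_1s_of_lead_s: float
    time_within_3s_of_lead_s: float
    acceleration_variance: float
    service_mode: bool
    delivered: bool
```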
There's a lot of basic stuff here that insurance companies can already get via companion apps/dongles, but there are also deep insights into driver behavior that Tesla can get and others cannot.

I bet a lot of insurance companies would love to get their hands on this kind of data.
