Tristan
Machine Learning + Reverse Engineering + Software + Security SWE @pytorch, tweets are personal opinions https://t.co/419A7MoFX7

Nov 24, 2021, 11 tweets

Curious what Tesla means by upreving their static obstacle neural nets?

Let's see how the Tesla FSD Beta 10.5 3D Voxel nets compare to the nets from two months ago.

The new captures are from the same area as the old ones so we can directly compare the outputs

1/N

This first example is a small pedestrian crosswalk sign in the middle of the road. It's about 1 foot wide so it should show up as 1 pixel in the nets.

Under the old nets it shows up as a large blob with an incorrect depth. Under the new nets it's much better.
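As a rough sanity check on the "1 pixel" expectation, here is a minimal Python sketch that counts how many grid cells an object of a given size would span. The 1 ft (0.3048 m) cell size is an assumption inferred from the statement above, not a confirmed Tesla parameter, and `cells_spanned` is a hypothetical helper.

```python
import math

FOOT_M = 0.3048  # metres per foot

def cells_spanned(extent_m: float, cell_m: float) -> int:
    """Minimum number of grid cells an object of extent_m covers
    when its edge is aligned with the grid (always at least 1)."""
    # The small epsilon guards against float round-off at exact multiples.
    return max(1, math.ceil(extent_m / cell_m - 1e-9))

# ASSUMPTION: one output cell is roughly 1 ft on a side.
sign_cells = cells_spanned(1 * FOOT_M, FOOT_M)   # a 1 ft wide sign
blob_cells = cells_spanned(10 * FOOT_M, FOOT_M)  # a 10 ft extent
print(sign_cells, blob_cells)
```

Under that assumed cell size, a correctly predicted sign should fill about one cell, while anything smeared over 10 ft would fill about ten, which is the kind of difference that stands out as a "blob" in the renders.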

Under the old nets the post shows up as a huge blob and disappears when the car gets close to it. The probabilities seem fairly consistent no matter how far away the sign is, even though up close they should be more confident.

fn.lc/s/depthrender/…

Under 10.5 the post has a much more accurate depth up close and the post now shows up alongside the car and behind it

The confidence seems to match reality much better as the predictions have a higher probability as the car gets closer

fn.lc/s/depthrender/…
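One way to put a number on "confidence rises as the car gets closer" is to correlate the predicted probability with the distance to the object across frames. The sketch below is only an illustration: the distance and probability values are made-up placeholders, not measurements from these captures, and `pearson` is a hand-rolled helper rather than anything from Tesla's stack.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

# PLACEHOLDER DATA (hypothetical, for illustration only).
distances_m = [30.0, 20.0, 10.0, 5.0]     # distance to the post per frame
flat_probs = [0.5, 0.5, 0.5, 0.5]         # "old net" style: same confidence everywhere
rising_probs = [0.4, 0.55, 0.75, 0.9]     # "new net" style: more confident up close

print(pearson(distances_m, flat_probs))    # no relationship with distance
print(pearson(distances_m, rising_probs))  # strongly negative: closer means more confident
```

A strongly negative correlation is what you'd expect from a net whose confidence tracks how much evidence it has; a value near zero matches the flat behaviour described for the old nets.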

Looking at an intersection near a park we can see that there's a huge difference in clarity.

It seems like Tesla has optimized the training data to better isolate the sidewalks from the ground so they more consistently show up

Old: fn.lc/s/depthrender/…

The light poles also have much more accurate depth. Before, they had the same issue of having the right width but showing up as 10+ ft deep

New: fn.lc/s/depthrender/…

I don't have an old capture of these cones to compare but they seem plenty distinct and good enough to drive on. The predictions are stable next to and behind the car

Of note is that the cones merge together into a single line farther away from the car

Since I suspect this training data is automatically generated, the repetitive colors and patterns of the cones may be confusing their offline algorithm into thinking it's one solid object.

I'm not sure it matters for driving given their close spacing

fn.lc/s/depthrender/…

Here's an example of making a right turn into a narrow road with hard curbs on both sides. The car is outputting pretty accurate estimates of the curbs through the tight corner

Seems like my added car model is a tad too far forward compared to reality

fn.lc/s/depthrender/…

It's pretty clear that there have been some big, though incremental, improvements to these nets.

The increased number of clips and the improvements to training data generation (autolabeler?) mentioned in the recent FSD patch notes seem to be paying off

It's not clear if this is part of the autolabeler given there are no real labels here, but if it's a shared system for managing the clips it may benefit both

I'm also curious if there have been any model architecture changes helping, though the patch notes haven't mentioned that
