The update in question is “Upgraded modeling of lane geometry from dense rasters (‘bag of points’) to an autoregressive decoder that directly predicts and connects ‘vector space’ lanes point by point using a transformer neural network.”
As I interpret the notes, in previous versions the 3D geometry was based directly on the camera inputs, which generated excessive quantities of 3D points.
These points are now fed into a new neural net that can distill them into a sparser yet adequate 3D model.
Importantly…
The new network is autoregressive, meaning that it bases each prediction on its own earlier outputs: every new lane point is predicted from the points it has already placed. Essentially, it has a short-term memory of the lane drawn so far and can make determinations from that sequence instead of treating each point in isolation.
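To make the "point by point" idea from the release note concrete, here is a minimal Python sketch of an autoregressive decoding loop. Everything in it is an assumption for illustration: `toy_next_point` is a hypothetical stand-in for the transformer (it just extends the lane with a slight turn), and `decode_lane` is a made-up name, not anything from the actual software.

```python
import math

def toy_next_point(points, step=1.0, turn=0.1):
    """Hypothetical stand-in for the transformer.

    Predicts the next lane point from the points emitted so far:
    continue along the last segment's heading, turning slightly.
    """
    if len(points) < 2:
        heading = 0.0  # no previous segment yet, so head along +x
    else:
        (x0, y0), (x1, y1) = points[-2], points[-1]
        heading = math.atan2(y1 - y0, x1 - x0) + turn
    x, y = points[-1]
    return (x + step * math.cos(heading), y + step * math.sin(heading))

def decode_lane(start, n_points, predict=toy_next_point):
    """Autoregressive loop: each new point is conditioned on all earlier ones."""
    points = [start]
    while len(points) < n_points:
        points.append(predict(points))  # feed the model its own past output
    return points

lane = decode_lane((0.0, 0.0), 5)
```

The key part is the loop: the predictor always receives its own earlier outputs, so the lane comes out as a connected sequence of points rather than an unordered bag.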
Because the data is also distilled into a more concise set of points (vectors), further calculations to transform the geometry will be much faster (eliminating calculations on extraneous data). 🧠
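A rough way to see the speed argument: transforming geometry costs work proportional to the number of points. The sketch below is only an illustration with made-up sizes (10,000 dense points versus 2 vector endpoints for the same straight lane); `transform` is a hypothetical helper, not the real pipeline.

```python
import math

def transform(points, angle, tx, ty):
    """Rotate by `angle` (radians), then translate by (tx, ty).

    One multiply-add pass per point, so cost grows with len(points).
    """
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

# Hypothetical sizes, chosen only for illustration.
dense_points = [(i * 0.01, 0.0) for i in range(10_000)]  # "bag of points" along a straight lane
sparse_lane = [(0.0, 0.0), (99.99, 0.0)]                 # same lane as two vector endpoints

dense_out = transform(dense_points, math.pi / 2, 0.0, 0.0)
sparse_out = transform(sparse_lane, math.pi / 2, 0.0, 0.0)

# How many times more points the dense version has to process.
work_ratio = len(dense_points) // len(sparse_lane)
```

Both versions end at the same place after the rotation, but the dense one touches thousands of times more points to get there.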