Shane Gu Profile picture
Gemini 1.5 Pro post-training multilinguality lead @GoogleDeepMind Tokyo/SF. 🇯🇵-born 🇨🇳🇨🇦. ex: @GoogleAI Brain, @OpenAI. (JP: @shanegJP)

Oct 1, 2022, 7 tweets

Attended #TeslaAIDay2022. Engaging yet technical. (streaming link: ) Many memorable moments! 🧵1/

- 8 months or so to build humanoids from scratch: Two iterations. Far from Boston Dynamics in locomotion, and far from human bi-dexterous manipulation, but given 8-month window, the results were amazing. Nicely leveraged as much of self-driving pipeline + Dojo compute. 2/

- "generalist" conditional occupancy network: a single "big" network which outputs both voxels and semantics from images. Trained on LARGE dataset from auto labeling. Given where conditional/generative NeRF/OccNets are in academia (arxiv.org/abs/2209.10684), blown away by scale 3/

- lane prediction as image-to-text: used decoder-only Transformer for output "lanes" with a custom language. Best example I've seen of using discrete semantic tokens to describe rich continuous spatial (2D) information. Likely to generalize to 3D and other robot tasks. 4/

- 2 weeks to reconstruct "alterable" San Francisco in simulation: images below are fully generated by Unreal Engine. Everything (road texture, lane semantics, human/cars, weather, etc) is modifiable to generate additional training data. Only took two weeks to input SF. 5/

- Dojo (Tesla custom compute hardware): 50% of total Tesla compute is on auto labeling + OccNet. Dojo beats A100 GPUs with various problem-specific optimizations. PyTorch supported. Runs #stablediffusion. Probably the most exciting result of the night. 6/

- Elon said "Software 2.0" 7/

Share this Scrolly Tale with your friends.

A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.

Keep scrolling