The Tesla team discussed how they are using AI to crack Full Self Driving (FSD) at their Tesla AI Day event.
They introduced many cool things:
- HydraNets
- Dojo Processing Units
- Tesla Bot
- So much more...
Here's a quick summary 🧵:
They introduced their single deep learning architecture ("HydraNet") for extracting features from the camera images and transforming them into a "vector space"
This includes multi-scale features from each of the 8 cameras, a transformer to fuse them and attend to important features, kinematic inputs, and spatiotemporal processing via a feature queue and spatial RNNs, all trained with multi-task learning.
Planning and control of the car utilize reinforcement learning-based approaches
Here is the entire pipeline put together:
Next, they discussed their labeling pipeline, which is all done in-house. They do this by labeling directly in this "vector space"
They also use simulations to provide additional data:
The Tesla team also discussed their Dojo supercomputers, which are specialized supercomputers for machine learning!
While the hardware is still in development, they are building exaflop-level training servers!
Then, out of nowhere, @elonmusk introduces Tesla Bot!
The Tesla AI team is designing and building a humanoid bot to perform repetitive tasks, using the AI algorithms originally developed for FSD.
I have only covered the tip of the iceberg!
Check out the recording here:
A quick clarification: the RL-based techniques are not yet used in production for planning and control, but they are currently exploring them... Nonetheless, it is still very exciting and interesting!
This is a diffusion model pipeline that goes beyond what AlphaFold2 did: predicting the structures of protein-molecule complexes containing DNA, RNA, ions, etc.
Google announces Med-Gemini, a family of Gemini models fine-tuned for medical tasks! 🔬
Achieves SOTA on 10 of the 14 benchmarks, spanning text, multimodal & long-context applications.
Surpasses GPT-4 on all benchmarks!
This paper is super exciting, let's dive in ↓
The team developed a variety of model variants. First, let's talk about the ones built for language tasks.
The finetuning dataset is quite similar to Med-PaLM 2's, except with one major difference:
self-training with search
(2/14)
The goal is to improve clinical reasoning and ability to use search results.
Synthetic chain-of-thought (CoT) rationales are generated with and without search results in context, examples with incorrect predictions are filtered out, the model is fine-tuned on the remaining CoT, and the synthetic CoT is then regenerated with the improved model
Before I continue, I want to mention that this work was led by @RiversHaveWings, @StefanABaumann, and @Birchlabs. @DanielZKaplan and @EnricoShippole were also valuable contributors. (2/11)
High-resolution image synthesis w/ diffusion is difficult without multi-stage models (e.g. latent diffusion). It's even more difficult for diffusion transformers due to the O(n^2) scaling of self-attention with token count. So we want an easily scalable transformer arch for high-res image synthesis. (3/11)