At their Tesla AI Day event, the Tesla team discussed how they are using AI to crack Full Self-Driving (FSD).
They introduced many cool things:
- HydraNets
- Dojo Processing Units
- Tesla Bot
- So much more...
Here's a quick summary 🧵:
They introduced their single deep learning model architecture ("HydraNet"), which extracts features and transforms them into a "vector space".
It pulls multi-scale features from each of the 8 cameras, fuses them with a transformer that attends to the important features, incorporates kinematic features, and processes everything spatiotemporally using a feature queue and spatial RNNs, all trained with multi-task learning.
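The sketch below shows one way a feature queue could work. It assumes each push is triggered either by elapsed time or by distance travelled. The class, the step sizes, and the frames are all hypothetical; this is not Tesla's code.

```python
from collections import deque
from typing import Any, Deque, Optional, Tuple


class FeatureQueue:
    """Caches recent feature snapshots, pushing only when the trigger value
    (a timestamp or an odometer reading) has advanced by at least `step`."""

    def __init__(self, maxlen: int, step: float) -> None:
        self.items: Deque[Tuple[float, Any]] = deque(maxlen=maxlen)  # oldest entries fall off
        self.step = step
        self.last_key: Optional[float] = None

    def offer(self, key: float, features: Any) -> bool:
        if self.last_key is None or key - self.last_key >= self.step:
            self.items.append((key, features))
            self.last_key = key
            return True
        return False


# Hypothetical step sizes: push every 0.05 s, or every 1 m travelled.
time_q = FeatureQueue(maxlen=3, step=0.05)
space_q = FeatureQueue(maxlen=3, step=1.0)

# (timestamp s, odometer m): the car drives 2 m, then waits at a red light.
frames = [(0.0, 0.0), (0.1, 1.0), (0.2, 2.0), (0.3, 2.0), (0.4, 2.0), (0.5, 2.0)]
for i, (t, odo) in enumerate(frames):
    feats = f"frame{i}"  # stand-in for a vector-space feature map
    time_q.offer(t, feats)
    space_q.offer(odo, feats)

print([f for _, f in time_q.items])   # → ['frame3', 'frame4', 'frame5']
print([f for _, f in space_q.items])  # → ['frame0', 'frame1', 'frame2']
```

In this toy, once the car stops, the time-triggered queue fills with near-identical frames. The distance-triggered queue still holds what the car saw while driving up to the light.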
For planning and control of the car, they discussed reinforcement learning-based approaches.
Here is the entire pipeline put together:
Next, they discussed their labeling pipeline, which is done entirely in-house. They label directly in this "vector space".
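Here is a toy illustration of why labeling in vector space is handy: a single 3D label becomes a 2D label in every camera that can see it. It assumes simple pinhole cameras mounted at the car's origin that differ only in yaw. The camera names, intrinsics, and coordinate convention are all hypothetical, not Tesla's tooling.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class PinholeCamera:
    """Hypothetical camera at the car's origin, rotated by `yaw` radians
    (0 = facing forward, positive = turned to the left)."""
    name: str
    yaw: float
    fx: float = 1000.0
    fy: float = 1000.0
    cx: float = 640.0
    cy: float = 480.0
    width: int = 1280
    height: int = 960

    def project(self, p: Point3D) -> Optional[Tuple[float, float]]:
        x, y, z = p  # car frame: x forward, y left, z up (assumed convention)
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        fwd = c * x + s * y      # depth along this camera's optical axis
        left = -s * x + c * y    # lateral offset, positive to the camera's left
        if fwd <= 0:
            return None          # point is behind the camera
        u = self.cx - self.fx * left / fwd  # image u grows to the right
        v = self.cy - self.fy * z / fwd     # image v grows downward
        if 0 <= u < self.width and 0 <= v < self.height:
            return (u, v)
        return None              # outside this camera's field of view


def project_label(p: Point3D, cams: Sequence[PinholeCamera]) -> Dict[str, Optional[Tuple[float, float]]]:
    """One 3D label becomes a 2D label in every camera that can see it."""
    return {cam.name: cam.project(p) for cam in cams}


cams = [
    PinholeCamera("front", yaw=0.0),
    PinholeCamera("left_pillar", yaw=math.pi / 4),
    PinholeCamera("rear", yaw=math.pi),
]
labels = project_label((10.0, 5.0, 1.0), cams)  # e.g. a traffic cone labeled once in 3D
print(labels)
```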
They also use simulations to provide additional data:
The Tesla team also discussed their Dojo supercomputers, which are purpose-built for machine learning!
While the hardware is still in development, they are building toward exaflop-level systems!
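For a sense of scale, here is the back-of-the-envelope arithmetic using the figures presented at AI Day 2021. Those figures are 362 TFLOPS (BF16/CFP8) per D1 chip, 25 chips per training tile, and 120 tiles per ExaPOD. Treat them as announced targets rather than measured results.

```python
# Figures as announced at Tesla AI Day 2021 (announced targets, not benchmarks).
D1_TFLOPS = 362          # per D1 chip, BF16/CFP8
CHIPS_PER_TILE = 25      # D1 chips per training tile
TILES_PER_EXAPOD = 120   # training tiles per ExaPOD

tile_pflops = D1_TFLOPS * CHIPS_PER_TILE / 1_000       # TFLOPS -> PFLOPS
exapod_eflops = tile_pflops * TILES_PER_EXAPOD / 1_000  # PFLOPS -> EFLOPS
total_chips = CHIPS_PER_TILE * TILES_PER_EXAPOD

print(f"{tile_pflops:.2f} PFLOPS per tile")                    # → 9.05 PFLOPS per tile
print(f"{total_chips} D1 chips, {exapod_eflops:.1f} EFLOPS")   # → 3000 D1 chips, 1.1 EFLOPS
```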
Then, out of nowhere, @elonmusk introduces Tesla Bot!
The Tesla AI team is designing and building a humanoid bot to perform repetitive tasks, using the AI algorithms originally developed for FSD.
I have only covered the tip of the iceberg!
Check out the recording here:
A quick clarification: RL-based techniques are not yet used in production for planning and control, but the team is currently exploring them. Nonetheless, it's still very exciting and interesting!
A new startup, Inception Labs, has released Mercury Coder, "the first commercial-scale diffusion large language model"
It's 5-10x faster than current-generation LLMs, providing high-quality responses at low cost.
And you can try it now!
The performance is similar to small frontier models while achieving a throughput of ~1000 tokens/sec... on H100s! Reaching this level of throughput for autoregressive LLMs typically requires specialized chips.
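Why can a diffusion LLM be fast on ordinary GPUs? This toy sketch of masked-diffusion-style decoding gives the intuition. Instead of one forward pass per token, the model refines the whole sequence in parallel over a fixed number of steps. It is a generic illustration, not Inception Labs' published method. The denoiser is a hypothetical oracle stub, and confidence-based unmasking is just one common schedule.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical denoiser: given a partially masked sequence (None = masked),
# return a (token, confidence) guess for every position.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def diffusion_decode(denoise: Denoiser, length: int, steps: int) -> Tuple[List[Optional[str]], int]:
    seq: List[Optional[str]] = [None] * length  # start fully masked
    calls = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is None]
        if not masked:
            break
        preds = denoise(seq)  # one forward pass predicts every position in parallel
        calls += 1
        # Unmask an even share of what's left so everything is filled by the last step.
        k = math.ceil(len(masked) / (steps - step))
        for i in sorted(masked, key=lambda j: preds[j][1], reverse=True)[:k]:
            seq[i] = preds[i][0]  # commit the k most confident guesses
    return seq, calls


TARGET = "def add ( a , b ) : return a + b".split()


def oracle_denoiser(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    # Stand-in for a trained model: it always "knows" TARGET and is more
    # confident about earlier positions. A real model would condition on seq.
    return [(tok, 1.0 / (i + 1)) for i, tok in enumerate(TARGET)]


tokens, calls = diffusion_decode(oracle_denoiser, len(TARGET), steps=4)
print(" ".join(tokens))  # → def add ( a , b ) : return a + b
print(f"{calls} denoiser calls vs {len(TARGET)} forward passes autoregressively")
```

In this toy, the speed-up comes entirely from making fewer sequential denoiser calls (4 instead of 12). Real systems trade the step count off against output quality.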
It's currently tied for second place on Copilot Arena!
Cleo was an account on Math Stack Exchange that was infamous for dropping answers to the most difficult integrals with no explanation...
often mere minutes after the question was asked!!
For years, no one knew who Cleo was, UNTIL NOW!
People noticed that the same few accounts kept interacting with Cleo (asking the questions Cleo answered, commenting, etc.), and a couple of them were active only at the same times as Cleo.
People started wondering whether one person was controlling all these accounts as alts.
One of the accounts, Laila Podlesny, had an email address associated with it. By starting a fake login to that Gmail account and obtaining the backup recovery email, someone figured out that Vladimir Reshetnikov was in control of Laila Podlesny.
Based on Vladimir's other interactions on Math.SE, it seemed likely that he controlled Cleo, Laila, and a couple of other accounts as well.
This is a diffusion model pipeline that goes beyond what AlphaFold2 did: it predicts the structures of protein-molecule complexes containing DNA, RNA, ions, etc.
Google announces Med-Gemini, a family of Gemini models fine-tuned for medical tasks! 🔬
Achieves SOTA on 10 of the 14 benchmarks, spanning text, multimodal & long-context applications.
Surpasses GPT-4 on every benchmark where a direct comparison is possible!
This paper is super exciting, let's dive in ↓
The team developed a variety of model variants. First, let's talk about the ones they developed for language tasks.
The fine-tuning dataset is quite similar to Med-PaLM 2's, with one major difference:
self-training with search
The goal is to improve clinical reasoning and ability to use search results.
Synthetic chain-of-thought (CoT) reasoning is generated with and without search results in context, incorrect predictions are filtered out, the model is fine-tuned on the remaining CoT, and then the synthetic CoT is regenerated.
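Here's a minimal sketch of that loop. It uses hypothetical `model` and `fine_tune` callables in place of a real LLM and training job. The exact-match filter and the toy data are illustrative, and this is not the Med-Gemini training code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: (question, optional search context) -> (chain of thought, answer)
Model = Callable[[str, Optional[str]], Tuple[str, str]]
# Hypothetical trainer: (model, kept examples) -> fine-tuned model
Trainer = Callable[[Model, List[Tuple[str, Optional[str], str]]], Model]


@dataclass
class Example:
    question: str
    answer: str
    search_results: Optional[str] = None  # retrieved context, if available


def build_cot_dataset(model: Model, examples: List[Example]) -> List[Tuple[str, Optional[str], str]]:
    kept = []
    for ex in examples:
        contexts = [None] + ([ex.search_results] if ex.search_results else [])
        for ctx in contexts:  # generate CoT without and with search results
            cot, pred = model(ex.question, ctx)
            if pred == ex.answer:  # filter out incorrect predictions
                kept.append((ex.question, ctx, cot))
    return kept


def self_train(model: Model, fine_tune: Trainer, examples: List[Example], rounds: int) -> Model:
    for _ in range(rounds):
        data = build_cot_dataset(model, examples)  # (re)generate synthetic CoT
        model = fine_tune(model, data)             # train on the filtered CoT
    return model


# Toy stand-ins so the loop runs end to end.
def toy_model(question: str, ctx: Optional[str]) -> Tuple[str, str]:
    pred = ctx.split("=")[-1].strip() if ctx else "unsure"  # only "knows" via search
    return f"Reasoning about {question!r}", pred


sizes_seen: List[int] = []


def toy_fine_tune(model: Model, data: List[Tuple[str, Optional[str], str]]) -> Model:
    sizes_seen.append(len(data))  # a real trainer would update weights here
    return model


examples = [Example("What is 2 + 2?", "4", search_results="2 + 2 = 4"),
            Example("What is 3 + 3?", "6")]
self_train(toy_model, toy_fine_tune, examples, rounds=2)
print(sizes_seen)  # → [1, 1]
```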