In addition to the new model, we’re also releasing SA-V, a dataset that’s 4.5x larger and has ~53x more annotations than the largest existing video segmentation dataset.
We hope this work will help accelerate new computer vision research ➡️ go.fb.me/yck7bu
Like the original SAM, SAM 2 can be applied out of the box to a diverse range of real-world use cases and we’re excited to see what developers build with it.
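To give a sense of what “out of the box” can look like, here is a minimal sketch of prompting SAM 2 on a video, assuming the `sam2` package from the facebookresearch/sam2 GitHub repo; the checkpoint path, config name and method names are assumptions and may differ from the released code.

```python
# Sketch: prompt SAM 2 on one frame, then propagate masks through the video.
# Assumes the `sam2` package from facebookresearch/sam2; the checkpoint and
# config names below are illustrative and may not match the released files.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # assumed checkpoint path
model_cfg = "sam2_hiera_l.yaml"                 # assumed config name

predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # Directory of JPEG frames (placeholder path).
    state = predictor.init_state(video_path="my_video_frames/")

    # Click once on the object in frame 0 (pixel coordinates; label 1 = foreground).
    points = np.array([[210.0, 350.0]], dtype=np.float32)
    labels = np.array([1], dtype=np.int32)
    frame_idx, obj_ids, masks = predictor.add_new_points(
        state, frame_idx=0, obj_id=1, points=points, labels=labels
    )

    # Propagate the prompt to get a masklet for the object across the whole video.
    for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
        pass  # e.g. save or visualize masks[i] > 0 for each tracked object
```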
More technical details on the new Llama 3.1 models we released today. 🦙🧵
Today’s release includes 8B, 70B and 405B Llama 3.1 models trained on data of higher quality and greater quantity than Llama 3, for both pre- and post-training. All three models were trained on over 15T tokens.
Based on feedback from the community, all of our new Llama 3.1 models deliver a 16x larger context window, going all the way up to 128K tokens. This enables new use cases in long-text summarization, coding and more.
We took a careful approach to the balance of data used in training these models, so that quality is maintained in short-context use cases even as the context length increases.
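As a rough illustration of what the larger window enables, here is a hedged sketch of long-document summarization with an instruct model through Hugging Face transformers; the repo id below is an assumption, and access to the weights is gated behind the Llama license.

```python
# Sketch: long-document summarization with a Llama 3.1 instruct model via
# Hugging Face transformers. The repo id is an assumption; access to the
# weights is gated and requires accepting the license on huggingface.co.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed HF repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder document; with a 128K-token window this can be far longer than
# what fit in Llama 3's 8K context.
long_document = open("report.txt").read()
messages = [
    {"role": "system", "content": "You summarize documents faithfully and concisely."},
    {"role": "user", "content": f"Summarize the following document:\n\n{long_document}"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```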
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet.
Today we’re releasing a collection of new Llama 3.1 models, including our long-awaited 405B. These models deliver improved reasoning, a larger 128K token context window and support for eight languages, among other improvements. Llama 3.1 405B rivals leading closed source models, with state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.
The models are available to download now, directly from Meta or @huggingface. With today’s release the ecosystem is also ready to go, with 25+ partners rolling out our latest models on day one, including @awscloud, @nvidia, @databricks, @groqinc, @dell, @azure and @googlecloud.
More details in the full announcement ➡️ go.fb.me/tpuhb6
Download Llama 3.1 models ➡️ go.fb.me/vq04tr
With these releases we’re setting the stage for unprecedented new opportunities, and we can’t wait to see the innovation our newest models will unlock across all levels of the AI community.
Training a model as large and capable as Llama 3.1 405B was no simple task. The model was trained on more than 15 trillion tokens over the course of several months, requiring over 16K @NVIDIA H100 GPUs and making it the first Llama model ever trained at this scale.
We also used the 405B parameter model to improve the post-training quality of our smaller models.
With Llama 3.1, we evaluated performance on more than 150 benchmark datasets spanning a wide range of languages, in addition to extensive human evaluations in real-world scenarios. These results show that the 405B competes with leading closed source models like GPT-4, GPT-4o and Claude 3.5 Sonnet across a range of tasks.
Our upgraded Llama 3.1 8B & 70B models are also best-in-class, outperforming other models of their size while delivering a better balance of helpfulness and safety than their predecessors. These smaller models support the same 128K token context window, multilinguality, improved reasoning and state-of-the-art tool use to enable more advanced use cases.
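Here is a hedged sketch of what built-in tool use can look like in practice, assuming a recent transformers version whose chat templates accept a `tools` argument; the repo id and the weather function are illustrative assumptions, not part of the release.

```python
# Sketch: zero-shot tool calling with a Llama 3.1 instruct model. Assumes a
# transformers version whose `apply_chat_template` accepts a `tools` argument
# and a tool-aware chat template in the (assumed) repo id below.
from transformers import AutoTokenizer

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Paris, France"
    """
    return 21.0  # hypothetical tool; a real one would query a weather API

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-70B-Instruct")

messages = [{"role": "user", "content": "What's the temperature in Menlo Park right now?"}]

# The template serializes the tool's JSON schema into the prompt; the model is
# expected to reply with a structured tool call that your code then executes.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False
)
print(prompt)
```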
Introducing Meta Llama 3: the most capable openly available LLM to date.
Today we’re releasing 8B & 70B models that deliver new capabilities such as improved reasoning and set a new state-of-the-art for models of their size.
Today's release includes the first two Llama 3 models. In the coming months we expect to introduce new capabilities, longer context windows, additional model sizes and enhanced performance, as well as the Llama 3 research paper so the community can learn from our work.
Llama 3 delivers a major leap over Llama 2 and demonstrates SOTA performance on a wide range of industry benchmarks.
The models also achieve substantially reduced false refusal rates, improved alignment and increased diversity in model responses — in addition to improved capabilities such as reasoning, code generation and instruction following.
Across the stack, we want to kickstart the next wave of innovation in AI. We can’t wait to see what you build and look forward to your feedback.
You can download the models directly from Meta today. We're also working closely with our partners, and Llama 3 models will soon be available across @AWSCloud, @Azure, @Databricks, @GoogleCloud, @HuggingFace, @IBM WatsonX, @Kaggle, @NVIDIA and @SnowflakeDB.
Today we're releasing the Open Catalyst Demo to the public. This new service uses AI to simulate the reactivity of catalyst materials ~1000x faster than existing computational methods, helping researchers accelerate their work in materials science.
Demo ⬇️
Low-cost catalyst materials are an important means toward a renewable energy future. By making this demo available to the public, we hope to increase the pace of discovery of low-cost catalyst materials and showcase the emerging generalizability of these models.
The Open Catalyst demo supports adsorption energy calculations for 11,427 catalyst materials and 86 adsorbates. Because each material exposes many distinct surfaces, this amounts to ~100M catalyst surface-adsorbate combinations, a scale impossible to explore without machine learning.
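To make the adsorption-energy framing concrete, here is a sketch of the underlying calculation in ASE; the cheap EMT potential below stands in for the trained ML models that actually power the demo, so only the structure of the calculation, not the number it prints, is meaningful.

```python
# Sketch of an adsorption-energy calculation:
#   E_ads = E(slab + adsorbate) - E(slab) - E(adsorbate)
# ASE's EMT potential is used as a stand-in for the learned Open Catalyst models.
from ase import Atoms
from ase.build import fcc111, add_adsorbate
from ase.calculators.emt import EMT

def energy(atoms):
    atoms.calc = EMT()
    return atoms.get_potential_energy()

# Clean Cu(111) slab.
slab = fcc111("Cu", size=(3, 3, 4), vacuum=10.0)
e_slab = energy(slab.copy())

# Isolated oxygen adsorbate as a simple reference energy.
e_ads_ref = energy(Atoms("O", cell=[12, 12, 12], pbc=True))

# Slab with O placed at an fcc hollow site, 1.5 Å above the surface.
slab_ads = slab.copy()
add_adsorbate(slab_ads, "O", height=1.5, position="fcc")
e_slab_ads = energy(slab_ads)

print(f"Adsorption energy: {e_slab_ads - e_slab - e_ads_ref:.2f} eV")
```

The demo repeats this kind of calculation across many surfaces and binding sites per material, which is where the ~100M combinations come from.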
(1/3) Until now, AI translation has focused mainly on written languages. Universal Speech Translator (UST) is the first AI-powered speech-to-speech translation system for a primarily oral language: it translates Hokkien, one of many languages that are mainly spoken rather than written. bit.ly/3CJP3ew
(2/3) Hokkien, one of ~3K primarily spoken languages, has no standard writing system and very few human translators, making it difficult to create training data for our models or to rely on Hokkien transcripts.
(3/3) By open-sourcing our benchmark data to the AI community and including our Hokkien system in UST, we hope other researchers will build on this work. Someday, no language, written or unwritten, may stand as an obstacle to mutual understanding. bit.ly/3VGxVif