Guillaume Lample @ ICLR 2024
Feb 24, 2023 · 8 tweets
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at research.facebook.com/publications/l…
1/n
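For anyone who wants to try the released checkpoints, here is a minimal, hedged sketch of loading one with the Hugging Face transformers port and sampling a completion. This is not part of the release: the repo id below is a community conversion used purely as a placeholder (the official weights are obtained through the request form linked above).

```python
# Minimal sketch (not from the official release): load a converted LLaMA checkpoint
# with Hugging Face transformers and generate a short completion.
# "huggyllama/llama-7b" is a community conversion used here only as a placeholder id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "huggyllama/llama-7b"  # assumption: any converted 7B/13B/33B/65B checkpoint loads the same way
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

prompt = "Simply put, the theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```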
Unlike Chinchilla, PaLM, or GPT-3, we use only publicly available datasets, which makes our work compatible with open-sourcing and reproducible, whereas most existing models rely on data that is either not publicly available or undocumented.
2/n
All our models were trained on at least 1T tokens, much more than what is typically used at this scale.
Interestingly, even after 1T tokens the 7B model was still improving.
3/n
On Common Sense Reasoning, Closed-book Question Answering, and Reading Comprehension, LLaMA-65B outperforms Chinchilla 70B and PaLM 540B on almost all benchmarks.
4/n
LLaMA-65B outperforms Minerva-62B on GSM8k, even though it has not been fine-tuned on any mathematical dataset. On the MATH benchmark, it outperforms PaLM-62B (but remains well below Minerva-62B).
5/n
On code generation benchmarks, LLaMA-65B outperforms cont-PaLM (62B) as well as PaLM-540B.
6/n
We also briefly tried instruction finetuning using the approach of Chung et al. (2022).
The resulting model, LLaMA-I, outperforms Flan-PaLM-cont (62B) on MMLU and showcases some interesting instruction-following capabilities.
7/n
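The thread does not spell out the finetuning recipe, so here is a hedged sketch of what Flan-style supervised instruction finetuning (in the spirit of Chung et al., 2022) looks like on a causal LM. The checkpoint id, prompt template, and toy data are illustrative assumptions, not the actual LLaMA-I setup.

```python
# Hedged sketch of supervised instruction finetuning for a causal LM.
# Checkpoint id, prompt template, and data are placeholders, not the LLaMA-I recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "huggyllama/llama-7b"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Toy instruction/response pairs; a real run uses a large instruction mixture (e.g. Flan-style).
pairs = [
    ("Explain attention in one sentence.",
     "Attention lets each token weight the other tokens when building its representation."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for instruction, response in pairs:
    text = f"### Instruction:\n{instruction}\n\n### Response:\n{response}{tokenizer.eos_token}"
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Standard next-token prediction loss over the concatenated prompt + response.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```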

More from @GuillaumeLample

Jul 24, 2024
Today, we release Mistral Large 2, the new version of our largest model. Mistral Large 2 is a 123B-parameter model with a 128k context window. On many benchmarks (notably in code generation and math), it is superior to, or on par with, Llama 3.1 405B. Like Mistral NeMo, it was trained on a very large amount of source code and multilingual data. (1/N)
On HumanEval and on MultiPL-E, Mistral Large 2 outperforms Llama 3.1 405B Instruct and scores just below GPT-4o. On MATH (0-shot, without CoT), it falls behind only GPT-4o.
(2/N)

On Multilingual MMLU, Mistral Large 2 significantly outperforms Llama 3.1 70B base (+6.3% on average over 9 languages) and is on par with Llama 3.1 405B (0.4% below on average). (3/N)
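As a usage note (not part of the thread), here is a hedged sketch of querying Mistral Large 2 through the mistralai Python client. The model alias, client interface, and response fields are assumptions based on the v1-style SDK and should be checked against the current documentation.

```python
# Hedged sketch: one chat completion against Mistral Large 2 via the mistralai SDK.
# "mistral-large-latest" and the v1-style client interface are assumptions; verify
# them against the current SDK documentation.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
print(response.choices[0].message.content)
```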
Oct 25, 2022
Super excited about this work! We showed that you can use large language models to align informal mathematical proofs (e.g. written in LaTeX) to formal proof sketches (e.g. skeletons of proofs written in a formal system like Lean or Isabelle).
A formal sketch provides a high-level description of the proof that follows the same reasoning steps as the informal proof. The sketches are in turn converted into a complete proof by an automated prover (we used Sledgehammer, but more powerful, neural-based provers could be used too).
We show that augmenting an automated prover with such formal sketches improves the performance of the prover immensely. Another nice thing about the generated formal proofs is that they are extremely easy to read, as they come, by design, with lots of comments and intermediate steps.
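To make the idea of a formal sketch concrete, here is a toy illustration in Lean 4 (the paper itself worked in Isabelle with Sledgehammer): the sketch mirrors the informal proof's intermediate steps and leaves the low-level gaps as sorry placeholders, to be discharged by an automated prover.

```lean
-- Toy illustration only (the paper used Isabelle/Sledgehammer, not Lean).
-- A formal sketch mirrors the informal proof's steps and leaves `sorry` gaps
-- for an automated prover to fill in.
example (n : Nat) : (n * (n + 1)) % 2 = 0 := by
  -- Informal step 1: n is either even or odd.
  have h : n % 2 = 0 ∨ n % 2 = 1 := sorry
  -- Informal step 2: in either case, one factor of n * (n + 1) is even.
  cases h with
  | inl heven => sorry
  | inr hodd => sorry
```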
Jun 8, 2020
Unsupervised Translation of Programming Languages. Feed a model with Python, C++, and Java source code from GitHub, and it automatically learns to translate between the 3 languages in a fully unsupervised way. arxiv.org/pdf/2006.03511…
with @MaLachaux @b_roziere @LowikChanussot
We leverage the same principles that we used to translate low-resource languages (arxiv.org/abs/1804.07755), i.e. pretraining, denoising auto-encoding, and back-translation. Although initially designed for natural languages, these methods apply remarkably well to programming languages.
The model learns to align functions and objects across libraries (std::unordered_set -> HashSet, printf -> System.out.println, std::vector<int> -> List<Integer>, Files.createDirectories -> os.makedirs), but also language-specific patterns (a > b ? a : b -> a if a > b else b).
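As a rough illustration (not the TransCoder code), the sketch below shows one back-translation round: translate a Python function to C++ with the current model, then train the reverse direction to reconstruct the original human-written Python. The translate and train_step functions are hypothetical stand-ins for the real seq2seq model.

```python
# Rough sketch of one back-translation round for unsupervised code translation.
# `translate` and `train_step` are hypothetical placeholders for the real model.
PY_FUNC = "def maximum(a, b):\n    return a if a > b else b\n"

def translate(source: str, direction: str) -> str:
    """Placeholder for model inference, e.g. direction='py->cpp'."""
    return "int maximum(int a, int b) { return a > b ? a : b; }"

def train_step(source: str, target: str, direction: str) -> None:
    """Placeholder for one gradient step on a (source, target) pair."""
    print(f"train {direction} on synthetic pair: {source!r} -> {target!r}")

# The model's (possibly noisy) C++ output becomes the *input* of a supervised
# example whose target is the original, human-written Python function.
synthetic_cpp = translate(PY_FUNC, direction="py->cpp")
train_step(synthetic_cpp, PY_FUNC, direction="cpp->py")
```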
