Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at research.facebook.com/publications/l… 1/n
Unlike Chinchilla, PaLM, or GPT-3, we only use publicly available datasets, making our work compatible with open-sourcing and reproducible, while most existing models rely on data that is either not publicly available or undocumented. 2/n
All our models were trained on at least 1T tokens, far more than is typically used at this scale.
Interestingly, even after 1T tokens the 7B model was still improving. 3/n
On Common Sense Reasoning, Closed-book Question Answering, and Reading Comprehension, LLaMA-65B outperforms Chinchilla 70B and PaLM 540B on almost all benchmarks. 4/n
LLaMA-65B outperforms Minerva-62B on GSM8k, even though it has not been fine-tuned on any mathematical dataset. On the MATH benchmark, it outperforms PaLM-62B (but falls well below Minerva-62B). 5/n
On code generation benchmarks, LLaMA-65B outperforms cont-PaLM (62B) as well as PaLM-540B. 6/n
We also briefly tried instruction finetuning using the approach of Chung et al. (2022).
The resulting model, LLaMA-I, outperforms Flan-PaLM-cont (62B) on MMLU and showcases some interesting instruction-following capabilities. 7/n
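For readers unfamiliar with the recipe, here is a minimal sketch of what instruction-finetuning data preparation can look like: instruction/response pairs are packed into a single sequence, and the loss is computed only on the response tokens. The prompt template, the toy tokenizer, and the masking convention below are my own illustrative assumptions, not the exact setup of Chung et al. (2022) or LLaMA-I.

```python
# Minimal sketch of instruction-finetuning data preparation (illustrative only;
# the prompt template, tokenizer, and masking scheme are assumptions, not the
# recipe used for LLaMA-I or by Chung et al. 2022).

IGNORE_INDEX = -100  # PyTorch cross-entropy convention for tokens excluded from the loss


def toy_tokenize(text: str) -> list[int]:
    """Placeholder whitespace tokenizer standing in for a real subword tokenizer."""
    return [hash(tok) % 32000 for tok in text.split()]


def build_example(instruction: str, response: str) -> dict:
    """Format an (instruction, response) pair and mask the prompt out of the labels."""
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = toy_tokenize(prompt)
    response_ids = toy_tokenize(response)

    input_ids = prompt_ids + response_ids
    # Supervise only the response: the model is not trained to predict the prompt.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}


example = build_example("Explain what a foundation model is.",
                        "A foundation model is a large model trained on broad data ...")
print(len(example["input_ids"]), example["labels"][:5])
```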
Today, we release Mistral Large 2, the new version of our largest model. Mistral Large 2 is a 123B-parameter model with a 128k context window. On many benchmarks (notably in code generation and math), it is superior to or on par with Llama 3.1 405B. Like Mistral NeMo, it was trained on a very large amount of source code and multilingual data. (1/N)
On HumanEval and MultiPL-E, Mistral Large 2 outperforms Llama 3.1 405B instruct and scores just below GPT-4o. On MATH (0-shot, without CoT), it falls behind only GPT-4o. (2/N)
On Multilingual MMLU, Mistral Large 2 significantly outperforms Llama 3.1 70B base (+6.3% on average over 9 languages) and is on par with Llama 3.1 405B (0.4% below). (3/N)
Super excited about this work! We showed that you can use large language models to align informal mathematical proofs (e.g. written in LaTeX) with formal proof sketches (e.g. skeletons of proofs written in a formal system like Lean or Isabelle).
A formal sketch provides a high-level description of the proof that follows the same reasoning steps as the informal proof. The sketches are in turn converted into complete proofs by an automated prover (we used Sledgehammer, but more powerful/neural-based provers could be used too).
We show that augmenting an automated prover with such formal sketches improves its performance immensely. Another nice thing about the generated formal proofs is that they are extremely easy to read, since by design they come with lots of comments and intermediate steps.
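To make the pipeline concrete, here is a toy formal proof sketch in Lean 4 (my own illustrative example; the paper works with Isabelle and Sledgehammer). The intermediate `have` steps mirror the informal argument, and each `sorry` marks a low-level gap that the automated prover is asked to close.

```lean
-- Toy illustration (not from the paper): a formal proof sketch whose
-- intermediate steps mirror an informal proof that a² + 2ab + b² ≥ 0.
-- Each `sorry` is a gap to be discharged by an automated prover
-- (the paper uses Isabelle's Sledgehammer for this step).
theorem sum_sq_nonneg (a b : Int) : 0 ≤ a * a + 2 * a * b + b * b := by
  -- Informal step 1: rewrite the expression as a square.
  have h1 : a * a + 2 * a * b + b * b = (a + b) * (a + b) := by sorry
  -- Informal step 2: a square is non-negative.
  have h2 : 0 ≤ (a + b) * (a + b) := by sorry
  -- Combine the two steps to close the goal.
  sorry
```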
Unsupervised Translation of Programming Languages. Feed a model Python, C++, and Java source code from GitHub, and it automatically learns to translate between the 3 languages in a fully unsupervised way. arxiv.org/pdf/2006.03511…
with @MaLachaux @b_roziere @LowikChanussot
We leverage the same principles that we used to translate low-resource languages (arxiv.org/abs/1804.07755), i.e. pretraining, denoising auto-encoding, and back-translation. Although initially designed for natural languages, these methods apply remarkably well to programming languages.
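For intuition, here is a minimal sketch of the two unsupervised objectives applied to token sequences. The noise probabilities and the `translate` stub are illustrative placeholders, not the actual TransCoder training code.

```python
import random

# Minimal sketch of denoising auto-encoding and back-translation on token
# sequences. The noise parameters and the translate() stub are illustrative
# assumptions, not the actual TransCoder setup.

MASK = "<mask>"


def corrupt(tokens: list[str], p_mask: float = 0.15, p_drop: float = 0.1) -> list[str]:
    """Denoising auto-encoding: corrupt the input; the model must reconstruct the original."""
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < p_drop:
            continue            # randomly drop a token
        elif r < p_drop + p_mask:
            noisy.append(MASK)  # randomly mask a token
        else:
            noisy.append(tok)
    return noisy


def translate(tokens: list[str], src: str, tgt: str) -> list[str]:
    """Stub for the current model's translation; during training this is the model itself."""
    return tokens  # placeholder


def back_translation_pair(python_fn: list[str]) -> tuple[list[str], list[str]]:
    """Back-translation: translate Python -> C++ with the current model, then
    train on (generated C++, original Python) as a synthetic supervised pair."""
    generated_cpp = translate(python_fn, src="python", tgt="cpp")
    return generated_cpp, python_fn


tokens = "def maximum ( a , b ) : return a if a > b else b".split()
print(corrupt(tokens))
print(back_translation_pair(tokens)[0][:5])
```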
The model learns to align functions and objects across libraries (std::unordered_set -> HashSet, printf -> System.out.println, std::vector<int> -> List<Integer>, Files.createDirectories -> os.makedirs), as well as language-specific patterns (a > b ? a : b -> a if a > b else b).
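For example, a C++ function using the ternary operator can be rendered with Python's conditional expression. The snippet below is my own toy illustration of this kind of mapping, not an output quoted from the paper.

```python
# C++ input (shown as a comment to keep this block in one language):
#
#   int maximum(int a, int b) {
#       return a > b ? a : b;
#   }
#
# A translation of the kind the model produces: the C++ ternary
# `a > b ? a : b` becomes Python's conditional expression.
def maximum(a: int, b: int) -> int:
    return a if a > b else b


print(maximum(3, 5))  # 5
```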