Enrico Shippole
Aug 31 · 18 tweets · 5 min read
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
The model can be found on @huggingface here: huggingface.co/conceptofmind/…
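For reference, here is a minimal sketch of loading the released checkpoint with Hugging Face transformers. The repo id is assumed from the truncated link above, and `trust_remote_code=True` is assumed to be needed if the YaRN rotary embedding code ships inside the model repository:

```python
# Minimal loading sketch (assumptions: the repo id, and that the YaRN rotary
# embedding code is shipped with the model repo via trust_remote_code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "conceptofmind/Yarn-Llama-2-13b-128k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # long contexts are memory-heavy; use bf16/fp16
    device_map="auto",
    trust_remote_code=True,      # custom YaRN rotary embeddings (assumption)
)

prompt = "Summarize the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```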
We worked to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 128k extrapolation, surpassing the performance of our other recent methodology, NTK-by-parts scaling.
A Yarn-Llama-2-7b model trained for 128k context length is available on @huggingface here: huggingface.co/conceptofmind/…
The models have similar performance to the base LLaMA 2 models on the Open LLM benchmarks while scaling context length directly to 128k.
We also trained a set of models at 64k context length. You can find the Yarn-Llama-2-13b-64k model here: huggingface.co/conceptofmind/…
As well as the Yarn-Llama-2-7b-64k model here: huggingface.co/conceptofmind/…
We are releasing all of the code, open-source, to fully reproduce the results of the paper. The repository containing u/bloc97 and @theemozilla’s implementation of YaRN rotary embeddings can be found here: github.com/jquesnelle/yarn
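To give a feel for what the repository implements, here is a rough, simplified sketch of the "NTK-by-parts" style frequency interpolation and the length-dependent attention temperature described in the YaRN paper. The hyperparameter names and the ramp below follow the paper loosely and are not the exact code from the repo:

```python
import math
import torch

def yarn_inv_freq(dim, base=10000.0, scale=32.0, orig_max_pos=4096,
                  beta_fast=32.0, beta_slow=1.0):
    """Blend uninterpolated and position-interpolated RoPE frequencies per dimension.

    High-frequency dimensions (many rotations over the original context) are left
    untouched, low-frequency dimensions are fully interpolated by the scale factor,
    and dimensions in between are linearly blended.
    """
    exponent = torch.arange(0, dim, 2, dtype=torch.float32) / dim
    inv_freq_extra = base ** (-exponent)     # standard RoPE, no interpolation
    inv_freq_inter = inv_freq_extra / scale  # full linear interpolation

    # Rotations each dimension completes over the original context window.
    rotations = orig_max_pos * inv_freq_extra / (2 * math.pi)

    # 1.0 -> keep extrapolation, 0.0 -> fully interpolate, linear ramp in between.
    ramp = ((rotations - beta_slow) / (beta_fast - beta_slow)).clamp(0.0, 1.0)
    return inv_freq_inter * (1.0 - ramp) + inv_freq_extra * ramp

def attention_temperature(scale):
    """Length-dependent logit scaling from the paper, applied on top of 1/sqrt(d)."""
    return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0
```

A scale of 32 corresponds to stretching Llama-2's 4096-token window to 128k; see the repository above for the implementation actually used for training.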
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques: arxiv.org/abs/2108.12409
It is also worth reviewing the paper "A Length-Extrapolatable Transformer" and its xPos technique, which also applies scaling to rotary embeddings: arxiv.org/pdf/2212.10554…
We previously trained the first publicly available model with rotary embedding scaling; see the earlier threads from this author below.
You can find out more about the @NousResearch organization here: huggingface.co/NousResearch
The compute for these model releases is all thanks to the generous sponsorship by @carperai, @EMostaque, and @StabilityAI. This is not an official @StabilityAI product. Thank you to @dmayhem93 and @jonbtow as well for helping.
A big thank you to @Void13950782 and @AiEleuther for facilitating the discussions about context-length extrapolation and helping to write the paper. Truly an awesome open-source team and community.
If you have any questions about the methodology or the models, be sure to reach out to @theemozilla and me! We will try to respond promptly.
All of the models can be found on Huggingface: huggingface.co/conceptofmind
The models used @tri_dao's flash attention 2 and part of @togethercompute's codebase. You can find out more about Flash Attention 2 here: github.com/Dao-AILab/flas…
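For context, here is a minimal sketch of calling the Flash Attention 2 kernel directly (requires a CUDA GPU, the flash-attn >= 2.0 package, and fp16/bf16 inputs); in practice it is typically invoked through the modeling code rather than by hand:

```python
# Minimal Flash Attention 2 sketch; tensors are (batch, seqlen, heads, head_dim).
import torch
from flash_attn import flash_attn_func

batch, seqlen, heads, head_dim = 1, 4096, 32, 128
q = torch.randn(batch, seqlen, heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention computed without materializing the full seqlen x seqlen
# attention matrix -- a key enabler for training at 64k/128k context lengths.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, heads, head_dim)
```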
Thank you to @AlpinDale and @pygmalion_ai for providing resources to help run evaluations on these models.

More from @EnricoShippole

Jul 24
Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
The model can be found on @huggingface here: huggingface.co/conceptofmind/…
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
Jul 20
Releasing LLongMA-2, a suite of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1. huggingface.co/conceptofmind/…
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The models pass all our evaluations and maintain the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
The model has similar performance to LLaMA 2 under a 4k context length, scales directly to 8k, and works out of the box with the new version of transformers (4.31) or with `trust_remote_code` for versions <= 4.30.
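As an illustration, linear positional interpolation simply squeezes the position indices by the scale factor so an 8k sequence reuses the rotary-embedding range seen during 4k pre-training (scaled_position = position / factor). With transformers >= 4.31 the same idea can be enabled on a base Llama-2 model via the `rope_scaling` config field; the factor of 2.0 below is an assumption for a 4k -> 8k stretch:

```python
# Sketch: enabling linear RoPE interpolation on a base model via the config field
# added in transformers 4.31 (factor 2.0 assumed for a 4k -> 8k stretch).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)
```

The fine-tuned LLongMA-2 checkpoints should already carry the appropriate scaling via their config or bundled remote code, per the note above.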
May 25
Introducing an open-source reproduction of the FLAN V2 dataset. huggingface.co/datasets/conce…
I worked with @ShayneRedford, the main author of the FLAN collection, to recreate his great work and publicly release high-quality instruction-tuning data. We fixed encoding issues and increased the sequence length to 4096.
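A minimal sketch of pulling the data with the Hugging Face datasets library; the dataset id below is a placeholder, since the link above is truncated:

```python
# Placeholder dataset id -- substitute the actual repo from the link above.
from datasets import load_dataset

flan = load_dataset("conceptofmind/flan-v2-reproduction", split="train")
print(flan[0])  # inspect a record (FLAN-style data typically pairs an input with a target)
```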
Our work on an open reproduction of FLAN V2 and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI.

A big thank you to @zhansheng and @fabmilo for helping build the dataset as well.
May 8
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models of sizes 150m, 410m, and 1b are available to download and use here: github.com/conceptofmind/…
The models are also compatible with many of lucidrains' popular repositories such as Toolformer-pytorch, PaLM-rlhf-pytorch, and PaLM-pytorch. Please be sure to sponsor and help support Phil's great work: github.com/lucidrains/PaL…
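For example, a small PaLM can be instantiated with lucidrains' PaLM-pytorch as in that repository's README; the hyperparameters below are illustrative and do not correspond to the released 150m/410m/1b checkpoints:

```python
# Illustrative PaLM instantiation following the PaLM-pytorch README; sizes here
# are arbitrary, not the released 150m/410m/1b configurations.
import torch
from palm_pytorch import PaLM

palm = PaLM(
    num_tokens=20000,  # vocabulary size
    dim=512,
    depth=12,
    heads=8,
    dim_head=64,
)

tokens = torch.randint(0, 20000, (1, 2048))
logits = palm(tokens)  # (1, 2048, 20000)
```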
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI.

A big thank you to @dmayhem93, @jonbtow, Aman, and @zach_nussbaum as well for providing input on the @huggingface library.