Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
We worked to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all of our evaluations and maintain the same perplexity at 128k extrapolation, surpassing the performance of our other recent methodology, NTK-by-parts scaling.
A Yarn-Llama-2-7b model trained for 128k context length is available on @huggingface here: huggingface.co/conceptofmind/…
The models have similar performance to the base LLaMA 2 models on the Open LLM benchmarks while scaling context length directly to 128k.
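As a quick usage sketch: the checkpoints are meant to load through `transformers` with `trust_remote_code` enabled so the custom YaRN rotary embedding code ships with the weights. The repository id below is a placeholder for the truncated link above, and the prompt is only illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the Hugging Face repository linked above.
repo = "conceptofmind/Yarn-Llama-2-13b-128k"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,   # pulls in the custom YaRN rotary embedding code
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The YaRN method extends context length by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```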
We also trained a set of models at 64k context length. You can find the Yarn-Llama-2-13b-64k model here: huggingface.co/conceptofmind/…
We are releasing all of the code as open source so the results of the paper can be fully reproduced. The repository containing u/bloc97 and @theemozilla’s implementation of YaRN rotary embeddings can be found here: github.com/jquesnelle/yarn
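For readers who just want the gist of what the repository implements, here is a condensed sketch of the "NTK-by-parts" frequency interpolation plus the attention-temperature ("mscale") correction described in the paper. Parameter names and defaults are illustrative; the linked repository is the reference implementation.

```python
import math
import torch

def yarn_rope_params(dim, scale, base=10000.0, original_ctx=4096,
                     beta_slow=1.0, beta_fast=32.0):
    """Sketch of YaRN's "NTK-by-parts" interpolation of RoPE frequencies.

    Short-wavelength dimensions (many rotations over the original context)
    are left untouched; long-wavelength dimensions are linearly interpolated
    by `scale`; dimensions in between are blended with a linear ramp.
    """
    # Standard RoPE inverse frequencies theta_d.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

    # Number of full rotations each dimension completes over the original context.
    rotations = original_ctx * inv_freq / (2 * math.pi)

    # Ramp: 1 -> keep the original frequency (extrapolation),
    #       0 -> full linear interpolation (divide the frequency by `scale`).
    ramp = torch.clamp((rotations - beta_slow) / (beta_fast - beta_slow), 0.0, 1.0)
    new_inv_freq = ramp * inv_freq + (1.0 - ramp) * inv_freq / scale

    # Attention-temperature correction ("mscale"), applied to the cos/sin cache.
    mscale = 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0
    return new_inv_freq, mscale
```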
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques: arxiv.org/abs/2108.12409
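For context, the core of ALiBi is a per-head linear penalty on the attention logits proportional to query-key distance, with no positional embeddings at all. A rough sketch of the bias (slope scheme shown for power-of-two head counts):

```python
import torch

def alibi_bias(num_heads, seq_len):
    """Rough sketch of the ALiBi attention bias: each head gets a slope m_h
    and logits are penalized by -m_h * (query_position - key_position)."""
    # Geometric head slopes, as in the ALiBi paper (power-of-two head counts).
    slopes = torch.tensor([2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

    # Causal distance matrix: distance[i, j] = i - j for j <= i, 0 otherwise.
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0).float()

    # Bias of shape (num_heads, seq_len, seq_len); added to logits before softmax.
    return -slopes[:, None, None] * distance
```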
It is also worth reviewing the paper A Length-Extrapolatable Transformer and the xPos technique, which also applies scaling to rotary embeddings: arxiv.org/pdf/2212.10554…
We previously trained the first publicly available model with rotary embedding scaling here:
The compute for these model releases is all thanks to the generous sponsorship by @carperai, @EMostaque, and @StabilityAI. This is not an official @StabilityAI product. Thank you to @dmayhem93 and @jonbtow as well for helping.
A big thank you to @Void13950782 and @AiEleuther for facilitating the discussions about context-length extrapolation and helping to write the paper. Truly an awesome open-source team and community.
If you have any questions about the methodology and models be sure to reach out to @theemozilla and me! We will try to respond promptly.
The models were trained with @tri_dao's Flash Attention 2 and parts of @togethercompute's codebase. You can find out more about Flash Attention 2 here: github.com/Dao-AILab/flas…
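If you want to try Flash Attention 2 directly, the fused kernel is exposed through `flash_attn_func`. A minimal sketch (tensor shapes and dtypes follow the flash-attn documentation; the sizes below are illustrative):

```python
import torch
from flash_attn import flash_attn_func  # pip install flash-attn

# Flash Attention 2 expects (batch, seq_len, num_heads, head_dim) tensors
# in fp16/bf16 on a CUDA device.
batch, seq_len, num_heads, head_dim = 1, 4096, 32, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention in a single fused kernel.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (1, 4096, 32, 128)
```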
Thank you to @AlpinDale and @pygmalion_ai for providing resources to help run evaluations on these models.
Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b model through fine-tuning. The model passes all of our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
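For anyone unfamiliar with the technique, linear positional interpolation simply compresses the position indices fed to the rotary embeddings by the extension factor, so a fine-tuned 8k context reuses the 0-4k position range the base model was trained on. A rough sketch (function and argument names here are illustrative):

```python
import torch

def interpolated_rope_angles(seq_len, dim, scale=2.0, base=10000.0):
    """Sketch of linear positional interpolation for RoPE: positions are
    divided by `scale` (e.g. 2.0 for 4k -> 8k) before computing the angles."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    positions = torch.arange(seq_len, dtype=torch.float32) / scale  # compress positions
    angles = torch.outer(positions, inv_freq)                        # (seq_len, dim // 2)
    return angles.cos(), angles.sin()
```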
Releasing LLongMA-2, a suite of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1. huggingface.co/conceptofmind/…
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The models pass all of our evaluations and maintain the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
The model performs similarly to LLaMA 2 at context lengths under 4k, scales directly to 8k, and works out of the box with the new version of transformers (4.31), or with `trust_remote_code` for versions <= 4.30.
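Concretely, the checkpoints should load directly on transformers >= 4.31 (which added native RoPE scaling to the Llama implementation), with `trust_remote_code` as the fallback for older versions. A minimal sketch, where the repository id is a placeholder for the truncated link above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "conceptofmind/LLongMA-2-7b"  # placeholder; use the repository linked above

tokenizer = AutoTokenizer.from_pretrained(repo)

# transformers >= 4.31: native RoPE scaling support, no custom code needed.
# If a checkpoint's config did not already set it, linear scaling could be
# requested explicitly with rope_scaling={"type": "linear", "factor": 2.0}.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

# transformers <= 4.30: fall back to the modeling code bundled with the checkpoint.
# model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
```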
I worked with @ShayneRedford, the main author of the FLAN collection, to recreate his great work and publicly release high-quality instruction-tuning data. We fixed encoding issues and also increased the sequence length to 4096.
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models of sizes 150m, 410m, and 1b are available to download and use here: github.com/conceptofmind/…
The models are also compatible with many of @lucidrains' popular repositories, such as Toolformer-pytorch, PaLM-rlhf-pytorch, and PaLM-pytorch. Please be sure to sponsor and help support Phil's great work: github.com/lucidrains/PaL…
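As an illustration of that compatibility, instantiating the architecture through @lucidrains' PaLM-pytorch looks roughly like this; the hyperparameters below are illustrative only, not the released 150m/410m/1b configurations.

```python
import torch
from palm_pytorch import PaLM  # pip install PaLM-pytorch

# Illustrative hyperparameters; see the linked repositories for the
# configurations actually used in the released checkpoints.
model = PaLM(
    num_tokens=50304,
    dim=768,
    depth=12,
    heads=8,
    dim_head=64,
)

tokens = torch.randint(0, 50304, (1, 2048))
logits = model(tokens)  # (1, 2048, 50304)
```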
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI.