Enrico Shippole
Jul 24 · 22 tweets · 5 min read
Releasing LLongMA-2 13b, a Llama-2 model trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
The model can be found on @huggingface here: huggingface.co/conceptofmind/…
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
A Llama-2 7b model trained at 16k context length will be released soon on @huggingface here: huggingface.co/conceptofmind/…
The model performs identically to LLaMA 2 at context lengths up to 4k, scales directly to 8k, and works out of the box with the new version of transformers (4.31), or with `trust_remote_code` for <= 4.30.
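For reference, a minimal loading sketch is below. The repository id is a placeholder (the exact link is in the tweet above), and the prompt and generation settings are purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; see the Hugging Face link above for the actual model card.
model_id = "conceptofmind/LLongMA-2-13b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # only required on transformers <= 4.30
)

# Prompts up to ~8k tokens are supported at full quality.
prompt = "Summarize the following report in three bullet points:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```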
Applying the method to the rotary position embedding requires only a slight change to the model's code: dividing the positional index, t, by a scaling factor.
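A minimal sketch of the idea (not the exact code from the repository linked below): a Llama-style rotary embedding cache where the only change from vanilla RoPE is dividing the position index t by the scaling factor, e.g. 2.0 when stretching a 4k model to 8k.

```python
import torch

class ScaledRotaryEmbedding(torch.nn.Module):
    """Rotary embedding cache with linear positional interpolation."""

    def __init__(self, dim: int, max_position_embeddings: int = 8192,
                 base: float = 10000.0, scale: float = 2.0):
        super().__init__()
        # Standard RoPE inverse frequencies.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # The only change: divide the position index t by the scaling
        # factor (e.g. 8192 / 4096 = 2.0) before building the angle table.
        t = torch.arange(max_position_embeddings, dtype=torch.float32) / scale
        freqs = torch.outer(t, inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos(), persistent=False)
        self.register_buffer("sin_cached", emb.sin(), persistent=False)

    def forward(self, seq_len: int):
        # Returns the cos/sin tables used to rotate query and key vectors.
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]
```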
The repository containing @theemozilla’s implementation of scaled rotary embeddings can be found here: github.com/jquesnelle/sca…
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1's blog posts on his findings: kaiokendev.github.io
A PR by @joao_gante adding scaled rotary embeddings to @huggingface transformers has been merged: github.com/huggingface/tr…
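With that PR merged, transformers 4.31+ exposes linear RoPE scaling through the model config, so the same interpolation can be applied to a base Llama-2 checkpoint at load time (fine-tuning, as done for LLongMA-2, is still what recovers full quality at the extended length). A rough sketch, assuming transformers >= 4.31 and access to the gated Llama-2 weights:

```python
from transformers import AutoModelForCausalLM

# Linear positional interpolation via the rope_scaling config option
# added in transformers 4.31; factor=2.0 stretches the original
# 4096-token position range to 8192.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # requires accepting the Llama-2 license
    rope_scaling={"type": "linear", "factor": 2.0},
)
```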
The model was further trained for ~1 billion tokens on @togethercompute's Red Pajama dataset. The context length of the examples varies: huggingface.co/datasets/toget…
The pre-tokenized dataset will soon be available for you to use here: huggingface.co/datasets/conce…
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques: arxiv.org/abs/2108.12409
It is also worth reviewing the paper A Length-Extrapolatable Transformer and its xPos technique, which also applies scaling to rotary embeddings: arxiv.org/pdf/2212.10554…
We previously trained the first publicly available model with rotary embedding scaling here:
You can find out more about the @NousResearch organization here: huggingface.co/NousResearch
The compute for this model release is all thanks to the generous sponsorship by @carperai, @EMostaque, and @StabilityAI. This is not an official @StabilityAI product.
A big thank you to @AiEleuther for facilitating the discussions about context-length extrapolation as well. Truly an awesome open-source team and community.
If you have any questions about the data or model be sure to reach out and ask! I will try to respond promptly.
The previous suite of LLongMA model releases can be found here:
All of the models can be found on Huggingface: huggingface.co/conceptofmind
The previous LLongMA-2 7b model can be found here:
Testimonials about LLongMA-2 7b can be seen on @huggingface here: huggingface.co/conceptofmind/…

More from @EnricoShippole

Jul 20
Releasing LLongMA-2, a suite of Llama-2 models trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1. huggingface.co/conceptofmind/…
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The models pass all our evaluations and maintain the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
The models perform similarly to LLaMA 2 at context lengths up to 4k, scale directly to 8k, and work out of the box with the new version of transformers (4.31), or with `trust_remote_code` for <= 4.30.
May 25
Introducing an open-source reproduction of the FLAN V2 dataset. huggingface.co/datasets/conce…
I worked with @ShayneRedford, the main author of the FLAN collection, to recreate his great work and publicly release high-quality instruction-tuning data. We fixed encoding issues and also increased the sequence length to 4096.
Our work on an open reproduction of FLAN V2 and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI.

A big thank you to @zhansheng and @fabmilo for helping build the dataset as well.
May 8
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models of sizes 150m, 410m, and 1b are available to download and use here: github.com/conceptofmind/…
The models are also compatible with many of lucidrains' popular repositories such as toolformer-pytorch, PaLM-rlhf-pytorch, and PaLM-pytorch. Please be sure to sponsor and help support Phil's great work: github.com/lucidrains/PaL…
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI.

A big thank you to @dmayhem93, @jonbtow, Aman, and @zach_nussbaum as well for providing input on the @huggingface library.
