Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
The model has identical performance to Llama-2 under 4k context length, scales directly to 8k, and works out-of-the-box with the new version of transformers (4.31), or with `trust_remote_code` for versions <= 4.30.
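For reference, loading roughly looks like this (a minimal sketch; the exact repo id is my assumption, double-check the Hugging Face page):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id is an assumption based on this thread; check the Hugging Face link.
model_id = "conceptofmind/LLongMA-2-13b"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# With transformers >= 4.31 the scaled rotary embeddings are handled natively.
# On transformers <= 4.30, pass trust_remote_code=True so the modeling code
# bundled with the checkpoint is used instead.
model = AutoModelForCausalLM.from_pretrained(model_id)
```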
Applying the method to the rotary position embedding requires only a slight change to the model's code: dividing the positional index, t, by a scaling factor.
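The change amounts to roughly this (a minimal sketch of linear positional interpolation, not the exact code from the repo; names and defaults are mine):

```python
import torch

def scaled_rope_angles(seq_len, dim, base=10000.0, scale=8192 / 4096):
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    # Linear positional interpolation: divide the position index t by the
    # scaling factor (here 2, for extending a 4k-pretrained model to 8k) so
    # the rotation angles stay within the range seen during pretraining.
    t = torch.arange(seq_len, dtype=torch.float32) / scale
    freqs = torch.outer(t, inv_freq)
    return freqs.cos(), freqs.sin()
```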
The repository containing @theemozilla’s implementation of scaled rotary embeddings can be found here: github.com/jquesnelle/sca…
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1's blog posts on his findings: kaiokendev.github.io
A PR adding scaled rotary embeddings to @huggingface transformers was submitted by @joao_gante and has been merged: github.com/huggingface/tr…
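With that merged, you can apply linear scaling to a stock Llama-2 checkpoint via the config (a sketch using the `rope_scaling` format documented for transformers 4.31; note the scaling alone does not substitute for the fine-tuning described above):

```python
from transformers import AutoModelForCausalLM

# rope_scaling is a dict with the strategy and the factor (2.0 here, i.e. 4k -> 8k).
# The base model id is assumed; it is the gated Meta release on the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)
```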
The model was further trained for ~1 billion tokens on @togethercompute's Red Pajama dataset. The context length of the examples varies: huggingface.co/datasets/toget…
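If you want to poke at the data yourself, something like this works (the dataset id is my assumption, since RedPajama is published in several variants; the sample split keeps the download small):

```python
from datasets import load_dataset

# Small sample of the RedPajama corpus used for the continued pretraining.
ds = load_dataset("togethercomputer/RedPajama-Data-1T-Sample", split="train")
print(len(ds[0]["text"]))  # document lengths vary, hence the mixed context lengths
```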
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques: arxiv.org/abs/2108.12409
It is also worth reviewing the paper A Length-Extrapolatable Transformer and its xPos technique, which also applies scaling to rotary embeddings: arxiv.org/pdf/2212.10554…
We previously trained the first publicly available model with rotary embedding scaling here:
The compute for this model release is all thanks to the generous sponsorship by @carperai, @EMostaque, and @StabilityAI. This is not an official @StabilityAI product.
A big thank you to @AiEleuther for facilitating the discussions about context-length extrapolation as well. Truly an awesome open-source team and community.
If you have any questions about the data or model, be sure to reach out and ask! I will try to respond promptly.
The previous suite of LLongMA model releases can be found here:
Releasing LLongMA-2, a suite of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1. huggingface.co/conceptofmind/…
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The models pass all our evaluations and maintain the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
The models have similar performance to Llama-2 under 4k context length, scale directly to 8k, and work out-of-the-box with the new version of transformers (4.31), or with `trust_remote_code` for versions <= 4.30.
I worked with @ShayneRedford, the main author of the FLAN Collection, to recreate his great work and publicly release high-quality instruction-tuning data. We fixed encoding issues and also increased the sequence length to 4096.
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models, with sizes of 150m, 410m, and 1b parameters, are available to download and use here: github.com/conceptofmind/…
The models are also compatible with many of lucidrains' popular repositories, such as toolformer-pytorch, PaLM-rlhf-pytorch, and PaLM-pytorch. Please be sure to sponsor and help support Phil's great work: github.com/lucidrains/PaL…
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI.