Together AI · Dec 8, 2023
Announcing StripedHyena 7B — an open source model built on an architecture that goes beyond Transformers, achieving faster performance and longer context.

It builds on the lessons learned over the past year of designing efficient sequence modeling architectures.

together.ai/blog/stripedhy…
This release includes StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. Both use a hybrid architecture based on our latest research on scaling laws of efficient architectures.

Both models are available on Together API!
api.together.xyz/playground/cha…
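A minimal sketch of querying the chat model programmatically; the OpenAI-compatible endpoint path and the model slug below are assumptions — check the playground link above for the exact values:

```python
# Hedged sketch: endpoint path and model slug are assumptions, not confirmed by the thread.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",          # assumed OpenAI-compatible endpoint
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "togethercomputer/StripedHyena-Nous-7B",    # assumed model slug
        "messages": [{"role": "user", "content": "Explain gated convolutions in two sentences."}],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```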
StripedHyena is the first alternative model competitive with the best open-source Transformers in short- and long-context evaluations. It achieves performance comparable to Llama-2, Yi, and Mistral 7B on the OpenLLM leaderboard, while outperforming them on long-context summarization.
On short-context tasks, including OpenLLM leaderboard tasks, StripedHyena outperforms Llama-2 7B, Yi 7B, and the strongest Transformer alternatives, such as RWKV-Raven 14B.
StripedHyena is faster and more memory efficient for long-sequence training, fine-tuning, and generation. Using our latest research on fast kernels for gated convolutions (FlashFFTConv) and on efficient Hyena inference, StripedHyena is >30%, >50%, and >100% faster in end-to-end training at increasingly long sequence lengths.
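For intuition, here is a minimal reimplementation of the core trick behind fast gated-convolution kernels — computing a long causal convolution via FFT in O(L log L) rather than O(L²). This is an illustrative sketch, not the FlashFFTConv kernel itself:

```python
# Illustrative FFT-based long convolution (the operation FlashFFTConv accelerates),
# not the actual kernel implementation.
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal convolution of inputs u (B, L, D) with per-channel filters k (L, D)."""
    L = u.shape[1]
    n = 2 * L                                   # zero-pad to avoid circular wrap-around
    U = torch.fft.rfft(u, n=n, dim=1)           # (B, n//2+1, D)
    K = torch.fft.rfft(k, n=n, dim=0)           # (n//2+1, D)
    y = torch.fft.irfft(U * K.unsqueeze(0), n=n, dim=1)
    return y[:, :L, :]                          # keep the causal part

u = torch.randn(2, 1024, 64)
k = torch.randn(1024, 64) / 1024
print(fft_long_conv(u, k).shape)                # torch.Size([2, 1024, 64])
```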
StripedHyena is designed using our latest research on scaling laws of efficient architectures. In particular, StripedHyena is a hybrid of attention and gated convolutions arranged in Hyena operators. Via a compute-optimal scaling protocol, we identify several ways to improve.
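As a rough illustration of "striping", the sketch below interleaves attention blocks into a stack of gated-convolution blocks. The stand-in block implementations and the 1-in-4 attention ratio are assumptions for illustration, not the actual StripedHyena configuration:

```python
# Hypothetical striped stack: gated-convolution (Hyena-style) blocks with attention
# interleaved at a fixed stride. Block internals and the stride are illustrative only.
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Stand-in for a Hyena operator: depthwise causal conv with multiplicative gating."""
    def __init__(self, d: int, kernel: int = 7):
        super().__init__()
        self.gate = nn.Linear(d, d)
        self.conv = nn.Conv1d(d, d, kernel, padding=kernel - 1, groups=d)
        self.out = nn.Linear(d, d)

    def forward(self, x):                        # x: (B, L, D)
        g = torch.sigmoid(self.gate(x))
        c = self.conv(x.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)
        return x + self.out(g * c)

class AttentionBlock(nn.Module):
    def __init__(self, d: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):
        return x + self.attn(x, x, x, need_weights=False)[0]

def striped_stack(depth: int = 12, d: int = 256, attn_every: int = 4) -> nn.Sequential:
    """Place an attention block every `attn_every` layers; gated convolutions elsewhere."""
    return nn.Sequential(*[
        AttentionBlock(d) if (i + 1) % attn_every == 0 else GatedConvBlock(d)
        for i in range(depth)
    ])

x = torch.randn(2, 128, 256)
print(striped_stack()(x).shape)                  # torch.Size([2, 128, 256])
```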
StripedHyena is optimized using a set of new model grafting techniques, enabling us to change the model architecture during training. We grafted architectural components of Transformers and Hyena, and trained on a mix of the RedPajama dataset, augmented with longer-context data.
One additional advantage of StripedHyena is a >50% reduced memory footprint during autoregressive generation, compared to a Transformer (both with grouped-query attention).
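As a back-of-envelope illustration of where the savings come from: only the attention layers need a key/value cache that grows with sequence length, while the convolution layers keep a small fixed state. The layer counts and sizes below are assumed for illustration, not the actual StripedHyena or baseline configurations:

```python
# Illustrative cache-size arithmetic; all shapes below are assumptions, not real configs.
bytes_per = 2                                  # fp16
kv_heads, head_dim, seq_len = 8, 128, 32_768   # assumed GQA setup
per_layer_kv = 2 * kv_heads * head_dim * seq_len * bytes_per   # K and V for one layer

transformer_cache = 32 * per_layer_kv          # every layer caches K/V
striped_cache = 8 * per_layer_kv               # attention in only 1 of 4 layers (assumed ratio)

print(f"GQA Transformer KV cache: {transformer_cache / 2**30:.2f} GiB")   # 4.00 GiB
print(f"Striped hybrid KV cache:  {striped_cache / 2**30:.2f} GiB")       # 1.00 GiB
```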
This work would not have been possible without our collaborators @HazyResearch, @NousResearch, and @Hessian_AI.

It builds on our past work with @Mila_Quebec, @huggingface. We are grateful to open source AI community leaders including @AIatMeta, @AiEleuther, @MistralAI & others.

More from @togethercompute

Apr 8
Announcing DeepCoder-14B – an o1 & o3-mini level coding reasoning model fully open-sourced!

We’re releasing everything: dataset, code, and training recipe.🔥

Built in collaboration with the @Agentica_ team.

See how we created it. 🧵
Training Technique

To scale reasoning without sacrificing the model’s long-context capability, we combine two techniques (sketched in code after the results below):
→ Iterative context lengthening
→ Overlong filtering (from DAPO)

We train DeepCoder-14B-Preview from 16K → 32K, then evaluate at 64K.

Results on LiveCodeBench:
• 16K: 54%
• 32K: 58%
• 64K: 60.6% (despite never training at 64K)

Our model generalizes to 64K context despite never being trained on it, whereas the baseline (R1-Distill-14B) plateaus beyond its training window.
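A minimal sketch of those two ideas — not the actual DeepCoder training code; collect_rollouts is a hypothetical stand-in for the RL rollout step:

```python
# Hedged sketch: iterative context lengthening + overlong filtering (DAPO-style).
context_schedule = [16_384, 32_768]            # training stages; evaluation later at 65_536

def overlong_mask(lengths, max_len):
    """1.0 for responses that finished, 0.0 for responses cut off at the context limit,
    so truncated samples contribute no loss/advantage."""
    return [0.0 if n >= max_len else 1.0 for n in lengths]

for max_len in context_schedule:
    # rollouts, rewards, lengths = collect_rollouts(policy, max_len)   # hypothetical helper
    lengths = [900, max_len, 5_000]            # toy rollout lengths
    print(max_len, overlong_mask(lengths, max_len))   # truncated sample gets zero weight
```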
Dataset Curation

Scaling reasoning with RL requires verifiable rewards. Unlike math datasets, coding datasets found online tend to be much noisier, resulting in faulty reward signals during training.

To address this, we’ve implemented a rigorous data pipeline (sketched in code below):
• Official solutions must pass all tests
• ≥6 test cases per problem
• Deduplication across train/test splits

This pipeline gives us 24K high-quality verified coding problems for RL training.
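A hedged sketch of those filtering rules; run_tests and the field names are hypothetical stand-ins, not the actual pipeline code:

```python
# Illustrative filter implementing the three rules above; field names are assumptions.
def run_tests(solution: str, tests: list) -> bool:
    """Hypothetical stand-in: execute the official solution against every test case."""
    return True                                  # placeholder

def keep_problem(problem: dict, seen: set) -> bool:
    if len(problem["tests"]) < 6:                # require >= 6 test cases
        return False
    if not run_tests(problem["solution"], problem["tests"]):
        return False                             # official solution must pass all tests
    key = hash(problem["statement"])
    if key in seen:                              # deduplicate across train/test splits
        return False
    seen.add(key)
    return True

seen = set()
problems = [{"statement": "sum two ints", "solution": "...", "tests": list(range(8))}]
print([p["statement"] for p in problems if keep_problem(p, seen)])
```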
Jun 11, 2024
Mixture of Agents—a framework that leverages the collective strengths of multiple LLMs. Each layer contains multiple agents that refine responses using outputs from the preceding layer.
Together MoA achieves a score of 65.1% on AlpacaEval 2.0.
together.ai/blog/together-…
Together MoA exhibits promising performance on AlpacaEval 2.0 and MT-Bench.

Together MoA uses six open source models as proposers and Qwen1.5-110B-Chat as the final aggregator, with three layers.
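A schematic sketch of that flow — proposer layers each see the previous layer's answers, and a final aggregator synthesizes the result. ask(model, prompt) is a hypothetical stand-in for whatever chat-completion client you use:

```python
# Schematic Mixture-of-Agents loop; `ask` is a hypothetical stand-in for a chat API call.
def ask(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt[:40]}..."   # placeholder response

def mixture_of_agents(question, proposers, aggregator, layers=3):
    answers = []
    for _ in range(layers):
        prompt = question if not answers else (
            question + "\n\nResponses from the previous layer:\n" + "\n---\n".join(answers)
        )
        answers = [ask(m, prompt) for m in proposers]          # each agent refines the answers
    final = question + "\n\nSynthesize the best answer from:\n" + "\n---\n".join(answers)
    return ask(aggregator, final)

print(mixture_of_agents("Why is the sky blue?", ["model-a", "model-b"], "aggregator-model"))
```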
We also evaluate on FLASK, which offers more fine-grained evaluation; Together MoA outperforms the original models on most dimensions.
May 5, 2023
The first RedPajama models are here! The 3B and 7B models are now available under Apache 2.0 license, including instruction-tuned and chat versions!
This project demonstrates the power of the open-source AI community with many contributors ... 🧵 together.xyz/blog/redpajama…
Training ran on 3,072 V100 GPUs provided as part of the INCITE 2023 project on Scalable Foundation Models for Transferrable Generalist AI, awarded to MILA, LAION, and EleutherAI in fall 2022, with support from the Oak Ridge Leadership Computing Facility (OLCF) and INCITE program.
We are thankful to all the project team members helping to build the RedPajama dataset and supporting training, including @ontocord, @DS3Lab, AAI CERC, @Mila_Quebec, @StanfordHAI, @HazyResearch and @laion_ai.
Apr 17, 2023
Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today!
together.xyz/blog/redpajama
More in 🧵 …
In the coming weeks we will release a full suite of large language models and instruction tuned versions based on this dataset.
Download the full dataset, or a smaller random sample now on Hugging Face! huggingface.co/datasets/toget…
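A minimal sketch of pulling the smaller sample with the datasets library; the repo id below is an assumption — check the Hugging Face link above for the exact name:

```python
# Hedged sketch; the dataset repo id is an assumption based on the link above.
from datasets import load_dataset

sample = load_dataset("togethercomputer/RedPajama-Data-1T-Sample", split="train")  # assumed id
print(sample[0]["text"][:200])
```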
Mar 30, 2023
Announcing OpenChatKit v0.16! You can now run OpenChatKit on consumer GPUs with a new 7B parameter model fine-tuned on user feedback for improved quality. And it's fast!
Details in 🧵 ... twitter.com/i/web/status/1…
Updates include:
1. A new 7B parameter, 8-bit quantized model, available for use on consumer GPUs (a loading sketch follows after this list)
huggingface.co/togethercomput…
2. An improved 20B parameter model with higher quality by fine-tuning on user feedback
huggingface.co/togethercomput…
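A hedged sketch of loading the 8-bit model on a consumer GPU with transformers; the repo id and prompt format are assumptions (see the truncated links above), and 8-bit loading requires the bitsandbytes and accelerate packages:

```python
# Hedged sketch: repo id and the "<human>:/<bot>:" prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/Pythia-Chat-Base-7B"          # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)

prompt = "<human>: What can I do with OpenChatKit?\n<bot>:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))
```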
Mar 10, 2023
Introducing OpenChatKit. A powerful, open-source base to create chatbots for various applications. Details in 🧵

together.xyz/blog/openchatk…
Much more than a model release, this is the beginning of an open source project. We are releasing a set of tools and processes for ongoing improvement with community contributions.

OpenChatKit includes 4 key components:
First, an instruction-tuned large language model, fine-tuned for chat from EleutherAI’s GPT-NeoX-20B with over 43 million instructions, trained on 100% carbon-negative compute, and available under the Apache-2.0 license on @huggingface.

huggingface.co/togethercomput…
