This release includes StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. Both use a hybrid architecture based on our latest research on scaling laws of efficient architectures.
StripedHyena is the first alternative model competitive with the best open-source Transformers in short- and long-context evaluations. It achieves comparable performance to Llama-2, Yi, and Mistral 7B on the OpenLLM leaderboard, and outperforms them on long-context summarization.
On short-context tasks, including OpenLLM leaderboard tasks, StripedHyena outperforms Llama-2 7B, Yi 7B, and the strongest Transformer alternatives such as RWKV-Raven 14B.
StripedHyena is faster and more memory-efficient for long-sequence training, fine-tuning, and generation. Using our latest research on fast kernels for gated convolutions (FlashFFTConv) and on efficient Hyena inference, StripedHyena is >30%, >50%, and >100% faster in end-to-end training on progressively longer sequences.
StripedHyena is designed using our latest research on scaling laws of efficient architectures. In particular, StripedHyena is a hybrid of attention and gated convolutions arranged in Hyena operators. Via a compute-optimal scaling protocol, we identify several ways to improve on the baseline architectures.
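To make the hybrid concrete, here is a minimal sketch (not the released code) of a "striped" stack that interleaves attention blocks with gated-convolution (Hyena-style) blocks. The ToyHyenaOperator, layer count, and stripe spacing below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyHyenaOperator(nn.Module):
    """Toy stand-in for a Hyena operator: a causal depthwise long
    convolution with multiplicative gating (illustrative only)."""
    def __init__(self, dim, kernel_size=128):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)   # value and gate branches
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, seq, dim)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        v = self.conv(v.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(v * torch.sigmoid(g))

def striped_stack(dim=1024, n_layers=32, attn_every=4):
    """Mostly gated-convolution blocks, with attention "stripes"
    inserted every few layers (spacing here is an assumption)."""
    layers = []
    for i in range(n_layers):
        if (i + 1) % attn_every == 0:
            layers.append(nn.MultiheadAttention(dim, num_heads=8, batch_first=True))
        else:
            layers.append(ToyHyenaOperator(dim))
    return nn.ModuleList(layers)
```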
StripedHyena is optimized using a set of new model grafting techniques, enabling us to change the model architecture during training. We grafted architectural components of Transformers and Hyena, and trained on a mix of the RedPajama dataset, augmented with longer-context data.
One additional advantage of StripedHyena is a >50% reduced memory footprint during autoregressive generation, compared to a Transformer (both with grouped-query attention).
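A rough way to see where the savings come from (a back-of-the-envelope sketch with assumed shapes, not measured numbers): each attention layer's KV cache grows linearly with sequence length, while a gated-convolution block can be rolled into a fixed-size recurrent state, so replacing most attention layers shrinks the generation-time state.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # K and V per attention layer, each of shape (seq_len, n_kv_heads, head_dim)
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def conv_state_bytes(n_conv_layers, state_dim, bytes_per_elem=2):
    # fixed-size recurrent state per gated-convolution layer, independent of seq_len
    return n_conv_layers * state_dim * bytes_per_elem

# Hypothetical 32-layer models at 32K context, fp16 (numbers are illustrative):
full_attn = kv_cache_bytes(32, 8, 128, 32_768)
hybrid    = kv_cache_bytes(8, 8, 128, 32_768) + conv_state_bytes(24, 8 * 128)
print(full_attn, hybrid)   # the hybrid's generation state is a fraction of the full cache
```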
This work would not have been possible without our collaborators @HazyResearch, @NousResearch, and @Hessian_AI.
It builds on our past work with @Mila_Quebec, @huggingface. We are grateful to open source AI community leaders including @AIatMeta, @AiEleuther, @MistralAI & others.
Announcing DeepCoder-14B – an o1 & o3-mini level coding reasoning model fully open-sourced!
We’re releasing everything: dataset, code, and training recipe.🔥
Built in collaboration with the @Agentica_ team.
See how we created it. 🧵
Training Technique
To scale reasoning without sacrificing the model’s long-context capability, we combine:
→ Iterative context lengthening
→ Overlong filtering (from DAPO)
We train DeepCoder-14B-Preview from 16K → 32K, then evaluate at 64K.
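A minimal sketch of the two ideas, under assumed names (`Rollout` and `max_len_schedule` are illustrative, not the released training code):

```python
from dataclasses import dataclass

@dataclass
class Rollout:                 # hypothetical container for one sampled solution
    truncated: bool            # generation hit the context cap before finishing
    reward: float              # pass/fail signal from the test harness

# Iterative context lengthening: raise the generation cap in stages during RL.
max_len_schedule = [16_384, 32_768]          # train at 16K, then 32K; evaluate at 64K

def filter_overlong(batch: list[Rollout]) -> list[Rollout]:
    """DAPO-style overlong filtering: drop truncated rollouts from the loss
    instead of penalizing them, so long reasoning is never punished for
    merely running out of context."""
    return [r for r in batch if not r.truncated]

for max_len in max_len_schedule:
    # sample rollouts capped at `max_len`, keep only the finished ones,
    # and run the policy update on the filtered batch
    ...
```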
Results on LiveCodeBench:
• 16K: 54%
• 32K: 58%
• 64K: 60.6% (despite never training at 64K)
Our model generalizes to 64K context despite never being trained on it, whereas the baseline (R1-Distill-14B) plateaus beyond its training window.
Dataset Curation
Scaling reasoning with RL requires verifiable rewards. Unlike math datasets, coding datasets found online tend to be much noisier, resulting in faulty reward signals during training.
To address this, we’ve implemented a rigorous data pipeline:
• Official solutions must pass all tests
• ≥6 test cases per problem
• Deduplication across train/test splits
This pipeline gives us 24K high-quality verified coding problems for RL training.
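Roughly, the filtering looks like the sketch below (field names and the `run_tests` helper are assumptions, not the released pipeline):

```python
def is_verified(problem, run_tests) -> bool:
    """Keep a problem only if it has at least 6 test cases and its official
    solution passes every one of them (so the RL reward is trustworthy)."""
    if len(problem["tests"]) < 6:
        return False
    return all(run_tests(problem["solution"], t) for t in problem["tests"])

def dedup(problems, eval_prompts):
    """Drop problems whose prompt also appears in any evaluation split,
    and remove duplicates within the training set itself."""
    seen = set(p.strip() for p in eval_prompts)
    kept = []
    for p in problems:
        key = p["prompt"].strip()
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept
```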
Mixture of Agents (MoA) is a framework that leverages the collective strengths of multiple LLMs. Each layer contains multiple agents that refine responses using the outputs from the preceding layer.
Together MoA achieves a score of 65.1% on AlpacaEval 2.0. together.ai/blog/together-…
Together MoA exhibits promising performance on AlpacaEval 2.0 and MT-Bench.
Together MoA uses six open-source models as proposers and Qwen1.5-110B-Chat as the final aggregator, with three layers.
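As a rough illustration of the layered setup (the `chat()` helper and prompt wording are hypothetical, not the Together API):

```python
def moa_answer(prompt, proposer_models, aggregator_model, chat, n_layers=3):
    """Layered Mixture of Agents: each layer's proposers see the previous
    layer's answers; a final aggregator synthesizes the last layer's outputs."""
    answers = []                                  # previous layer's outputs (empty at layer 1)
    for _ in range(n_layers):
        new_answers = []
        for model in proposer_models:
            refs = "\n\n".join(f"Reference {i + 1}: {a}" for i, a in enumerate(answers))
            msg = f"{refs}\n\nQuestion: {prompt}" if refs else prompt
            new_answers.append(chat(model, msg))  # hypothetical chat-completion helper
        answers = new_answers
    refs = "\n\n".join(f"Candidate {i + 1}: {a}" for i, a in enumerate(answers))
    return chat(aggregator_model,
                f"Synthesize the best possible answer from the candidates.\n\n"
                f"{refs}\n\nQuestion: {prompt}")
```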
We also evaluate on FLASK, which offers more fine-grained evaluation; Together MoA outperforms the original models on most dimensions.
The first RedPajama models are here! The 3B and 7B models are now available under Apache 2.0 license, including instruction-tuned and chat versions!
This project demonstrates the power of the open-source AI community with many contributors ... 🧵 together.xyz/blog/redpajama…
Training ran on 3,072 V100 GPUs provided as part of the INCITE 2023 project on Scalable Foundation Models for Transferrable Generalist AI, awarded to MILA, LAION, and EleutherAI in fall 2022, with support from the Oak Ridge Leadership Computing Facility (OLCF) and INCITE program.
Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today! together.xyz/blog/redpajama
More in 🧵 …
In the coming weeks we will release a full suite of large language models and instruction tuned versions based on this dataset.
Announcing OpenChatKit v0.16! You can now run OpenChatKit on consumer GPUs with a new 7B parameter model fine-tuned on user feedback for improved quality. And it's fast!
Details in 🧵 ... twitter.com/i/web/status/1…
Updates include: 1. A new 7B-parameter, 8-bit quantized model, available for use on consumer GPUs: huggingface.co/togethercomput…
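For reference, a minimal sketch of loading an 8-bit checkpoint with Hugging Face transformers + bitsandbytes; the model id and prompt format below are placeholders, so substitute the repo from the link above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<openchatkit-7b-checkpoint>"   # placeholder: use the repo from the link above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # spread layers over the available GPU(s)
    load_in_8bit=True,      # 8-bit weights so the 7B model fits on a consumer GPU
)

# The prompt format below is an assumption for illustration.
inputs = tokenizer("<human>: Hello, how are you?\n<bot>:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```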
Much more than a model release, this is the beginning of an open source project. We are releasing a set of tools and processes for ongoing improvement with community contributions.
OpenChatKit includes 4 key components:
First, an instruction-tuned large language model, fine-tuned for chat from EleutherAI’s GPT-NeoX-20B with over 43 million instructions on 100% carbon-negative compute, available under the Apache-2.0 license on @huggingface.