The future of AI is open-source. Let's build together.
Apr 8, 2025 • 7 tweets • 3 min read
Announcing DeepCoder-14B – an o1 & o3-mini level coding reasoning model fully open-sourced!
We’re releasing everything: dataset, code, and training recipe.🔥
Built in collaboration with the @Agentica_ team.
See how we created it. 🧵
Training Technique
To scale reasoning without sacrificing the model’s long-context capability, we combine:
→ Iterative context lengthening
→ Overlong filtering (from DAPO)
We train DeepCoder-14B-Preview from 16K → 32K, then evaluate at 64K.
Results on LiveCodeBench:
• 16K: 54%
• 32K: 58%
• 64K: 60.6% (despite never training at 64K)
The model generalizes to 64K context despite never training at that length, whereas the baseline (R1-Distill-14B) plateaus beyond its training window.
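For intuition, here is a minimal Python sketch of how the two ingredients might fit together; `Rollout`, `filter_overlong`, and the schedule values are illustrative stand-ins, not the released training code.

```python
# Hedged sketch of the two training ingredients (illustrative, not the released code).
from dataclasses import dataclass

@dataclass
class Rollout:
    tokens: list[int]
    reward: float
    truncated: bool  # generation stopped because it hit the context cap

def filter_overlong(rollouts: list[Rollout]) -> list[Rollout]:
    """DAPO-style overlong filtering: drop truncated rollouts so the policy
    update never penalizes reasoning that was merely cut off by the cap."""
    return [r for r in rollouts if not r.truncated]

# Iterative context lengthening: raise the cap between training stages,
# then evaluate beyond it (16K -> 32K training, 64K evaluation).
CONTEXT_SCHEDULE = [16_384, 32_768]
EVAL_CONTEXT = 65_536

def max_tokens_for_stage(stage: int) -> int:
    return CONTEXT_SCHEDULE[min(stage, len(CONTEXT_SCHEDULE) - 1)]
```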
Jun 11, 2024 • 9 tweets • 4 min read
Mixture of Agents (MoA) is a framework that leverages the collective strengths of multiple LLMs: each layer contains multiple agents that refine responses using the outputs from the preceding layer.
Together MoA achieves a score of 65.1% on AlpacaEval 2.0. together.ai/blog/together-…
Together MoA exhibits promising performance on AlpacaEval 2.0 and MT-Bench.
Together MoA uses six open-source models as proposers and Qwen1.5-110B-Chat as the final aggregator, with three layers.
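The layered refine-then-aggregate pattern is easy to express; below is a hedged sketch where `LLM` is any prompt-to-text callable. The function names and prompt wording are assumptions, not the Together MoA implementation.

```python
# Minimal sketch of the Mixture-of-Agents pattern (illustrative only).
from typing import Callable, List

LLM = Callable[[str], str]  # any function that maps a prompt to a completion

def aggregate_prompt(question: str, prior_answers: List[str]) -> str:
    refs = "\n\n".join(f"[Response {i + 1}]\n{a}" for i, a in enumerate(prior_answers))
    return ("Synthesize a single high-quality answer to the question below, "
            f"using the reference responses.\n\nQuestion: {question}\n\n{refs}")

def mixture_of_agents(question: str, proposer_layers: List[List[LLM]], aggregator: LLM) -> str:
    answers: List[str] = []
    for layer in proposer_layers:
        # First layer answers the raw question; later layers refine prior answers.
        prompt = question if not answers else aggregate_prompt(question, answers)
        answers = [model(prompt) for model in layer]
    return aggregator(aggregate_prompt(question, answers))
```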
Dec 8, 2023 • 9 tweets • 3 min read
Announcing StripedHyena 7B — an open-source model using an architecture that goes beyond Transformers, achieving faster performance and longer context.
It builds on lessons learned over the past year designing efficient sequence-modeling architectures.
together.ai/blog/stripedhy…
This release includes StripedHyena-Hessian-7B (SH 7B), a base model, & StripedHyena-Nous-7B (SH-N 7B), a chat model. Both use a hybrid architecture informed by our latest findings on scaling laws for efficient architectures.
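As a rough illustration of what "hybrid" means here, the sketch below interleaves attention blocks with gated-convolution blocks. It is a stand-in for the idea under assumed block designs, not the StripedHyena architecture or code.

```python
# Hedged sketch of a hybrid layer stack (illustrative, not StripedHyena).
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Stand-in for a Hyena-style gated long-convolution operator."""
    def __init__(self, dim: int, kernel: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return y * torch.sigmoid(self.gate(x))

class AttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

def build_hybrid(dim: int, depth: int, attn_every: int = 4) -> nn.Sequential:
    """Mostly gated-convolution blocks, with attention interleaved periodically."""
    layers = [AttentionBlock(dim) if (i + 1) % attn_every == 0 else GatedConvBlock(dim)
              for i in range(depth)]
    return nn.Sequential(*layers)

# Example: a small stack over a batch of 1024-token sequences.
model = build_hybrid(dim=512, depth=8)
y = model(torch.randn(2, 1024, 512))
```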
The first RedPajama models are here! The 3B and 7B models are now available under the Apache 2.0 license, including instruction-tuned and chat versions!
This project demonstrates the power of the open-source AI community with many contributors ... 🧵 together.xyz/blog/redpajama…
Training ran on 3,072 V100 GPUs provided as part of the INCITE 2023 project on Scalable Foundation Models for Transferrable Generalist AI, awarded to MILA, LAION, and EleutherAI in fall 2022, with support from the Oak Ridge Leadership Computing Facility (OLCF) and INCITE program.
Apr 17, 2023 • 6 tweets • 3 min read
Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today! together.xyz/blog/redpajama
More in 🧵 …
In the coming weeks we will release a full suite of large language models and instruction tuned versions based on this dataset.
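If you just want to peek at the corpus, a hedged example with the Hugging Face `datasets` library follows; the dataset id `togethercomputer/RedPajama-Data-1T` and the `arxiv` subset name are assumptions to verify against the hub page.

```python
# Stream a few records instead of downloading the full 1.2T-token corpus.
from itertools import islice
from datasets import load_dataset

ds = load_dataset(
    "togethercomputer/RedPajama-Data-1T",  # assumed hub id; verify before use
    "arxiv",                               # assumed subset name
    split="train",
    streaming=True,
    trust_remote_code=True,                # the loader may rely on a dataset script
)
for example in islice(ds, 3):
    print(example["text"][:200])           # records carry raw text plus source metadata
```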
Mar 30, 2023 • 8 tweets • 4 min read
Announcing OpenChatKit v0.16! You can now run OpenChatKit on consumer GPUs with a new 7B parameter model fine-tuned on user feedback for improved quality. And it's fast!
Details in 🧵 ... twitter.com/i/web/status/1…
Updates include: 1. A new 7B parameter, 8-bit quantized model, available for use on consumer GPUs: huggingface.co/togethercomput…
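A hedged sketch of running such an 8-bit model on a single consumer GPU with `transformers` and `bitsandbytes`; the model id is a placeholder for the checkpoint behind the truncated link above, and the `<human>:`/`<bot>:` prompt format is an assumption.

```python
# Hedged sketch: load a 7B chat model in 8-bit and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "togethercomputer/<openchatkit-7b-checkpoint>"  # placeholder, not a real id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",                                      # place layers on the GPU
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

prompt = "<human>: What is a good name for a coffee shop?\n<bot>:"  # assumed format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```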
Mar 10, 2023 • 12 tweets • 5 min read
Introducing OpenChatKit. A powerful, open-source base to create chatbots for various applications. Details in 🧵
together.xyz/blog/openchatk…
Much more than a model release, this is the beginning of an open source project. We are releasing a set of tools and processes for ongoing improvement with community contributions.
OpenChatKit includes 4 key components:
Nov 29, 2022 • 5 tweets • 3 min read
Introducing GPT-JT, a 6B parameter open source model that can outperform many 100B+ parameter models and was trained over slow (1Gbps) internet links.
together.xyz/research/relea…
Thanks to the open source AI community & publications that made this possible:
• @EleutherAI’s open models and datasets are the basis of GPT-JT and a cornerstone of today’s foundation model research.
• @GoogleResearch, which published the UL2 and Chain-of-Thought (CoT) techniques.