Cerebras
May 19 · 2 tweets
Cerebras is now running Kimi K2.6 – a trillion-parameter model – in enterprise trials.

At ~1,000 tokens/s, this is the fastest frontier model performance ever measured by Artificial Analysis (@ArtificialAnlys).

More from @cerebras

Oct 24, 2024
🚨 Cerebras Inference is now 3x faster:
Llama3.1-70B just broke 2,100 tokens/s
- 16x faster than the fastest GPU solution
- 8x faster than GPUs running Llama *3B*
- It's like the perf of a new hardware generation in a single software release
Available now at inference.cerebras.ai
We broke all records when we launched Cerebras Inference in August. Today we are tripling our performance from 650 t/s to 2,100 t/s.
Cerebras Inference speed is in a league of its own – 16x faster than the fastest GPU solution, 68x faster than hyperscale clouds, and 4-8x faster than other AI accelerators.
Time to first token is critical for real-time applications. Cerebras is among the fastest in first-token latency, showing the advantage of wafer-scale integration vs. complex networked solutions.
Aug 27, 2024
Introducing Cerebras Inference
‣ Llama3.1-70B at 450 tokens/s – 20x faster than GPUs
‣ 60¢ per million tokens – a fifth the price of hyperscalers
‣ Full 16-bit precision for full model accuracy
‣ Generous rate limits for devs
Try now: inference.cerebras.ai
Cerebras Inference is the fastest Llama3.1 inference API by far: 1,800 tokens/s for 8B and 450 tokens/s for 70B. We are ~20x faster than NVIDIA GPUs and ~2x faster than Groq.
Going from 90 tokens/s to 1,800 tokens/s is like going from dialup to broadband. It makes AI instant:
Jul 24, 2023
Introducing BTLM-3B-8K: an open, state-of-the-art 3B parameter model with 7B-level performance. When quantized, it fits in as little as 3GB of memory 🤯. It runs on iPhone, Google Pixel, even Raspberry Pi. BTLM goes live on Bittensor later this week! 🧵👇
https://t.co/7aKLkeUeUI buff.ly/3Q5dtY5
Today's popular models can run on a powerful PC but don't fit in popular mobile devices. In May @Opentensor challenged us to build a SoTA model that runs on any device and supports long context. Thus was born BTLM – a 3B model with 7B performance and 8K context length!
@opentensor BTLM sets a new standard in 3B performance. Thanks to its high quality training data (SlimPajama-627B), it outperforms 3B models trained on almost 2x the data.

It’s also the first model trained on the Condor Galaxy 1 AI supercomputer thanks to the support of G42 Cloud & IIAI!
Jul 20, 2023
📣 Today we are announcing Condor Galaxy-1: a 4 exaFLOP AI supercomputer built in partnership with @G42ai. Powered by 64 Cerebras CS-2 systems, 54M cores, and 82TB of memory – it's the largest AI supercomputer we've ever built. But that's not all: CG-1 is just the start.
G42 Cloud is the largest public cloud provider of the UAE. To expand its AI offering, we are planning not one but *nine* AI supercomputers. When complete in 2024, the full Condor Galaxy system will have 9 instances and 576 CS-2s, for a total of 36 exaFLOPs of AI compute. 🤯🤯
How does 36 exaFLOPs compare to other AI supercomputers? It's 4x the performance of Google's latest TPU Pod v4 and 9x the performance of Nvidia's yet-to-be-completed Israel-1.
Jun 9, 2023
📣 New dataset drop!
Introducing SlimPajama-627B: the largest extensively deduplicated, multi-corpora, open-source dataset for training large language models. 🧵 cerebras.net/blog/slimpajam…
RedPajama-1T is the largest open dataset today but contains a large percentage of duplicates, making a full training run costly and inefficient. Like the Falcon team, we found data quality is just as important as quantity – which led to SlimPajama. huggingface.co/datasets/cereb…
SlimPajama cleans and deduplicates RedPajama-1T, reducing the total token count and file size by 50%. It's half the size and trains twice as fast! It’s the highest quality dataset when training to 600B tokens and, when upsampled, performs as well as or better than RedPajama.
Mar 28, 2023
🎉 Exciting news! Today we are releasing Cerebras-GPT, a family of 7 GPT models from 111M to 13B parameters trained using the Chinchilla formula. These are the highest accuracy models for a compute budget and are available today open-source! (1/5)

Press: businesswire.com/news/home/2023…
The AI industry is becoming increasingly closed. We believe in fostering open access to the most advanced models. Cerebras-GPT is being released under the Apache 2.0 license, allowing royalty-free use for research or commercial applications. (2/5)
One notable output of Cerebras-GPT is a new scaling law that predicts model performance for a given compute budget. This is the first scaling law derived using a public dataset. (3/5)
