read through @vercel's state of ai survey to see that 22% of builders are now using @groqinc and we're one of the top providers developers switched to in the last 6 months.
1/4 so why are developers switching? the survey shows 23% of you cite latency/performance as a top technical challenge - exactly what we're solving with our lpu inference engine, and you get access to that hardware via the groq api.
2/4 another interesting, but not surprising data point: 86% of teams don't train their models, preferring to focus on implementation and optimization.
smart strategy - tell us which models you'd like to see and let us handle the inference speed while you spend your time building.
3/4 cost management remains a top challenge for 23% of developers. i hear you. (side eyeing openai/anthropic rn... with love). 🤨
if this is also a challenge for you, check out our pricing page to see how you can scale with us without breaking the bank: groq.com/pricing
4/4 if you want to join the teams that switched (you should, although i may be biased), check out our docs - built by devs for devs.
our api is openai-compatible & we have features ranging from crazy fast reasoning to TTS.
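openai-compatible in practice means you can keep the openai sdk and just swap the base url. a minimal sketch (the model id and prompt below are examples - check our docs for current model names):

```python
# minimal sketch: point the OpenAI Python SDK at Groq's OpenAI-compatible endpoint
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
    api_key="YOUR_GROQ_API_KEY",                # placeholder - use your own key
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # example model id; see the docs for the current list
    messages=[{"role": "user", "content": "Explain the LPU in one sentence."}],
)
print(response.choices[0].message.content)
```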
HUGE PSA: @Meta's Llama 4 Scout (17Bx16E MoE) is now live on @GroqInc for all users via console playground and Groq API.
This conversational beast with native multimodality just dropped today and we're excited to offer Day 0 support so you can build fast.
1/7 What makes Llama 4 Scout special? It's a chonky multimodal model with a 10M context window (yes, TEN MILLION tokens).
Built on a mixture-of-experts architecture (17B activated params, 109B total), it brings incredible image understanding and conversational abilities.
2/7 My vibe check is off the charts - Scout feels remarkably natural and chill. Think almost Claude-level chat quality, multi-image input that's insanely accurate, and support for 12 languages.
And, of course, it's fast on Groq. Tokens go BRRR. 🏁
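If you want to poke at the multi-image input yourself, here's a rough sketch using the OpenAI-compatible chat format - treat the model id and image URLs as placeholders and check the console for the exact Scout identifier:

```python
# rough sketch: multi-image input to Llama 4 Scout via Groq's OpenAI-compatible API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",  # assumed id - verify in the console
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What changed between these two screenshots?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/before.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/after.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```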
Just wrapped up Day 1 of @aiDotEngineer and the talks went from "agents don't work (yet)" to enterprise deployment success stories.
2024 was for experimenting with AI, but 2025 is clearly the year of putting AI agents into production.
🧵 Here are some of my key takeaways:
1/9 @graceisford shared how agents = complex systems with compounding errors, but there's hope if we focus on:
- Data being our best asset/differentiator
- Personal LLM evals
- Tools to mitigate errors
- Intuitive AI UX (the moat that matters)
- Reimagining DevEx (go multimodal)
2/9 @HamelHusain & @gregce10 shared how to build an AI strategy that fails. Amongst all the S+ tier memes, what stood out to me: drop the AI jargon.
It's important to step out of our tech bubble to see that AI adoption goes beyond our domain. Keep it simple to drive adoption.
PSA: @Alibaba_Qwen's Qwen-2.5-Coder-32B-Instruct is now live on @GroqInc for insanely fast (and smart) code generation.
See below for instructions to add to @cursor_ai.
1/4 Qwen2.5 Coder is state of the art among open-source models for coding, with impressive performance across several popular code generation benchmarks - even beating GPT-4o and Claude 3.5 Sonnet.
2/4 Beyond code generation, Qwen2.5 Coder with Groq speed is a game-changer for debugging workflows. Imagine Jon Skeet (famous as the top contributor on @StackOverflow) reviewing your code in real time and helping you build, fix bugs, and ship fast. This is the dream (but real).
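If you'd rather script that reviewer loop than wire it into Cursor, here's a rough sketch against the Groq API - the model id and the buggy snippet are placeholders, so check the console for the current Qwen2.5 Coder name:

```python
# rough sketch: a "Jon Skeet in a loop" code review call with Qwen2.5 Coder on Groq
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

buggy_snippet = '''
def average(nums):
    return sum(nums) / len(nums)  # crashes on an empty list
'''

review = client.chat.completions.create(
    model="qwen-2.5-coder-32b",  # assumed model id - verify against the model list
    messages=[
        {"role": "system", "content": "You are a senior code reviewer. Point out bugs and suggest fixes."},
        {"role": "user", "content": f"Review this function:\n{buggy_snippet}"},
    ],
)
print(review.choices[0].message.content)
```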
1/5 What is speculative decoding? It's a technique that uses a smaller, faster model to predict a sequence of tokens, which are then verified by the main, more powerful model in parallel. The main model evaluates these predictions and determines which tokens to keep or reject.
2/5 Speculative decoding achieves faster inference because the main model can verify multiple tokens in parallel rather than generating them one-by-one. This parallel verification is significantly faster than traditional sequential token generation.
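If the accept/reject dance is hard to picture, here's a toy sketch where two stub functions stand in for the draft and target models - nothing Groq-specific, just the control flow:

```python
# toy sketch of greedy speculative decoding with stub "models" over integer tokens

def draft_next(tokens):
    # small/fast draft model: usually right, occasionally off by one
    nxt = (tokens[-1] + 1) % 50
    return nxt if len(tokens) % 5 else (nxt + 1) % 50

def target_next(tokens):
    # large/slow target model: the prediction we actually trust
    return (tokens[-1] + 1) % 50

def speculative_step(tokens, k=4):
    # 1) draft model proposes k tokens cheaply, one after another
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) target model checks every proposed position (one big parallel pass in a
    #    real system); keep the longest agreeing prefix, then take the target's
    #    own token at the first disagreement
    accepted, ctx = [], list(tokens)
    for t in draft:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)
            break
    return tokens + accepted

seq = [0]
for _ in range(3):
    seq = speculative_step(seq)  # several tokens per target pass instead of one
print(seq)
```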
Let's couple our vibe coding with vibe learning with this incredible dive into LLMs that @karpathy just dropped. 🧠
This is what democratizing AI education looks like - knowledge for both beginners and builders. And if you're new to AI development, this thread is for you.
2/7 Karpathy explains how parallelization is possible during LLM training, but output token generation is sequential during LLM inference. Specialized hardware (like Groq's LPU) is designed around exactly that computational pattern - sequential token generation - for fast LLM outputs.
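A tiny sketch of that training-vs-inference difference, with a stub function standing in for a forward pass (toy numbers, nothing model-specific):

```python
# toy sketch: training can score every position of a known sequence independently,
# but inference is a strict loop because each token depends on the previous output

known_sequence = [3, 1, 4, 1, 5, 9]

def model_next(context):  # stand-in for one forward pass
    return (sum(context) * 7) % 10

# training-style (teacher forcing): every prefix is known up front, so all
# positions can be evaluated independently - i.e. in parallel on real hardware
training_predictions = [model_next(known_sequence[:i]) for i in range(1, len(known_sequence))]

# inference-style: each step needs the previous output, so it's sequential
generated = [3]
for _ in range(5):
    generated.append(model_next(generated))

print(training_predictions, generated)
```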
3/7 And while training LLMs requires massive GPU clusters ($$$), using LLMs for inference doesn't. 🤝
You can get access to insanely-fast inference for top models via Groq API and start building right now. Seriously. Here are some apps others have built: console.groq.com/docs/showcase-…
- Flex Tier Beta is live for Llama 3.3 70B/8B with 10x higher rate limits
- Whisper Large v3 is now 67% faster (tokens go BRRR)
- Whisper Large v3 audio file limit is now 100MB (up from 40MB)
- The DevRel team is growing 📈📈📈
2/4 Flex Tier gives on-demand processing with rapid timeout when resources are constrained - perfect for workloads that need fast inference and can handle occasional request failures.
Available with Llama 3.3 70B/8B on the paid tier at the same price.
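If you want to opt in per request, here's a rough sketch with a basic retry, since flex requests can be shed under load - the service_tier flag name is an assumption on my part, so confirm it in the docs:

```python
# rough sketch: per-request Flex Tier opt-in with a simple backoff retry
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

def flex_completion(prompt, retries=3):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="llama-3.3-70b-versatile",
                messages=[{"role": "user", "content": prompt}],
                extra_body={"service_tier": "flex"},  # assumed opt-in flag - check the docs
            )
        except Exception:
            # flex requests may fail fast when capacity is constrained; back off and retry
            time.sleep(2 ** attempt)
    raise RuntimeError("flex capacity unavailable - fall back to on-demand")

print(flex_completion("Summarize Flex Tier in one line.").choices[0].message.content)
```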