read through @vercel's state of ai survey to see that 22% of builders are now using @groqinc and we're one of the top providers developers switched to in the last 6 months.
1/4 so why are developers switching? the survey shows 23% of you cite latency/performance as a top technical challenge - exactly what we're solving with our lpu inference engine, and you get access to that hardware via the groq api.
2/4 another interesting, but not surprising data point: 86% of teams don't train their models, preferring to focus on implementation and optimization.
smart strategy - tell us which models you'd like to see and let us handle the inference speed while you spend your time building.
3/4 cost management remains a top challenge for 23% of developers. i hear you. (side eyeing openai/anthropic rn... with love). 🤨
if this is also a challenge for you, check out our pricing page to see how you can scale with us without breaking the bank: groq.com/pricing
4/4 if you want to join the teams that switched (you should, although i may be biased), check out our docs - built by devs for devs.
our api is openai-compatible & we have features ranging from crazy fast reasoning to TTS.
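openai-compatible in practice means you can keep the openai sdk and just swap the base url. a minimal sketch (the model id and prompt below are examples - check our docs for current model names):

```python
# minimal sketch: point the OpenAI Python SDK at Groq's OpenAI-compatible endpoint
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
    api_key="YOUR_GROQ_API_KEY",                # placeholder - use your own key
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # example model id; see the docs for the current list
    messages=[{"role": "user", "content": "Explain the LPU in one sentence."}],
)
print(response.choices[0].message.content)
```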
HUGE PSA: @Meta's Llama 4 Scout (17Bx16E MoE) is now live on @GroqInc for all users via console playground and Groq API.
This conversational beast with native multimodality just dropped today and we're excited to offer Day 0 support so you can build fast.
1/7 What makes Llama 4 Scout special? It's a chonky multimodal model with a 10M context window (yes, TEN MILLION tokens).
Built on a mixture-of-experts architecture (17B activated params, 109B total), it brings incredible image understanding and conversational abilities.
2/7 My vibe check is off the charts - Scout feels remarkably natural and chill. Think almost Claude-level chat quality, multi-image input that's insanely accurate, and support for 12 languages.
And, of course, it's fast on Groq. Tokens go BRRR. 🏁
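If you want to poke at the multi-image input yourself, here's a rough sketch using the OpenAI-compatible chat format - treat the model id and image URLs as placeholders and check the console for the exact Scout identifier:

```python
# rough sketch: multi-image input to Llama 4 Scout via Groq's OpenAI-compatible API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",  # assumed id - verify in the console
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What changed between these two screenshots?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/before.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/after.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```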
Just wrapped up Day 1 of @aiDotEngineer and the talks went from "agents don't work (yet)" to enterprise deployment success stories.
2024 was for experimenting with AI, but 2025 is clearly the year of putting AI agents into production.
🧵 Here are some of my key takeaways:
1/9 @graceisford shared how agents = complex systems with compounding errors, but there's hope if we focus on:
- Data being our best asset/differentiator
- Personal LLM evals
- Tools to mitigate errors
- Intuitive AI UX (the moat that matters)
- Reimagining DevEx (go multimodal)
2/9 @HamelHusain & @gregce10 shared how to build an AI strategy that fails. Amongst all the S+ tier memes, what stood out to me: drop the AI jargon.
It's important to step out of our tech bubble to see that AI adoption goes beyond our domain. Keep it simple to drive adoption.
PSA: @Alibaba_Qwen's Qwen-2.5-Coder-32B-Instruct is now live on @GroqInc for insanely fast (and smart) code generation.
See below for instructions to add to @cursor_ai.
1/4 Qwen2.5 Coder is state of the art among open-source models for coding, with impressive performance across several popular code generation benchmarks - even beating GPT-4o and Claude 3.5 Sonnet.
2/4 Beyond code generation, Qwen2.5 Coder with Groq speed is a game-changer for debugging workflows. Imagine Jon Skeet (famous as the top contributor on @StackOverflow) reviewing your code in real time and helping you build, fix bugs, and ship fast. This is the dream (but real).
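If you'd rather script that reviewer loop than wire it into Cursor, here's a rough sketch against the Groq API - the model id and the buggy snippet are placeholders, so check the console for the current Qwen2.5 Coder name:

```python
# rough sketch: a "Jon Skeet in a loop" code review call with Qwen2.5 Coder on Groq
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

buggy_snippet = '''
def average(nums):
    return sum(nums) / len(nums)  # crashes on an empty list
'''

review = client.chat.completions.create(
    model="qwen-2.5-coder-32b",  # assumed model id - verify against the model list
    messages=[
        {"role": "system", "content": "You are a senior code reviewer. Point out bugs and suggest fixes."},
        {"role": "user", "content": f"Review this function:\n{buggy_snippet}"},
    ],
)
print(review.choices[0].message.content)
```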
1/5 What is speculative decoding? It's a technique that uses a smaller, faster model to predict a sequence of tokens, which are then verified by the main, more powerful model in parallel. The main model evaluates these predictions and determines which tokens to keep or reject.
2/5 Speculative decoding achieves faster inference because the main model can verify multiple tokens in parallel rather than generating them one-by-one. This parallel verification is significantly faster than traditional sequential token generation.
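If the accept/reject dance is hard to picture, here's a toy sketch where two stub functions stand in for the draft and target models - nothing Groq-specific, just the control flow:

```python
# toy sketch of greedy speculative decoding with stub "models" over integer tokens

def draft_next(tokens):
    # small/fast draft model: usually right, occasionally off by one
    nxt = (tokens[-1] + 1) % 50
    return nxt if len(tokens) % 5 else (nxt + 1) % 50

def target_next(tokens):
    # large/slow target model: the prediction we actually trust
    return (tokens[-1] + 1) % 50

def speculative_step(tokens, k=4):
    # 1) draft model proposes k tokens cheaply, one after another
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) target model checks every proposed position (one big parallel pass in a
    #    real system); keep the longest agreeing prefix, then take the target's
    #    own token at the first disagreement
    accepted, ctx = [], list(tokens)
    for t in draft:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)
            break
    return tokens + accepted

seq = [0]
for _ in range(3):
    seq = speculative_step(seq)  # several tokens per target pass instead of one
print(seq)
```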
Let's couple our vibe coding with vibe learning with this incredible dive into LLMs that @karpathy just dropped. 🧠
This is what democratizing AI education looks like - knowledge for both beginners and builders. And if you're new to AI development, this thread is for you.
2/7 Karpathy explains how parallelization is possible during LLM training, but output token generation is sequential during LLM inference. Specialized hardware (like Groq's LPU) is designed around exactly that computational pattern - sequential token generation - for fast LLM outputs.
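A tiny sketch of that training-vs-inference difference, with a stub function standing in for a forward pass (toy numbers, nothing model-specific):

```python
# toy sketch: training can score every position of a known sequence independently,
# but inference is a strict loop because each token depends on the previous output

known_sequence = [3, 1, 4, 1, 5, 9]

def model_next(context):  # stand-in for one forward pass
    return (sum(context) * 7) % 10

# training-style (teacher forcing): every prefix is known up front, so all
# positions can be evaluated independently - i.e. in parallel on real hardware
training_predictions = [model_next(known_sequence[:i]) for i in range(1, len(known_sequence))]

# inference-style: each step needs the previous output, so it's sequential
generated = [3]
for _ in range(5):
    generated.append(model_next(generated))

print(training_predictions, generated)
```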
3/7 And while training LLMs requires massive GPU clusters ($$$), using LLMs for inference doesn't. 🤝
You can get access to insanely-fast inference for top models via Groq API and start building right now. Seriously. Here are some apps others have built: console.groq.com/docs/showcase-…
- Flex Tier Beta is live for Llama 3.3 70B/8B with 10x higher rate limits
- Whisper Large v3 is now 67% faster (tokens go BRRR)
- Whisper Large v3 audio file limit is now 100MB (up from 40MB)
- The DevRel team is growing 📈📈📈
2/4 Flex Tier gives on-demand processing with rapid timeout when resources are constrained - perfect for workloads that need fast inference and can handle occasional request failures.
Available with Llama 3.3 70B/8B on the paid tier at the same price.
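If you want to opt in per request, here's a rough sketch with a basic retry, since flex requests can be shed under load - the service_tier flag name is an assumption on my part, so confirm it in the docs:

```python
# rough sketch: per-request Flex Tier opt-in with a simple backoff retry
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

def flex_completion(prompt, retries=3):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="llama-3.3-70b-versatile",
                messages=[{"role": "user", "content": prompt}],
                extra_body={"service_tier": "flex"},  # assumed opt-in flag - check the docs
            )
        except Exception:
            # flex requests may fail fast when capacity is constrained; back off and retry
            time.sleep(2 ** attempt)
    raise RuntimeError("flex capacity unavailable - fall back to on-demand")

print(flex_completion("Summarize Flex Tier in one line.").choices[0].message.content)
```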