99% of people think RAG is just "search some docs, stuff them into a prompt." That's Naive RAG. It worked for demos. It doesn't work for production.
The real evolution happened when researchers realized LLMs don't just need more information. They need the right information, at the right time, in the right format.
This led to Advanced RAG with query rewriting and context compression. Better, but still linear.
Now we're in the Modular RAG era. Instead of retrieve-then-generate, we have systems that decide when to retrieve, what to retrieve, and how many times. Self-RAG lets models critique their own outputs and retrieve more context when confidence drops.
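To make "retrieve when needed" concrete, here's a minimal Python sketch of the loop. It's the idea, not Self-RAG's actual implementation; the generate, score_confidence, and retrieve callables are hypothetical stand-ins for your model and your index.

```python
# Minimal sketch of "retrieve when needed": generate, self-critique, and only
# go back to the index when confidence drops. The generate, score_confidence,
# and retrieve callables are hypothetical stand-ins, not Self-RAG's interfaces.

def adaptive_rag(question, generate, score_confidence, retrieve,
                 max_rounds=3, threshold=0.8):
    context = []
    answer = generate(question, context)
    for _ in range(max_rounds):
        if score_confidence(question, answer, context) >= threshold:
            break                                # confident enough, stop retrieving
        context += retrieve(question, k=3)       # low confidence: fetch more evidence
        answer = generate(question, context)     # regenerate with the richer context
    return answer
```

The whole shift is in that conditional: retrieval becomes a per-question decision the system makes, not a fixed pipeline stage.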
But here's what nobody talks about: RAG and fine-tuning aren't competitors. They're complementary. Fine-tuning gives you style. RAG gives you fresh facts.
Most interesting finding: noise sometimes helps. One study found that including irrelevant documents can increase accuracy by 30%. The model learns to filter signal from noise.
The evaluation problem is real though. We're measuring RAG systems with metrics designed for traditional QA. Context relevance and answer faithfulness barely scratch the surface.
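For anyone who hasn't seen them, this is roughly what those two metrics look like when scored with an LLM judge. A sketch only; the prompts and the judge() callable are made up here, not any particular framework's API.

```python
# Rough shape of the two standard RAG metrics, scored with an LLM judge.
# The prompts and the judge() callable are hypothetical, not a specific
# framework's API; real eval suites go further (noise robustness, etc.).

RELEVANCE_PROMPT = (
    "Rate from 0 to 1 how relevant the context is to the question.\n"
    "Question: {q}\nContext: {ctx}\nScore:"
)
FAITHFULNESS_PROMPT = (
    "Rate from 0 to 1 whether every claim in the answer is supported "
    "by the context.\nContext: {ctx}\nAnswer: {a}\nScore:"
)

def evaluate_rag(question, context, answer, judge):
    return {
        "context_relevance": float(judge(RELEVANCE_PROMPT.format(q=question, ctx=context))),
        "answer_faithfulness": float(judge(FAITHFULNESS_PROMPT.format(ctx=context, a=answer))),
    }
```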
Production RAG faces different challenges: data security, retrieval efficiency, preventing models from leaking document metadata. There, the engineering problems matter more than the research papers do.
Multi-modal RAG is coming fast. Text plus images plus code plus audio. The principles transfer, but complexity explodes.
My take: we're still early. Current RAG feels like early search engines. The next breakthrough comes from better integration with long-context models, not replacing them.
One prediction: the distinction between retrieval and generation blurs completely. Future models won't retrieve documents, they'll retrieve and synthesize information in a single forward pass.
1. The three paradigms of RAG evolution: Naive (basic retrieve-read), Advanced (pre/post processing), and Modular (adaptive retrieval).
We're moving from "always retrieve" to "retrieve when needed."
2. RAG retrieval granularity matters more than you think. From tokens to documents, each level has tradeoffs.
Propositions (atomic factual segments) might be the sweet spot for precision without losing context.
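A toy sketch of what proposition-level indexing means in practice. The decomposition prompt and the llm/embed/index callables are hypothetical; real systems often use a dedicated "propositionizer" model for the decomposition step.

```python
# Toy sketch of proposition-level indexing: instead of embedding whole chunks,
# split each passage into atomic, self-contained statements and index those.
# The prompt and the llm/embed/index callables are hypothetical placeholders.

PROPOSITION_PROMPT = (
    "Decompose the passage into atomic, self-contained factual statements, "
    "one per line. Resolve pronouns so each statement stands alone.\n\n{passage}"
)

def index_by_proposition(passages, llm, embed, index):
    for passage in passages:
        for line in llm(PROPOSITION_PROMPT.format(passage=passage)).splitlines():
            prop = line.strip("-• ").strip()
            if prop:
                index.add(vector=embed(prop), payload={"text": prop, "source": passage[:80]})
```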
I just read this Google research paper that completely broke my brain 😳
So these researchers took regular language models - the same ones everyone says "can't really think" - and tried something dead simple. Instead of asking for quick answers, they just said "hey, show me how you work through this step by step."
That's it. No fancy training. No special algorithms. Just better prompts.
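That's literally the whole intervention. Here's the two prompts side by side, with a hypothetical ask_llm wrapper standing in for whatever chat API you use; "let's think step by step" is one common zero-shot phrasing of the idea.

```python
# Same model, same question, different prompt. ask_llm is a hypothetical
# wrapper around whichever chat API you call.

question = "A jug holds 4 liters. You pour out 1.5 liters twice. How much is left?"

direct_prompt = f"{question}\nAnswer:"
cot_prompt = f"{question}\nLet's think step by step, then give the final answer."

# ask_llm(direct_prompt)  -> typically a bare number, more likely to be wrong
# ask_llm(cot_prompt)     -> a worked chain of reasoning ending in "1 liter"
```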
The results? Absolutely insane.
Math problems that stumped these models? Suddenly they're solving them left and right. We're talking 18% accuracy shooting up to 57% on the same exact model. Same brain, different conversation.
But here's where it gets weird. This only worked on the really big models. The smaller ones? They actually got worse. Started rambling nonsense that sounded smart but made zero sense.
Something magical happens around 100 billion parameters though. The model just... starts thinking. Like, actual logical reasoning chains that you can follow. Nobody taught it this. It just emerged.
I've been using ChatGPT and Claude completely wrong this whole time. Instead of wanting instant answers, I should've been asking "walk me through this."
They tested this on everything. Math, common sense questions, logic puzzles. Same pattern everywhere. The models were always capable of this stuff - we just never knew how to ask.
Makes me wonder what else these systems can do that we haven't figured out yet. Like, if reasoning just pops up when you scale things up and ask differently, what happens when someone figures out the right way to prompt for creativity? Or planning? Or solving actually hard problems?
The craziest part is that the models don't even need to be retrained. They already have this ability sitting there, waiting for someone to unlock it with the right conversation.
We've been having the wrong conversations with AI this whole time.
1/ The bigger the model, the better it thinks (small models actually get worse)
2/ From 18% to 57% accuracy on math problems with zero retraining
UAE just dropped K2-Think, the world's fastest open-source AI reasoning model, and it's obliterating everything we thought we knew about AI scaling.
32 billion parameters. That's it. And this thing is matching GPT-4 level reasoning while being 20x smaller.
The paper is absolutely wild. They combined six technical tricks that nobody else bothered to put together. Long chain-of-thought training, reinforcement learning with verifiable rewards, and this "Plan-Before-You-Think" approach that actually reduces token usage by 12% while making the model smarter.
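To be clear, K2-Think bakes this behavior in through training (long chain-of-thought SFT plus RL), but the "plan first, then solve" shape is easy to picture at the prompt level. A hypothetical illustration of the shape, not their recipe:

```python
# Prompt-level illustration of the "plan first, then reason" shape. This is
# NOT K2-Think's method (the behavior is trained in, not prompted); it just
# shows why a short upfront plan can shorten the reasoning that follows.

PLAN_THEN_SOLVE = (
    "First write a short numbered plan for how you will solve the problem. "
    "Then carry out the plan step by step and state the final answer.\n\n"
    "Problem: {problem}"
)

print(PLAN_THEN_SOLVE.format(problem="How many primes are there between 10 and 30?"))
```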
The benchmarks are insane. 90.83% on AIME 2024. Most frontier models can't crack 85%. On complex math competitions, it scored 67.99% - beating models with 200B+ parameters.
And the speed. Holy shit, the speed. 2,000 tokens per second on Cerebras hardware. Most reasoning models crawl at 200 tokens/second. That's the difference between waiting 3 minutes and waiting 16 seconds for a complex proof.
Here's the kicker: they used only open-source datasets. No proprietary training data. No closed APIs. They proved you can build frontier reasoning with public resources and actual engineering skill.
This just nuked the "you need massive scale" narrative. Small labs can now deploy reasoning that was OpenAI-exclusive six months ago.
Everyone's talking about the speed records. The real story is they cracked parameter efficiency at the reasoning level.
you can now use any llm like chatgpt, claude, or grok to:
→ write your resume
→ personalize cover letters
→ find hidden jobs
→ prep you for interviews
→ optimize your linkedin
here are 10 prompts to automate your entire job search (bookmark this):
prompt 1: build your custom resume
"you are a resume strategist. based on my experience and the job below, write a resume that matches keywords, highlights results, and passes ats filters."
→ [paste job description]
→ [paste work history]
prompt 2: tailor your resume to every job
"edit this resume to fit the following job. emphasize matching skills, and remove anything irrelevant."
→ [paste resume]
→ [paste job posting]