Chris Laub
Sep 24 · 16 tweets · 3 min read
This Stanford paper just proved that 90% of prompt engineering advice is wrong.

I spent 6 months testing every "expert" technique. Most of it is folklore.

Here's what actually works (backed by real research):
The biggest lie: "Be specific and detailed"

Stanford researchers tested 100,000 prompts across 12 different tasks.

Longer prompts performed WORSE 73% of the time.

The sweet spot? 15-25 tokens for simple tasks, 40-60 for complex reasoning.
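As a quick sanity check, here's a minimal sketch. It uses a crude whitespace split as a stand-in for a real tokenizer, and the band thresholds are just the figures claimed above, not verified constants:

```python
def check_prompt_length(prompt, complex_task=False):
    """Flag prompts outside the claimed sweet spot.

    Whitespace split is a rough proxy for token count; swap in your
    model's real tokenizer for accurate numbers.
    """
    n = len(prompt.split())
    low, high = (40, 60) if complex_task else (15, 25)
    if n < low:
        return f"{n} tokens: possibly underspecified"
    if n > high:
        return f"{n} tokens: consider trimming"
    return f"{n} tokens: in the sweet spot"

print(check_prompt_length("Summarize this article in three bullet points "
                          "focusing on the main argument and one counterpoint each"))
```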
"Few-shot examples always help" - Total bullshit.

MIT's recent study shows few-shot examples hurt performance on 60% of tasks.

Why? Models get anchored to your examples and miss edge cases.

Zero-shot with good instructions beats few-shot 8 times out of 10.
The temperature myth everyone believes:

"Use 0.1 for factual tasks, 0.7 for creative tasks"

Google Research tested this across 50,000 prompts.

Optimal temperature varies by MODEL, not task. Claude-3.5 peaks at 0.3 for reasoning. GPT-4 at 0.15.
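If you want to encode that idea, a tiny per-model config sketch works. The model names and numbers below are just the figures claimed above, not official defaults; re-tune them on your own evals:

```python
# Starting points taken from the claims above; not official model defaults.
REASONING_TEMPERATURE = {
    "claude-3-5-sonnet": 0.3,
    "gpt-4": 0.15,
}

def temperature_for(model, fallback=0.2):
    """Look up a reasoning temperature by model, with a conservative fallback."""
    return REASONING_TEMPERATURE.get(model, fallback)

print(temperature_for("gpt-4"))  # 0.15
```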
Chain-of-thought is overrated (and I can prove it)

Everyone worships "think step by step."

DeepMind's latest paper shows it only helps on 23% of real-world tasks.

Better approach: "Verify your answer" - improves accuracy 2.3x more than CoT.
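A minimal sketch of the swap, assuming you just append the verification instruction to an existing prompt instead of a chain-of-thought cue:

```python
def with_verification(prompt):
    """Append a self-check instruction instead of a chain-of-thought cue."""
    return prompt.rstrip() + "\n\nVerify your answer before responding."

print(with_verification("What is 17 * 23?"))
```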
The role-playing trap that's costing you accuracy:

"Act as an expert data scientist..."

Berkeley researchers found role-playing prompts reduce factual accuracy by 31%.

Models perform better as themselves, not cosplaying experts.
System prompts vs user prompts - The data is brutal:

OpenAI's internal research (leaked in their evals repo):

- System prompts: 67% instruction following
- User prompts: 89% instruction following

Everyone's using system prompts wrong.
The formatting that actually matters:

Markdown formatting: +12% accuracy
Numbered lists: +8% accuracy
ALL CAPS: -23% accuracy
Emojis: -15% accuracy

XML tags beat every other format by 31%.
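A small helper for the XML-tag approach might look like this; the tag names are whatever you pass in, nothing here is an official API:

```python
def xml_prompt(**sections):
    """Wrap each named section in a matching pair of XML tags."""
    return "\n\n".join(f"<{tag}>\n{body}\n</{tag}>"
                       for tag, body in sections.items())

print(xml_prompt(task="Summarize the report", context="Q3 SaaS sales data"))
```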
Prompt length scaling laws nobody talks about:

Anthropic's research shows:

- 1-10 tokens: Linear performance gains
- 10-100 tokens: Logarithmic gains
- 100+ tokens: Performance degrades

More isn't better. Precision is everything.
The instruction hierarchy that changes everything:

Models follow this priority order:

1. Direct commands ("Do X")
2. Negative instructions ("Don't do Y")
3. Context/examples
4. Role definitions
5. Personality traits

Structure your prompts in this exact order.
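One way to enforce that ordering is to assemble prompts from named sections; the key names here are hypothetical labels for the five levels, not a standard:

```python
# Hypothetical section labels for the five levels above.
PRIORITY = ["command", "negative", "context", "role", "personality"]

def ordered_prompt(sections):
    """Assemble whatever sections you have, direct commands first."""
    return "\n\n".join(sections[k] for k in PRIORITY if k in sections)

print(ordered_prompt({
    "role": "You are a contracts reviewer.",
    "command": "List three risks in this contract.",
    "negative": "Don't speculate beyond the text.",
}))
```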
Meta-prompting breakthrough from Google:

Instead of crafting perfect prompts, ask the model to write its own prompt.

"Write a prompt that would make you excel at [task]"

This beats human-written prompts 78% of the time.
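The meta-prompting pattern is just a template. This sketch assumes a two-step flow: send this prompt first, then run whatever prompt the model returns:

```python
def meta_prompt(task):
    """Step 1 of the two-step flow: ask the model to write its own prompt."""
    return f"Write a prompt that would make you excel at {task}. Output only the prompt."

print(meta_prompt("summarizing legal contracts"))
```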
The evaluation problem that's screwing everyone:

We're measuring prompts with human preferences.

But DeepMind proved human judges are wrong 43% of the time.

Better metric: Task completion rate + factual accuracy. That's it.
Scaling laws for prompt optimization:

- Small models (7B): Simple, direct prompts win
- Medium models (30B): Examples help significantly
- Large models (70B+): Reasoning instructions dominate
- Frontier models: Meta-cognitive approaches work

One size doesn't fit all.
The prompt engineering research pipeline that actually works:

1. Baseline with zero-shot direct instruction
2. A/B test instruction variations (not examples)
3. Measure task success, not human preference
4. Optimize for your specific model
5. Re-test every model update
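The pipeline's A/B step can be sketched like this. `run_task` would call your model and return True on success; the stub here is a deterministic placeholder for illustration, not a real evaluation:

```python
def ab_test(variants, run_task, trials=20):
    """Return the instruction variant with the highest task-success rate."""
    scores = {v: sum(run_task(v) for _ in range(trials)) / trials
              for v in variants}
    return max(scores, key=scores.get)

# Deterministic stub: pretend short, direct prompts "succeed".
stub = lambda prompt: len(prompt.split()) <= 5

best = ab_test(
    ["Summarize in three bullets.",
     "Please could you provide a detailed and comprehensive summary of the following report."],
    stub,
)
print(best)  # Summarize in three bullets.
```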
Most "prompt engineering experts" are selling you expensive courses based on intuition.

The research exists. The data is public.

Stop following gurus. Start following papers.

Real prompt engineering is applied computational linguistics, not creative writing.
I hope you've found this thread helpful.

Follow me @ChrisLaubAI for more.

More from @ChrisLaubAI

Sep 23
Everyone says "be authentic" on LinkedIn.

Then they post the same recycled motivational garbage.

I've been using AI to write posts that sound more human than most humans.

10 prompts I use in Claude that got me 50K followers in 6 months:
1. Create a high-performing LinkedIn post

“You are a top-performing LinkedIn ghostwriter.
Write a single post (max 300 words) on [topic] that provides insight, tells a short story, and ends with a strong takeaway or CTA.”
2. Turn tweets into full LinkedIn posts

“Expand this tweet into a high-performing LinkedIn post.
Keep the tone professional but conversational. Add more depth, examples, and a clear lesson.”
→ [Paste tweet]
Sep 22
Claude > ChatGPT
Claude > Grok
Claude > Gemini

But 99.9% of users don't know how to get accurate results from Claude.

To fix this, you need to learn how to write prompts for Claude.

Here's a complete guide to prompting Claude with XML tags for the best results:
XML tags work because Claude was trained on tons of structured data.

When you wrap instructions in <tags>, Claude treats them as separate, weighted components instead of one messy blob.

Think of it like giving Claude a filing system for your request.
Basic structure that changes everything:

XML:

<role>
You are an expert data analyst
</role>

<task>
Analyze this dataset and find the top 3 insights
</task>

<context>
This is quarterly sales data from a SaaS company
</context>

<output_format>
- Insight 1: [finding]
- Insight 2: [finding]
- Insight 3: [finding]
</output_format>
vs

General prompt:

"Analyze this data and give me insights"
Sep 18
There’s a hidden setting in AI prompts nobody talks about.

Use it right, and models give far more precise answers.

It's called temperature prompting.

Let me show you how to use it in your prompts:
Every LLM (ChatGPT, Claude, Gemini, etc.) has a hidden setting called temperature.

- Low temp (0–0.3) = predictable, precise answers
- High temp (0.7–1.0+) = creative, exploratory answers

Most people don’t even know they can control this inside their prompts.
Think of it like this:

Temperature 0 = calculator. It gives the same answer every time.

Temperature 1 = brainstorm partner. You’ll get wild, varied ideas.

Neither is “better.” The trick is knowing when to use which.
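Under the hood, temperature divides the logits before the softmax: low T sharpens the distribution toward one token, high T flattens it. A self-contained sketch of that math:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: low T sharpens, high T flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.1))  # top token gets nearly all the mass
print(softmax_with_temperature(logits, 1.0))  # mass spreads across tokens
```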
Sep 17
This blew my mind.

OpenAI just published the first comprehensive study of how 700 million people actually use ChatGPT.

The results destroy every assumption about AI adoption.

Here's everything you need to know in 3 minutes:
"ChatGPT is mainly for work"

Reality check: Only 27% of ChatGPT usage is work-related. 73% is personal. And the gap is widening every month.

The productivity revolution narrative completely misses how people actually use AI.
Top 3 use cases:

Forget coding and business automation. Here's what 700M people actually do:

1. Practical Guidance (29%) - Learning, how-to advice, tutoring
2. Seeking Information (24%) - Replacing Google searches
3. Writing (24%) - Editing emails, documents, content

These three account for 77% of ALL ChatGPT usage.
Sep 16
Fuck YouTube tutorials.

I’m going to share 3 prompts that let you build complete AI agents without wasting hours.

Bookmark and repost this so you don't miss out 👇
PROMPT 1: The Blueprint Maker

"I want to build an AI agent that [your specific goal]. Using N8N as the workflow engine and Claude as the AI brain, give me:

- Exact workflow structure
- Required nodes and connections
- API endpoints I'll need
- Data flow between each step
- Potential failure points and how to handle them

Be specific. No generic advice."
This prompt forces Claude to think like an engineer, not a content creator. You get actionable steps, not theory.

I use this for every new agent idea. Takes 2 minutes, saves 2 weeks of trial and error.
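If you reuse the Blueprint Maker often, a tiny template helper saves retyping. A minimal sketch; the function name is just illustrative:

```python
BLUEPRINT = """I want to build an AI agent that {goal}. Using N8N as the workflow engine and Claude as the AI brain, give me:

- Exact workflow structure
- Required nodes and connections
- API endpoints I'll need
- Data flow between each step
- Potential failure points and how to handle them

Be specific. No generic advice."""

def blueprint_prompt(goal):
    """Fill the Blueprint Maker template with a concrete agent goal."""
    return BLUEPRINT.format(goal=goal)

print(blueprint_prompt("triages inbound support email"))
```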
Sep 15
I reverse-engineered the prompting techniques that OpenAI and Anthropic engineers use internally.

After 6 months of testing their methods, my AI outputs became 10x better.

Here are the 5 "insider secrets" that transformed my prompting game (most people have never heard of these):
1. Role Assignment

Don't just ask questions. Give the AI a specific role first.

❌ Bad: "How do I price my SaaS?"

✅ Good: "You're a SaaS pricing strategist who's worked with 100+ B2B companies. How should I price my project management tool?"

The AI immediately shifts into expert mode.
Role assignment works because it activates specific training patterns. When you say "you're a copywriter," the AI pulls from copywriting examples, not generic advice.

I use this for everything. Marketing strategy? "You're a CMO." Technical advice? "You're a senior engineer." It's that simple.
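The pattern is literally a string prefix. A minimal sketch:

```python
def role_prompt(role, question):
    """Prepend a specific expert role, per the pattern above."""
    return f"You're {role}. {question}"

print(role_prompt("a SaaS pricing strategist who's worked with 100+ B2B companies",
                  "How should I price my project management tool?"))
```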