Carlos E. Perez
Sep 24, 2020
Raise your hand if you get triggered by machine learning people who claim to understand intelligence when they have never read a word on cybernetics, semiotics, enactivism, or ecological psychology. #ai
More generally, they have never read any text about the importance of subjectivity to intelligence.
Seems like the 'shut up and calculate' mode of science continues to dominate the agenda. :-(
The saddest thing is that these are the same people who want to drive the conversation on AI ethics.
No wonder we are heading straight over a cliff. This mechanistic, objective view of reality leads us to build nothing but a mechanistic, objective reality. Good luck living in a future world as nothing but a cog in the machinery.

• • •

More from @IntuitMachine

Sep 15
1/n Terence Tao, arguably the most gifted living mathematician, has tried GPT-o1 and this is his verdict: "However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "competent graduate student" is reached."
2/n Here, Tao attempts to use o1 to formulate the problem in Lean (a math theorem prover), placing blame on o1's ignorance of Lean's latest capabilities. Here's the link: chatgpt.com/share/bb0b1cfa…
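For readers unfamiliar with Lean, here is a minimal illustrative sketch of what "formulating a problem in Lean" looks like: you write the formal statement precisely and leave the proof as a placeholder. This is a toy statement chosen for illustration, not the problem Tao was actually working on.

```lean
-- Toy illustration only (not Tao's problem): formalizing a problem means writing
-- its statement in Lean; the proof itself is left as a `sorry` placeholder.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  sorry
```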
3/n OMG. Just looking at the conversation between Tao and o1 in the previous example could make anybody insecure about their own reasoning abilities! I would think that, for basic problems, asking o1 to formulate Lean code for a subsequent proof should be easy. What does "a mediocre but competent graduate student" mean for normal real-world problems?!
Aug 27
1/n Why Even the Best LLMs Still Struggle with True Creative Writing

The rapid evolution of Large Language Models (LLMs) has fueled both excitement and apprehension. While their ability to mimic human language and generate coherent text is undeniable, a crucial question lingers: can AI truly be creative? The paper "Pron vs Prompt: Can LLMs Challenge World-Class Fiction Authors?" tackles this question head-on, exploring the nuanced realm of creative writing to assess whether LLMs can compete with the best human storytellers.

The paper identifies a key pain point in current AI research: the tendency to compare LLMs to average human writers. While exceeding average performance is notable, it doesn't address whether AI possesses the ingenuity and artistry of a master wordsmith. To bridge this gap, the researchers designed a unique experiment pitting GPT-4, a leading LLM, against Patricio Pron, an award-winning novelist. This head-to-head contest aimed to provide a definitive answer to whether AI can truly rival human creativity at its peak.

Previous research, while valuable, often focused on different aspects of AI and creative writing. Some explored human-AI collaboration, where AI tools assisted human writers, while others highlighted the limitations of LLMs in maintaining narrative coherence or generating truly original content. This paper distinguishes itself by focusing on autonomous LLM creative writing, directly comparing the output of GPT-4 to Pron's work without human intervention.

The experiment itself was elegantly designed. Both GPT-4 and Pron were tasked with generating movie titles and then writing synopses for all the titles generated. This ensured a symmetrical comparison, giving both contenders the same creative challenges. To evaluate the results, the researchers enlisted literary experts who used a rubric based on Boden's framework of creativity, assessing qualities like originality, attractiveness, and the distinct voice of the author.

The findings were revealing. Across all quality dimensions and in both English and Spanish, Patricio Pron consistently received significantly higher ratings. This suggests that while LLMs can produce grammatically correct and even engaging text, they still struggle to replicate the depth, nuance, and originality that characterize truly great creative writing.

Interestingly, the study also highlighted the importance of prompts in guiding LLM creativity. When GPT-4 wrote synopses based on titles provided by Pron, its performance, particularly in style and originality, significantly improved. This suggests that while LLMs may not yet be independent creative powerhouses, they can be valuable tools when guided by human ingenuity.

The study's findings offer a dose of reality amidst the hype surrounding AI. While LLMs have made impressive strides, they are not yet ready to replace human authors. The human spark of creativity, with its ability to weave compelling narratives, evoke emotions, and surprise readers with unexpected turns, remains a distinctly human trait. This is not to say that AI has no place in the creative process. As the study demonstrates, LLMs can be valuable partners, enhancing and augmenting human creativity. However, the role of the human author, with their unique perspective and mastery of language, remains secure, at least for now.
2/n Experiments and Noteworthy Results:

The paper conducts a two-stage experiment:

Stage 1: Title Generation:

Both GPT-4 and Patricio Pron were tasked with generating 30 movie titles each.

Stage 2: Synopsis Writing:

Both contenders wrote 600-word synopses for all 60 titles (their own and their opponent's).
GPT-4 was provided with a prompt that included information about Patricio Pron and emphasized the importance of creativity and literary value.

Evaluation:

Six literary experts (three for Spanish, three for English) assessed the synopses using a rubric based on Boden's framework of creativity, considering the following dimensions (a toy scoring sketch follows this list):
* Attractiveness
* Originality
* Creativity
* Critical Assessment
* Own Voice (recognizable style)
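Purely as an illustration of how such per-dimension expert ratings might be aggregated and compared between the two authors (this is not the paper's analysis code; the scores, scale, and sample size below are made up):

```python
# Illustrative only: aggregate made-up expert ratings per quality dimension and
# compare the two authors; the scores and the 1-10 scale are assumptions.
from collections import defaultdict
from statistics import mean

ratings = [  # (author, dimension, score)
    ("Pron", "Originality", 9), ("Pron", "Attractiveness", 8),
    ("GPT-4", "Originality", 6), ("GPT-4", "Attractiveness", 7),
]

by_key = defaultdict(list)
for author, dimension, score in ratings:
    by_key[(author, dimension)].append(score)

for (author, dimension), scores in sorted(by_key.items()):
    print(f"{author:6s} {dimension:15s} mean = {mean(scores):.2f}")
```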

Noteworthy Results:
Human Superiority: Patricio Pron consistently received significantly higher ratings across all quality dimensions in both Spanish and English, indicating that GPT-4, even in its advanced form, is not yet a match for a top human author in creative writing.

Prompt's Influence: GPT-4 performed significantly better when writing synopses based on titles provided by Patricio Pron, particularly in terms of style and originality. This highlights the importance of prompts in guiding LLM creativity.

Language Gap: GPT-4's creative writing was found to be stronger in English than in Spanish, suggesting a potential language bias in training data.

Recognizable Style: While GPT-4 was not explicitly constrained in terms of style, expert assessors were able to identify its writing with increasing accuracy over time, indicating the presence of detectable patterns in its output.
3/n Here is the full paper: arxiv.org/abs/2407.01119
Aug 25
1/n How Agentic AI Can Learn Strategic Thinking Through Self-Improvement and Bi-Level Search

Large Language Models (LLMs) have demonstrated remarkable abilities in understanding and generating human-like text, but their capacity for strategic decision-making in complex environments has remained a challenge. This challenge is particularly evident in multi-agent games, where success hinges on anticipating and outmaneuvering opponents who are constantly adapting their own strategies. The "STRATEGIST" paper tackles this challenge head-on, proposing a novel framework that empowers LLMs to learn sophisticated strategic skills through a process of self-improvement and bi-level tree search.

Traditional approaches to LLM-based decision-making have often fallen short in these complex settings. Directly controlling actions with LLMs, while intuitive, becomes computationally infeasible as the number of possible actions explodes. Similarly, while LLM-based planning methods show promise, they often struggle to learn reusable strategies, instead focusing on planning at the individual action level. Reinforcement learning, while achieving superhuman performance in certain games, typically demands massive datasets and struggles to generalize across different domains.

STRATEGIST differentiates itself by focusing on the acquisition of high-level strategic skills rather than simply searching for the best action in every possible scenario. The framework centers around two key components:

High-Level Strategy Learning: Instead of directly selecting actions, the LLM learns to evaluate game states and generate effective dialogue strategies. This is achieved through:

Value Heuristics: The LLM learns functions that assess the favorability of different game states, allowing it to prioritize advantageous positions.
Dialogue Strategy Guides: Structured prompts guide the LLM in generating persuasive and strategically sound dialogue within the game, taking into account the social dynamics of the environment.

Low-Level Action Selection (MCTS):
To bridge the gap between strategic thinking and concrete actions, STRATEGIST employs Monte Carlo Tree Search (MCTS). This search method explores possible future game states, providing the LLM with more accurate estimates of state values and guiding it towards better immediate actions.

The learning process itself is driven by a continuous loop of self-play, reflection, and improvement. The LLM engages in simulated games, analyzes the outcomes to identify weaknesses in its strategies, and generates ideas for improvement. This reflective process is guided by examining key states where the LLM's predictions diverged from the actual simulation results. The most promising improvement ideas are then implemented, refining the LLM's value heuristics or dialogue guides.

The effectiveness of STRATEGIST is demonstrated through experiments on two distinct games: the strategic card game GOPS and the social deduction game Avalon. In both settings, STRATEGIST consistently outperforms baseline methods, showcasing the power of combining high-level strategy learning with low-level action planning. The results highlight the importance of both components, as removing either significantly diminishes performance.

The paper's findings offer compelling evidence for the potential of STRATEGIST to enhance LLM-based decision-making in complex, multi-agent environments. The framework's ability to learn generalizable strategic skills through self-improvement and search paves the way for LLMs to tackle increasingly sophisticated challenges in domains ranging from game playing to real-world strategic interactions. As LLMs continue to evolve, frameworks like STRATEGIST will be crucial in unlocking their full potential for strategic thinking and decision-making in our increasingly complex world.
2/n Comparison with Other Methods

Direct LLM Control (e.g., SayCan, ReAct): These approaches directly use LLMs to select actions in a given state by prompting them with the current context.
Contrast: STRATEGIST argues that this is inefficient for complex games due to the vast action space. Instead, it advocates for learning higher-level strategic skills that guide action selection.

LLM-based Planning (e.g., Tree of Thoughts): These methods use LLMs to generate and reason over possible action sequences, often using tree search algorithms.
Contrast: While STRATEGIST also uses tree search (MCTS), it primarily focuses on learning reusable strategic skills (value heuristics, dialogue guides) rather than planning at the individual action level.

Reinforcement Learning (RL) for Games (e.g., AlphaGo, AlphaZero): RL methods have achieved superhuman performance in games, but they typically require massive amounts of training data and are often domain-specific.
Contrast: STRATEGIST leverages LLMs' existing world knowledge and reasoning abilities to learn effective strategies with less data. It also aims for more generalizable skills that can transfer across similar game environments.
3/n STRATEGIST Algorithm

STRATEGIST operates in a continuous loop of self-play, reflection, and improvement, aiming to learn effective strategies for multi-agent games. The algorithm consists of two primary levels:

1. High-Level Strategy Learning

This level focuses on learning reusable strategic skills that guide decision-making in the game. The two main types of skills are:

Value Heuristics: Functions that estimate the "goodness" or favorability of a given game state for the agent. A well-trained value heuristic allows the agent to prioritize actions leading to more advantageous positions.

Dialogue Strategy Guides: In games involving communication, these guides provide structured prompts to help the LLM generate effective and strategically sound dialogue. This could involve asking pertinent questions, providing misleading information (if deception is part of the game), or coordinating actions with teammates.

The high-level learning loop works as follows (a simplified code sketch follows this list):

1. Initialization:
* The LLM is initialized with some basic strategies. This could involve:
* Randomly initialized value heuristics (e.g., a neural network that takes the game state as input and outputs a score).
* Simple rule-based dialogue guides (e.g., "If you have a good hand, act confident").
2. Self-Play Simulations:
* The LLM, equipped with its current strategies, plays multiple games against itself or other agents (potentially using different strategies).
* These simulations can be performed using Monte Carlo Tree Search (MCTS) for action selection (explained in the next section).
3. Reflection and Idea Generation:
* The LLM analyzes the outcomes of the simulations, focusing on:
* Identifying key states where its predictions (based on its value heuristics) differed significantly from the actual simulation results.
* Analyzing dialogue exchanges to identify missed opportunities or ineffective communication strategies.
* Based on this analysis, the LLM generates "improvement ideas" for its strategies. These could involve:
* Adjusting the weights of its value heuristic neural network.
* Adding new rules or refining existing ones in its dialogue strategy guide.
4. Strategy Improvement:
* The LLM selects the most promising improvement ideas (potentially using a bandit algorithm like UCB to balance exploration and exploitation).
* It implements these ideas, updating its value heuristics or dialogue guides accordingly.
5. Repeat:
* The process loops back to step 2, with the LLM now using its improved strategies in the next round of self-play simulations.
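Below is a minimal, assumed sketch of this improvement loop, not the paper's implementation: the value heuristic is a plain weighted sum, self-play is stubbed out against a synthetic target, and improvement ideas are selected with a simple UCB1 bandit as mentioned in step 4.

```python
# Assumed sketch of STRATEGIST's high-level loop (not the paper's code):
# self-play -> reflection on prediction error -> candidate improvement ideas
# -> UCB-style selection -> strategy update, then repeat.
import math
import random

def value_heuristic(state, weights):
    """Toy value heuristic: a weighted sum of state features."""
    return sum(w * f for w, f in zip(weights, state))

def self_play_error(weights, n_games=30):
    """Stub for self-play: mean gap between predicted and 'actual' game value."""
    target = [0.2, 0.9, -0.4]                      # pretend 'true' feature weights
    err = 0.0
    for _ in range(n_games):
        state = [random.random() for _ in range(len(weights))]
        err += abs(value_heuristic(state, weights) - value_heuristic(state, target))
    return err / n_games

IDEAS = [(i, d) for i in range(3) for d in (+0.1, -0.1)]   # "nudge weight i by d"

def ucb_pick(stats, c=1.4):
    """UCB1 over the fixed menu of improvement ideas (explore vs. exploit)."""
    total = sum(n for n, _ in stats.values()) or 1
    def score(idea):
        n, reward = stats.get(idea, (0, 0.0))
        return float("inf") if n == 0 else reward / n + c * math.sqrt(math.log(total) / n)
    return max(IDEAS, key=score)

weights, stats = [0.5, 0.5, 0.5], {}
for _ in range(50):                                # improvement iterations
    before = self_play_error(weights)              # "reflection": how wrong are we?
    i, d = ucb_pick(stats)
    weights[i] += d                                # apply the chosen improvement idea
    after = self_play_error(weights)
    n, r = stats.get((i, d), (0, 0.0))
    stats[(i, d)] = (n + 1, r + (before - after))  # reward ideas that reduce error
    if after > before:                             # revert ideas that made things worse
        weights[i] -= d
print("learned weights:", [round(w, 2) for w in weights])
```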

2. Low-Level Action Selection (MCTS)

While the high-level loop focuses on learning general strategies, MCTS provides an efficient way to select the best action in a specific game state, given the current strategies.
How MCTS works (a compact generic sketch follows this list):

1. Tree Search: Starting from the current game state, MCTS builds a search tree by simulating possible future game states. Each node in the tree represents a game state, and each edge represents an action.
2. Rollouts: For each new node (state) reached during the search, MCTS performs multiple "rollouts" - simulated games played out until completion using a simple, often random, policy.
3. Backpropagation: The results of the rollouts are used to update the value estimates of the nodes in the search tree. Nodes that led to more wins for the agent will have their values increased.
4. Action Selection: After a certain number of simulations, MCTS selects the action leading to the child node with the highest value estimate from the current state.
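Here is a compact, generic MCTS sketch on a toy counting game (add 1 or 2, try to land exactly on 10). It is an assumed illustration of the four steps above, not the STRATEGIST implementation, which couples MCTS with the LLM's learned value heuristics.

```python
# Generic MCTS sketch on a toy game (not the paper's implementation):
# selection (UCT) -> expansion -> random rollout -> backpropagation.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def legal_actions(state):          # toy game: add 1 or 2 until reaching 10 or more
    return [1, 2] if state < 10 else []

def step(state, action):
    return state + action

def rollout(state):
    """Random playout; reward 1 if we land exactly on 10, else 0."""
    while legal_actions(state):
        state = step(state, random.choice(legal_actions(state)))
    return 1.0 if state == 10 else 0.0

def uct(parent, child, c=1.4):
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def mcts(root_state, n_sims=500):
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(legal_actions(node.state)):
            node = max(node.children.values(), key=lambda ch: uct(node, ch))
        # 2. Expansion: add one untried child, if any.
        untried = [a for a in legal_actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(step(node.state, a), parent=node)
            node = node.children[a]
        # 3. Rollout from the new node.
        reward = rollout(node.state)
        # 4. Backpropagation of the result up the tree.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print("best first move from 7:", mcts(7))
```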

Prerequisite Training
LLM Pre-training: STRATEGIST assumes the LLM has undergone extensive pre-training on a massive text dataset. This pre-training equips the LLM with:
* Strong language understanding and generation capabilities.
* A broad base of world knowledge that can be leveraged for reasoning about the game.

No Fine-tuning Required: Importantly, STRATEGIST does not require any game-specific fine-tuning of the LLM. The learning process relies entirely on the LLM's pre-trained knowledge and its ability to learn from self-play and reflection.

Key Advantages of STRATEGIST:
Leverages Existing LLM Knowledge: STRATEGIST avoids the need for massive game-specific datasets by tapping into the LLM's pre-trained knowledge.

Learns Generalizable Strategies: The focus on high-level strategic skills promotes generalization, enabling the LLM to adapt to different opponents and even transfer knowledge to similar games.

Human-Interpretable Learning: The process of reflection and idea generation provides insights into the LLM's strategic thinking, making it easier to understand and potentially debug its decision-making process.
Aug 18
1/n How Understanding Stateful Tools Advances Agentic AI

The rapid advancement of Large Language Models (LLMs) has ignited a wave of excitement and research into their potential for interacting with and manipulating the world around them. Imagine LLMs not just as eloquent conversationalists, but as capable agents, utilizing tools to complete tasks, answer questions, and even control physical systems. This exciting prospect, however, hinges on our ability to accurately evaluate and understand their tool-use capabilities. This is where existing benchmarks fall short, struggling to capture the nuances of real-world scenarios. The paper from Apple, "TOOLSANDBOX: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities," directly addresses this pain point, introducing a novel benchmark that pushes the boundaries of LLM evaluation.

Previous benchmarks, while valuable, often simplified the evaluation process. They primarily focused on stateless tools, neglecting the complexities of mutable world states. Single-turn interactions were the norm, failing to capture the dynamic back-and-forth of natural conversations. This is where TOOLSANDBOX diverges. It embraces the complexity of real-world tool use by incorporating stateful tools that interact with a dynamic world state. This allows researchers to assess an LLM's ability to understand, track, and manipulate this state to achieve its goals.
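To make "stateful tool" concrete, here is a toy sketch of the idea, assumed for illustration rather than taken from the benchmark's actual API: two tools share a mutable world state, and one of them only works if the state has been set up correctly first.

```python
# Toy illustration of stateful tools (assumed; not TOOLSANDBOX's actual API).
# Tools read and mutate a shared world state, so call order and state tracking matter.
world_state = {"cellular_enabled": False, "contacts": {"Alice": "+1-555-0100"}}

def enable_cellular():
    """Stateful tool: flips a switch in the world state."""
    world_state["cellular_enabled"] = True
    return "cellular service enabled"

def send_message(name, text):
    """Stateful tool with a state dependency: fails unless cellular is on."""
    if not world_state["cellular_enabled"]:
        return "error: cellular service is disabled"
    number = world_state["contacts"].get(name)
    return f"sent to {number}: {text}" if number else f"error: no contact named {name}"

# An agent that ignores the state dependency fails; one that tracks state succeeds.
print(send_message("Alice", "running late"))   # error: cellular service is disabled
print(enable_cellular())
print(send_message("Alice", "running late"))   # sent to +1-555-0100: running late
```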

Furthermore, TOOLSANDBOX moves beyond static, single-turn interactions by introducing an LLM-based user simulator. This simulator, enhanced by "Knowledge Boundary" and "Demonstration" prompting techniques, enables realistic, multi-turn conversations, pushing LLMs to comprehend implicit information and adapt to evolving dialogues. This on-policy evaluation, where the LLM's actions directly influence the interaction, provides a more accurate representation of its true capabilities.

The experiments conducted using TOOLSANDBOX yielded fascinating insights. While proprietary models like OpenAI's GPT-4 and Anthropic's Claude variants demonstrated impressive performance, highlighting their advanced reasoning and state-tracking abilities, open-source models lagged significantly. This performance gap underscores the ongoing challenges in developing truly capable open-source alternatives.

The experiments also revealed critical areas for improvement. LLMs, particularly open-source models, struggled with managing and reasoning about the world state and effectively utilizing information from tool responses. This highlights the need for further research in state management, tool representation, and information integration.

The introduction of TOOLSANDBOX marks a significant step forward in LLM evaluation. By embracing statefulness, conversation, and interactivity, it provides a more realistic and comprehensive assessment of LLM tool-use capabilities. As we venture further into the era of tool-wielding LLMs, robust benchmarks like TOOLSANDBOX will be essential for tracking progress, identifying limitations, and ultimately, unlocking the full potential of these powerful technologies.
2/n The paper describes experiments conducted using TOOLSANDBOX to evaluate both open-source and proprietary LLMs across a variety of tool-use scenarios. Here's a breakdown of the experiments and noteworthy results:

Experiments:

Test Scenarios: 1032 human-authored test cases designed to cover diverse and challenging tool-use scenarios. These scenarios were categorized based on:
* Number of tool calls and user turns required.
* Presence of state dependencies between tools.
* Need for canonicalization (resolving ambiguous information).
* Handling of insufficient information (avoiding hallucination).

Models Evaluated: Both open-source and proprietary LLMs were evaluated, including:
* OpenAI's GPT-3.5-turbo and GPT-4.
* Anthropic's Claude-instant-v1 and Claude-v1.3.
* Several open-source models.

Metrics (a toy scoring sketch follows this list):
* Milestone Achievement: Measures how well the agent completes the critical steps defined by the Milestones.
* Minefield Avoidance: Evaluates the agent's ability to avoid incorrect or undesirable actions.
* Turn Count: Tracks the efficiency of the agent in completing the task.
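As an assumed illustration of how such metrics could be computed (not the benchmark's actual implementation), milestones and minefields can be thought of as predicates checked against the recorded trajectory:

```python
# Assumed sketch of milestone / minefield scoring (not the paper's implementation):
# milestones and minefields are predicates evaluated on the recorded trajectory.
trajectory = {
    "final_state": {"cellular_enabled": True, "message_sent_to": "Alice"},
    "tool_calls": ["enable_cellular", "send_message"],
    "turns": 4,
}

milestones = [
    lambda t: t["final_state"]["cellular_enabled"],            # cellular was turned on
    lambda t: t["final_state"]["message_sent_to"] == "Alice",  # message reached Alice
]
minefields = [
    lambda t: "delete_all_contacts" in t["tool_calls"],        # destructive action to avoid
]

milestone_score = sum(m(trajectory) for m in milestones) / len(milestones)
minefield_ok = not any(m(trajectory) for m in minefields)
print(f"milestones: {milestone_score:.0%}, minefields avoided: {minefield_ok}, turns: {trajectory['turns']}")
```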

Noteworthy Performance Results:
Significant Gap Between Open-Source and Proprietary Models: Open-source models exhibited significantly lower performance compared to proprietary models (GPT-4 and Claude variants) across all scenario categories. This highlights the considerable gap that still exists in capabilities.
GPT-4's Superior Performance: GPT-4 consistently outperformed other models, demonstrating advanced reasoning, state tracking, and conversational abilities in complex tool-use scenarios.
Strong Performance of Claude Models: Claude models, particularly Claude-v1.3, also showed strong performance, indicating their competence in tool-assisted settings. However, Claude-instant-v1 lagged in scenarios involving complex state dependencies.
Challenges in State Management and Tool-Response Consumption: Open-source models particularly struggled with managing and reasoning about the world state, as well as effectively utilizing information from tool responses.
Impact of Tool Augmentations: Ablation studies showed that increasing distractions (irrelevant tools) and reducing tool information (uninformative names, missing descriptions) significantly impacted the performance of all models. This emphasizes the importance of clear and concise tool representations for effective tool use.
Importance of User Simulator Prompting: Experiments with different user simulator prompting strategies demonstrated that incorporating Knowledge Boundary and Demonstration significantly improved the realism and robustness of the simulated user, leading to more accurate evaluations.

Overall, the experiments conducted using TOOLSANDBOX provide valuable insights into the capabilities and limitations of current LLMs in tool-assisted settings. The results highlight the challenges that remain, setting the stage for future research and development in this critical area.
3/n Related Work

BFCL (Berkeley Function Calling Leaderboard):
Contrast with TOOLSANDBOX:
Stateless tools: BFCL primarily focuses on evaluating LLMs with stateless web services (RESTful APIs), while TOOLSANDBOX incorporates stateful tools and a mutable world state.
Single-turn evaluation: BFCL relies on single-turn user queries, whereas TOOLSANDBOX supports multi-turn conversational interactions with an LLM-based user simulator.
Predefined trajectory: BFCL evaluates against a fixed set of predefined trajectories, limiting the ability to assess the agent's own policy. TOOLSANDBOX allows for dynamic, on-policy trajectory collection.

ToolEval:
Contrast with TOOLSANDBOX:
Stateless tools: Similar to BFCL, ToolEval primarily uses stateless tools, lacking the stateful aspect of TOOLSANDBOX.
Limited conversation evaluation: While ToolEval allows for multiple rounds of tool interaction, it doesn't have a dedicated user simulator for realistic conversational evaluation.
LLM-based evaluation: ToolEval relies on an LLM evaluator to judge the final outcome, raising concerns about reliability and interpretability. TOOLSANDBOX employs a more objective evaluation based on Milestones and Minefields.

API-Bank:
Contrast with TOOLSANDBOX:
Limited state dependency exploration: While API-Bank includes some state-modifying tools, it doesn't explicitly focus on evaluating the agent's ability to handle state dependencies.
Off-policy evaluation: API-Bank evaluates on predefined, off-policy dialog trajectories, unlike TOOLSANDBOX's on-policy approach.
Static evaluation: API-Bank relies on static, turn-wise evaluation metrics based on predefined trajectories. TOOLSANDBOX allows for more flexible and dynamic evaluation using Milestones and Minefields.
Aug 16
1/n Show, Don't Tell: Low Cost Personalized Large Language Models

Large language models (LLMs) have revolutionized our interaction with technology, showcasing remarkable abilities in understanding and generating human-like text. However, their training on massive, general-purpose datasets often leads to outputs that lack the personal touch, failing to capture the nuances of individual writing styles and task-specific requirements. While powerful, these LLMs can feel like generic one-size-fits-all tools, struggling to adapt to the diverse needs of individual users.

Addressing this critical gap between powerful LLMs and personalized language generation is the core focus of the paper "Show, Don't Tell: Aligning Language Models with Demonstrated Feedback." The authors introduce DITTO (Demonstration ITerated Task Optimization), a method that deviates from the data-heavy approaches of the past, instead empowering users to efficiently customize LLMs using a handful of demonstrations.

Traditional LLM alignment techniques, such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), rely on vast datasets of labeled examples or preferences. While effective, these methods are impractical for individual users who cannot afford to generate such large amounts of data. Prompting, while data-efficient, often becomes a tedious guessing game, requiring careful crafting of input phrases to steer the LLM towards desired outputs. Other approaches, like Constitutional AI, rely on pre-defined principles that may not capture the nuances of individual preferences.

DITTO breaks free from these limitations by leveraging the LLM itself to generate comparison data from a small set of user demonstrations. Instead of telling the model what to do through complex instructions or thousands of examples, DITTO allows users to show the desired behavior directly. This direct alignment with demonstrations provides a more intuitive and efficient way of communicating preferences to the model.

The paper demonstrates the effectiveness of DITTO through a series of compelling experiments. In automatic evaluations on benchmark datasets of author-specific writing, DITTO consistently outperforms existing methods, including SFT, few-shot prompting, and even self-play methods like SPIN. Furthermore, a user study on email writing showcases DITTO's ability to adapt to real-world scenarios, outperforming not only standard baselines but also user-constructed prompts. This highlights the advantage of learning directly from demonstrations rather than relying on users to articulate their preferences through potentially ambiguous prompts.

Perhaps the most striking finding is DITTO's remarkable sample efficiency. Compared to traditional preference-based methods, DITTO achieves comparable performance with an order of magnitude fewer feedback samples. This makes it a practical solution for individual users who can now customize LLMs with just a handful of examples.

In conclusion, DITTO marks a significant step towards a new era of personalized language models. By shifting from "telling" to "showing," it empowers users to mold powerful LLMs to their specific needs and preferences. This opens up exciting possibilities for a future where LLMs are no longer generic tools but personalized assistants that can adapt to the unique voice and tasks of each individual.
2/n Comparison with other approaches

1. Supervised Fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF):

Prior Work: These methods train LLMs on large datasets of human-labeled text or preferences.
DITTO Contrast: DITTO is significantly more data-efficient, requiring only a handful of demonstrations instead of thousands of examples. It achieves this by leveraging the LLM itself to generate comparison data.

2. Prompting:

Prior Work: Prompting involves crafting specific input phrases to guide the LLM's output.
DITTO Contrast: While prompting can be data-efficient, it often requires tedious trial-and-error to find effective prompts. DITTO provides a more direct and intuitive way of aligning the model by learning from demonstrations rather than relying on prompt engineering.

3. Constitutional AI:

Prior Work: This method automatically generates preference data using the LLM itself, guided by pre-defined principles.
DITTO Contrast: DITTO does not rely on pre-defined principles, making it more flexible and adaptable to individual preferences. It directly learns from user demonstrations, capturing more nuanced aspects of desired behavior.

4. Group Preference Optimization (GPO):

Prior Work: GPO aims for few-shot alignment by meta-learning preference groups from a large dataset.
DITTO Contrast: DITTO does not require a large pre-existing dataset for meta-learning. It focuses on individual user adaptation and can learn directly from a small number of demonstrations provided by that user.

5. Self-Play Methods (e.g., SPIN):

Prior Work: These methods improve LLMs through iterative self-play, often using a stronger language model as a critic.
DITTO Contrast: DITTO is designed for data-limited scenarios and does not require an external critic or a large number of demonstrations. It focuses on aligning with specific user preferences rather than achieving general self-improvement.

6. Online Imitation Learning:

Prior Work: Traditional online imitation learning methods typically focus on continuous control tasks and often require explicit reward function learning.
DITTO Contrast: DITTO adapts online imitation learning principles to the discrete text generation setting of LLMs. It implicitly learns a reward function from demonstrations and efficiently generates comparison data online.
3/n Here's a breakdown of the DITTO approach with examples (a toy code sketch follows at the end):

1. Start with Demonstrations:

The user provides a few (typically less than 10) demonstrations of the desired output for a specific task or writing style.
Example: Imagine you want to train an LLM to write emails in a concise and informal style. You provide the following demonstrations:
Demonstration 1 (Original): "Dear Professor Smith, I hope this email finds you well. I am writing to inquire about the possibility of scheduling a meeting to discuss my research proposal."
Demonstration 1 (Edited): "Hi Prof. Smith, Hope you're doing well! Wondering if you're free to chat about my research proposal sometime next week."
Demonstration 2 (Original): "I would be grateful if you could please provide me with feedback on the attached document at your earliest convenience."
Demonstration 2 (Edited): "Let me know what you think of the attached doc when you get a chance!"

2. Generate Comparisons:
DITTO treats user demonstrations as the "expert" and compares them to outputs generated by the LLM.
The key insight is that the LLM's own outputs, even if imperfect, can be used as valuable training data when contrasted with the demonstrations.

Example: The LLM might generate the following output for a new email: "Dear Dr. Jones, I am writing to request your availability for a meeting..." DITTO would identify that this output is less concise and less informal than the user's demonstrations.

3. Rank Outputs and Create a Preference Dataset:

DITTO doesn't just rely on pairwise comparisons (demonstration vs. LLM output). It leverages outputs from all training iterations, creating a richer preference dataset.
"Replay" Comparisons: Compare current LLM outputs to past demonstrations.
"Intermodel" Comparisons: Compare outputs from different training iterations to identify improvements.
Example: DITTO might compare the current output ("Dear Dr. Jones...") with a previous iteration that generated an even more formal email ("To the esteemed Dr. Jones...").
This comparison highlights progress and helps the model learn to move further towards the desired style.

4. Iterative Training with DPO:

DITTO uses the ranked preference dataset to iteratively train the LLM using an alignment algorithm like DPO (Direct Preference Optimization).
DPO updates the LLM's parameters to generate outputs that are more likely to be preferred based on the demonstrated rankings.
Example: Through iterative training, the LLM learns to favor concise language, informal greetings, and directness in its email writing style.
In essence, DITTO guides the LLM towards the desired behavior by:
Learning from mistakes: Contrasting its own outputs with user demonstrations to identify areas for improvement.
Building on progress: Leveraging outputs from previous iterations to reinforce positive changes and avoid repeating mistakes.
Iteratively refining its understanding: Continuously updating its internal representations based on the evolving preference dataset.

By effectively leveraging the LLM's own generations as a source of learning, DITTO offers a data-efficient and intuitive approach for aligning these powerful models with individual preferences and tasks.
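Putting the steps together, here is a minimal, assumed sketch of how DITTO-style preference pairs could be constructed from a handful of demonstrations. The generation step is stubbed out, and a real implementation would feed the resulting pairs into a DPO update each iteration; this is an illustration of the idea, not the paper's code.

```python
# Assumed sketch of DITTO-style preference-pair construction (not the paper's code).
# Each pair says "chosen" should be preferred over "rejected"; a DPO trainer would
# consume these pairs to update the model. LLM sampling is stubbed out here.

demonstrations = [
    "Hi Prof. Smith, hope you're doing well! Free to chat about my proposal next week?",
    "Let me know what you think of the attached doc when you get a chance!",
]

def generate(prompt, iteration):
    """Stub for LLM sampling; imagine later iterations drift toward the demos' style."""
    return f"[iteration {iteration} draft for: {prompt}]"

preference_pairs = []
history = []                                     # outputs from earlier iterations
for iteration in range(3):
    outputs = [generate(d, iteration) for d in demonstrations]
    for demo, out in zip(demonstrations, outputs):
        # Demonstrations are treated as expert behavior: preferred over any model output.
        preference_pairs.append({"chosen": demo, "rejected": out})
    for past_out in history:
        for out in outputs:
            # Intermodel comparisons: outputs from later iterations are preferred
            # over outputs from earlier iterations.
            preference_pairs.append({"chosen": out, "rejected": past_out})
    history.extend(outputs)
    # A real implementation would run a DPO update on preference_pairs here.

print(f"built {len(preference_pairs)} preference pairs over 3 iterations")
```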
Aug 13
1/n OpenDevin's Radical Approach to Agentic AI

The rapid advancement of large language models (LLMs) has ushered in a new era of AI agents capable of interacting with and impacting their environments in increasingly sophisticated ways. However, developing and evaluating these agents for complex, real-world tasks presents significant challenges. Existing frameworks often struggle to provide the necessary tools, environments, and interfaces for building truly versatile and robust AI agents. The OpenDevin platform, as presented in the paper "OpenDevin: An Open Platform for AI Software Developers as Generalist Agents," directly addresses these limitations, offering a novel approach that empowers AI agents to interact with the world more like human software developers – through code, command lines, and web browsing.

One of the key pain points OpenDevin tackles is the inherent complexity of developing and evaluating advanced AI agents. Traditional frameworks often rely on simplified environments and limited action spaces, hindering the development of agents capable of tackling real-world tasks. OpenDevin breaks free from these constraints by providing a realistic environment that includes a sandboxed Linux operating system and a fully functional web browser. This allows agents to interact with real-world tools and data sources, enabling them to tackle more meaningful and impactful challenges. Moreover, OpenDevin's standardized evaluation framework, encompassing a diverse set of established benchmarks, ensures consistent and comprehensive assessment of agent capabilities across various domains.

Another significant limitation addressed by OpenDevin is the lack of a standardized and powerful interface for agent-world interaction. While some frameworks rely on pre-defined tool sets or JSON-based function calls, OpenDevin embraces code execution and web browsing as its primary interaction mechanisms. This allows agents to leverage the flexibility and expressiveness of programming languages, breaking free from the limitations of rigid action spaces and enabling them to solve complex problems in a more human-like manner.

Recognizing the importance of reusable components in software development, OpenDevin introduces the AgentSkills library – a centralized and extensible collection of tools for common agent tasks. This modular design simplifies the development process and encourages community contributions, fostering a collaborative ecosystem for building and sharing specialized agent capabilities. Furthermore, OpenDevin tackles the challenge of multi-agent collaboration by incorporating a delegation mechanism. This allows developers to create teams of specialized agents, each excelling in specific domains, to work together and solve complex problems more effectively.
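As a rough illustration of these two ideas, a reusable skills library plus delegation to specialist agents, here is a toy sketch; the names and interfaces are assumptions for illustration and are not OpenDevin's actual AgentSkills API or delegation protocol.

```python
# Illustrative only: a toy skills registry and delegation pattern in the spirit of
# what the paper describes; names and APIs are assumptions, not OpenDevin's code.
SKILLS = {}

def skill(fn):
    """Register a reusable tool in a central library."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

@skill
def search_text(text: str, needle: str):
    """Return the line numbers where `needle` occurs."""
    return [i for i, line in enumerate(text.splitlines()) if needle in line]

class SpecialistAgent:
    """An agent restricted to a subset of the shared skills library."""
    def __init__(self, name, allowed_skills):
        self.name = name
        self.allowed = {k: SKILLS[k] for k in allowed_skills}
    def run(self, skill_name, **kwargs):
        return self.allowed[skill_name](**kwargs)

class Coordinator:
    """Delegates sub-tasks to specialists instead of doing everything itself."""
    def __init__(self, specialists):
        self.specialists = {a.name: a for a in specialists}
    def delegate(self, agent_name, skill_name, **kwargs):
        return self.specialists[agent_name].run(skill_name, **kwargs)

coder = SpecialistAgent("coder", ["read_file", "search_text"])
team = Coordinator([coder])
print(team.delegate("coder", "search_text", text="a\nTODO b", needle="TODO"))  # [1]
```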

The effectiveness of OpenDevin's approach is evident in its experimental results. Evaluated on 15 established benchmarks spanning software engineering, web browsing, and general assistance tasks, OpenDevin agents demonstrate strong and competitive performance across the board. The agents excel in tasks like code generation, web navigation, information extraction, and problem-solving, highlighting the platform's versatility and the power of its core design principles.

In conclusion, OpenDevin represents a significant leap forward in AI agent development. By providing a realistic environment, a powerful and flexible interface, an extensible skill library, and support for multi-agent collaboration, OpenDevin empowers researchers and developers to create more capable, versatile, and robust AI agents. The platform's promising experimental results and its community-driven approach pave the way for a future where AI agents seamlessly integrate into our world, assisting us in tackling complex challenges and pushing the boundaries of what's possible with artificial intelligence.
2/n Comparison with Other Systems

1. AutoGPT, LangChain, MetaGPT, AutoGen, Agents, Xagents, OpenAgents, GPTSwarm:

Category: These are general-purpose AI agent frameworks, often focused on chaining together various tools and APIs to accomplish tasks.
Contrast with OpenDevin: While these frameworks offer flexibility in tool integration, they often lack a standardized and powerful interface for interacting with the world. They may rely on pre-defined tool sets or JSON-based function calls, which can limit agent capabilities and generalization. OpenDevin, on the other hand, empowers agents to interact with the world more directly through code execution and web browsing, providing greater flexibility and expressiveness. Additionally, OpenDevin places a strong emphasis on a sandboxed environment, agent skill library, and systematic evaluation, which are not always central to these other frameworks.

2. AutoCodeRover, SWE-Agent:

Category: These frameworks are specifically designed for software engineering tasks, enabling agents to write, debug, and test code.
Contrast with OpenDevin: While these frameworks excel in software development domains, OpenDevin aims to be more general-purpose. It includes software development capabilities but also extends to web browsing and other tasks through its flexible interface and agent skill library. OpenDevin also emphasizes multi-agent collaboration, which is not a primary focus in these more specialized frameworks.

3. BabyAGI, AgentVerse:

Category: These frameworks focus on building autonomous agents that can manage and execute tasks over extended periods, often with minimal human intervention.
Contrast with OpenDevin: While OpenDevin supports autonomous agent behavior, it also emphasizes human-in-the-loop scenarios and provides tools for interactive agent development and debugging. OpenDevin's focus on a realistic environment and standardized evaluation also sets it apart from these frameworks, which may rely on more simplified task representations or simulations.

4. ReAct, Toolformer:

Category: These are research efforts focusing on specific techniques for enhancing agent capabilities, such as reasoning with actions (ReAct) or learning to use tools (Toolformer).
Contrast with OpenDevin: OpenDevin is a platform that can incorporate and benefit from these research advancements. It provides a framework where techniques like ReAct or Toolformer can be implemented and evaluated within a broader context of agent development and real-world interaction.

In summary:

OpenDevin distinguishes itself from prior work by combining the following features:

Powerful and flexible interface based on code execution and web browsing.
Realistic environment with a sandboxed operating system and web browser.
Extensible library of agent skills and tools.
Support for multi-agent collaboration through delegation.
Standardized evaluation framework with diverse benchmarks.

These features address the limitations of existing frameworks and pave the way for developing more capable, versatile, and reliable AI agents that can effectively interact with and solve real-world problems.
3/n Key design motivations

Complexity of Development and Evaluation: OpenDevin provides a structured framework with a clear separation of concerns (agent logic, environment, skills library) that simplifies both development and evaluation of complex AI agents. Its standardized evaluation framework and diverse benchmarks allow for consistent and comprehensive assessment of agent capabilities.

Limited Real-World Interaction: OpenDevin grants agents access to a sandboxed Linux environment and a fully functional web browser. This enables interaction with real-world tools and data sources, allowing agents to tackle more realistic and complex tasks beyond simulated environments.

Lack of a Standardized Interface: OpenDevin introduces a powerful and consistent interface based on code execution and web browsing actions. This allows agents to leverage the flexibility and expressiveness of programming languages to interact with the environment, breaking free from the limitations of pre-defined action spaces.

Difficulty in Creating and Maintaining Tools: OpenDevin's AgentSkills library provides a centralized and extensible collection of reusable tools for common agent tasks. This modular design, coupled with rigorous testing, makes it easier for the community to contribute, maintain, and share specialized tools across different agent implementations.

Limited Multi-Agent Collaboration: OpenDevin incorporates multi-agent delegation, allowing developers to create teams of specialized agents that collaborate to solve complex problems. This enables breaking down tasks into smaller, more manageable sub-tasks, leveraging the strengths of different agent architectures and skillsets.

By addressing these pain points, OpenDevin aims to accelerate research and development of more capable, versatile, and robust AI agents that can effectively interact with and solve real-world problems. Its community-driven approach fosters collaboration and innovation, paving the way for the next generation of general-purpose AI agents.