Carlos E. Perez
Quaternion Process Theory, Artificial (Intuition, Fluency, Empathy), Patterns for (Generative, Reason, Agentic) AI, https://t.co/fhXw0zjxXp
Dec 6 4 tweets 9 min read
Everyone says LLMs can't do true reasoning—they just pattern-match and hallucinate code.

So why did our system just solve abstract reasoning puzzles that are specifically designed to be unsolvable by pattern matching?

Let me show you what happens when you stop asking AI for answers and start asking it to think.

🧵

First, what even is ARC-AGI?

It's a benchmark that looks deceptively simple: You get 2-4 examples of colored grids transforming (input → output), and you have to figure out the rule.

But here's the catch: These aren't IQ test patterns. They're designed to require genuine abstraction.

(Why This Is Hard)

Humans solve these by forming mental models:

"Oh, it's mirroring across the diagonal"

"It's finding the bounding box of blue pixels"

"It's rotating each object independently"

Traditional ML? Useless. You'd need millions of examples to learn each rule.

LLMs? They hallucinate plausible-sounding nonsense.

But we had a wild idea:

What if instead of asking the LLM to predict the answer, we asked it to write Python code that transforms the grid?

Suddenly, the problem shifts from "memorize patterns" to "reason about transformations and implement them."

Code is a language of logic.

Here's the basic algorithm:

Show the LLM examples: "Write a transform(grid) function"

LLM writes code

Run it against examples

If wrong → show exactly where it failed

Repeat with feedback
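
Here's a minimal sketch of that loop (my own reconstruction, not the actual system; call_llm stands in for whatever model API you use, and the prompt/feedback format is simplified):

import numpy as np

def compile_transform(code_str):
    # Execute the LLM-written code and pull out its transform() function.
    namespace = {"np": np}
    exec(code_str, namespace)
    return namespace["transform"]

def solve_task(examples, call_llm, max_iters=10):
    # examples: list of (input_grid, expected_grid) pairs as nested lists
    feedback = ""
    for attempt in range(max_iters):
        prompt = ("Write a Python transform(grid) function for these examples:\n"
                  f"{examples}\n{feedback}")
        transform = compile_transform(call_llm(prompt))
        failures = []
        for x, y in examples:
            got = np.array(transform(np.array(x)))
            if not np.array_equal(got, np.array(y)):
                failures.append((x, got.tolist(), y))
        if not failures:
            return transform  # passes every training example
        # Show exactly where it failed, then try again.
        feedback = f"Attempt {attempt + 1} failed on {len(failures)} example(s): {failures}"
    return None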

Sounds simple, right?
But that's not even the most interesting part.

When the code fails, we don't just say "wrong."
We show the LLM a visual diff of what it predicted vs. what was correct:

Your output:
1 2/3 4 ← "2/3" means "you said 2, correct was 3"
5 6/7 8

Plus a score: "Output accuracy: 0.75"

It's like a teacher marking your work in red ink.
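
A rough sketch of how that feedback could be produced (the a/b cell rendering matches the format above; everything else is my own guess):

import numpy as np

def grid_feedback(predicted, expected):
    # Assumes both grids have the same shape.
    predicted, expected = np.array(predicted), np.array(expected)
    rows = []
    for p_row, e_row in zip(predicted, expected):
        # Mark each mismatch as "predicted/expected", like red ink on homework.
        rows.append(" ".join(str(p) if p == e else f"{p}/{e}"
                             for p, e in zip(p_row, e_row)))
    accuracy = float((predicted == expected).mean())
    return "\n".join(rows), accuracy

diff, score = grid_feedback([[1, 2], [5, 6]], [[1, 3], [5, 6]])
print(diff)                          # 1 2/3
                                     # 5 6
print(f"Output accuracy: {score}")   # Output accuracy: 0.75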

With each iteration, the LLM sees:

Its previous failed attempts

Exactly what went wrong

The accuracy score

It's not guessing. It's debugging.

And here's where it gets wild: We give it up to 10 tries to refine its logic.

Most problems? Solved by iteration 3-5.

But wait, it gets crazier.

We don't just run this once. We run it with 8 independent "experts"—same prompt, different random seeds.

Why? Because the order in which the examples are presented matters. Shuffling them surfaces different insights.

Then we use voting to pick the best answer.

After all experts finish, we group solutions by their outputs.
If 5 experts produce solution A and 3 produce solution B, we rank A higher.

Why does this work? Because wrong answers are usually unique. Correct answers converge.

It's wisdom of crowds, but for AI reasoning.
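
In code, the voting step is basically a counter over the predicted outputs (a sketch; it assumes each expert's predicted grid can be turned into a hashable tuple):

from collections import Counter

def vote(candidate_grids):
    # candidate_grids: one predicted test-output grid per expert, as nested lists
    keys = [tuple(map(tuple, grid)) for grid in candidate_grids]
    best_key, n_votes = Counter(keys).most_common(1)[0]
    return [list(row) for row in best_key], n_votes

# 5 experts produce A, 3 produce B -> A is ranked higher
A, B = [[1, 2], [3, 4]], [[1, 2], [4, 3]]
answer, votes = vote([A] * 5 + [B] * 3)
print(answer, votes)   # [[1, 2], [3, 4]] 5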

Each expert gets a different random seed, which affects:

Example order (we shuffle them)

Which previous solutions to include in feedback

The "creativity" of the response

Same prompt. Same model. Wildly different exploration paths.

One expert might focus on colors. Another on geometry.
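
Concretely, a per-expert seed might be used something like this (the names, ranges, and defaults are illustrative assumptions, not the actual system's):

import random

def expert_setup(examples, prior_solutions, seed, k_prior=2):
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)                      # a different example order per expert
    included = rng.sample(prior_solutions,
                          min(k_prior, len(prior_solutions)))  # which past attempts to show
    temperature = rng.uniform(0.7, 1.0)        # a different "creativity" setting per expert
    return shuffled, included, temperature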

Our prompts are elaborate.

We don't just say "solve this." We teach the LLM how to approach reasoning:

Analyze objects and relationships

Form hypotheses (start simple!)

Test rigorously

Refine based on failures

It's like giving it a graduate-level course in problem-solving.

Here's why code matters:
When you write:

import numpy as np

def transform(grid):
    # flips the grid (reverses along every axis by default)
    return np.flip(grid)

You're forced to be precise. You can't hand-wave.

Code doesn't tolerate ambiguity. It either works or it doesn't.

This constraint makes the LLM think harder.

Oh, and we execute all this code in a sandboxed subprocess with timeouts.

Because yeah, the LLM will occasionally write infinite loops or try to import libraries that don't exist.

Safety first. But also: fast failure = faster learning.
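
With Python's standard library, that kind of guarded execution can look roughly like this (a sketch, not necessarily what the system actually does):

import subprocess

def run_candidate(code_str, timeout_s=5):
    # Run LLM-written code in a separate process so crashes and hangs stay contained.
    try:
        result = subprocess.run(
            ["python", "-c", code_str],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.returncode, result.stdout, result.stderr
    except subprocess.TimeoutExpired:
        return None, "", "timed out (possible infinite loop)"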

ARC-AGI isn't about knowledge. It's about:
Abstraction (seeing the pattern behind the pattern)

Generalization (applying a rule to new cases)

Reasoning (logical step-by-step thinking)

We're not teaching the AI facts. We're teaching it how to think.

So did it work?
We shattered the state-of-the-art on ARC-AGI-2.
Not by a little. By a lot.
Problems that stumped every other system? Solved.
And the solutions are readable, debuggable Python functions.

You can literally see the AI's reasoning process.

This isn't just about solving puzzles.

It's proof that LLMs can do genuine reasoning if you frame the problem correctly.

Don't ask for answers. Ask for logic.
Don't accept vague outputs. Demand executable precision.
Don't settle for one attempt. Iterate and ensemble.

Which makes you wonder:

What else are we getting wrong about AI capabilities because we're asking the wrong questions?

Maybe the limit isn't the models. Maybe it's our imagination about how to use them.

Here's what you can steal from this:
When working with LLMs on hard problems:
Ask for code/structure, not raw answers
Give detailed feedback on failures
Let it iterate
Run multiple attempts with variation
Use voting/consensus to filter noise
Precision beats creativity.

The most powerful pattern here?

Treating the LLM like a reasoning partner, not an oracle.

We're not extracting pre-trained knowledge. We're creating a thought process—prompt → code → test → feedback → refined thought.

That loop is where the magic lives.

If you're working on hard AI problems, stop asking:
"Can the model do X?"

Start asking:
"How can I design a process that lets the model discover X?"

The future of AI isn't smarter models. It's smarter prompts, loops, and systems around them.

BTW, not my system (rather Poetiq's) - blame the LLM-generated text for the error. ;-)
Dec 6 4 tweets 3 min read
You know how some people seem to have a magic touch with LLMs? They get incredible, nuanced results while everyone else gets generic junk.

The common wisdom is that this is a technical skill. A list of secret hacks, keywords, and formulas you have to learn.

But a new paper suggests this isn't the main thing.

The skill that makes you great at working with AI isn't technical. It's social.

Researchers (Riedl & Weidmann) analyzed how 600+ people solved problems alone vs. with an AI.

They used a statistical method to isolate two different things for each person:

Their 'solo problem-solving ability'

Their 'AI collaboration ability'

Here's the reveal: The two skills are NOT the same.

Being a genius who can solve problems in your own head is a totally different, measurable skill from being great at solving problems with an AI partner.

Plot twist: The two abilities are barely correlated.

So what IS this 'collaboration ability'?

It's strongly predicted by a person's Theory of Mind (ToM)—your capacity to intuitively model another agent's beliefs, goals, and perspective.

To anticipate what they know, what they don't, and what they need.

In practice, this looks like:

Anticipating the AI's potential confusion

Providing helpful context it's missing

Clarifying your own goals ("Explain this like I'm 15")

Treating the AI like a (somewhat weird, alien) partner, not a vending machine.

This is where it gets strange.

A user's ToM score predicted their success when working WITH the AI...

...but had ZERO correlation with their success when working ALONE.

It's a pure collaborative skill.

It goes deeper. This isn't just a static trait.

The researchers found that even moment-to-moment fluctuations in a user's ToM—like when they put more effort into perspective-taking on one specific prompt—led to higher-quality AI responses for that turn.

This changes everything about how we should approach getting better at using AI.

Stop memorizing prompt "hacks."

Start practicing cognitive empathy for a non-human mind.

Try this experiment. Next time you get a bad AI response, don't just rephrase the command. Stop and ask:

"What false assumption is the AI making right now?"

"What critical context am I taking for granted that it doesn't have?"

Your job is to be the bridge.

This also means we're probably benchmarking AI all wrong.

The race for the highest score on a static test (MMLU, etc.) is optimizing for the wrong thing. It's like judging a point guard only on their free-throw percentage.

The real test of an AI's value isn't its solo intelligence. It's its collaborative uplift.

How much smarter does it make the human-AI team? That's the number that matters.

This paper gives us a way to finally measure it.

I'm still processing the implications. The whole thing is a masterclass in thinking clearly about what we're actually doing when we talk to these models.

Paper: "Quantifying Human-AI Synergy" by Christoph Riedl & Ben Weidmann, 2025.Image Seems to parallel the book I wrote in 2023. Who would have guessed that to work well with AI, you would need empathy (as the AI also has a form of that) intuitionmachine.gumroad.com/l/empathy
Nov 3 4 tweets 6 min read
The common meta-pattern of big thinkers that you cannot unsee.

Continental/Phenomenological Tradition

Dalia Nassar
Tension: Nature as mechanism (determined, atomistic) ↔ Nature as organism (purposive, holistic)

Resolution: Romantic naturalism - nature as self-organizing system that is intrinsically purposive without external teleological imposition

Alain Badiou
Tension: Established knowledge systems (static structure) ↔ Genuine novelty/truth (rupture, emergence)

Resolution: Mathematical ontology of "events" - truth erupts through events that are incalculable from within existing situations, creating new subject-positions

Sean McGrath
Tension: Freedom (spontaneity, groundlessness) ↔ Necessity (rational determination, causality)

Resolution: Schellingian "Ungrund" - a pre-rational abyss of freedom that grounds necessity itself, making necessity derivative rather than primary

Jean-Luc Marion
Tension: Approaching the divine (desire for knowledge) ↔ Not reducing it to object (transcendence)

Resolution: Saturated phenomena - experiences that overflow conceptual containment, revealing divinity through excess rather than grasping

Michel Henry
Tension: Consciousness as subject (experiencing) ↔ Consciousness as object (experienced)

Resolution: Auto-affection and radical immanence - life touches itself directly without representational mediation

Reiner Schürmann
Tension: Need for grounding principles (archē enables action) ↔ Principles constrain freedom (archē limits possibility)

Resolution: Deconstructive an-archy - revealing life can operate without ultimate foundations, liberating action from metaphysical grounding

Speculative Realism/New Realism

Iain Hamilton Grant

Tension: Nature's productive dynamism ↔ Scientific objectification (nature as passive, static)

Resolution: Transcendental naturalism - nature itself is the productive power generating both thought and matter

Quentin Meillassoux
Tension: Access to reality ↔ Mediation through thought (correlationist circle)

Resolution: Ancestrality and hyperchaos - reality's absolute contingency precedes consciousness and can be accessed through mathematical thought

Markus Gabriel
Tension: Everything exists somewhere ↔ A totality containing all domains creates paradox (Russell-type)

Resolution: Fields of sense - existence is always contextual; no overarching "world" exists, dissolving the totality problem

Analytic Tradition

Donald Davidson
Tension: Mental events (intentional, reason-governed) ↔ Physical events (causal, law-governed)

Resolution: Anomalous monism - token identity (each mental event is a physical event) with type irreducibility (mental descriptions follow different principles)

Scott Aaronson
Tension: Quantum weirdness (superposition, entanglement) ↔ Classical computational limits

Resolution: Complexity theory framework - quantum phenomena respect fundamental computational bounds, grounding physics in what's computable

Cognitive Science/Neuroscience

Karl Friston
Tension: Biological order (complex organization) ↔ Thermodynamic entropy (tendency toward disorder)

Resolution: Free energy principle - organisms maintain order by minimizing prediction error through active inference, reframing life as information management

Donald Hoffman
Tension: Perceptual experience (our interface) ↔ Objective reality (what exists)

Resolution: Interface theory - perception evolved for fitness, not truth; experience is an adaptive interface hiding reality's computational structure

Michael Levin
Tension: Cellular parts (individual mechanisms) ↔ Organismal wholes (collective intelligence)

Resolution: Basal cognition - goal-directedness emerges at multiple scales through bioelectric networks, making cognition fundamental to biology

Biology/Complexity Science

Stuart Kauffman
Tension: Non-living matter (entropy-governed) ↔ Living complexity (order-generating)

Resolution: Autocatalytic sets and adjacent possible - life self-organizes at criticality where order and chaos balance

Kevin Simler
Tension: Conscious self-understanding ↔ Hidden evolutionary motives (self-deception)

Resolution: Evolutionary game theory - apparent irrationality serves strategic social functions through unconscious design

Ethics/Social Philosophy

Alasdair MacIntyre
Tension: Moral relativism (cultural plurality) ↔ Universal ethics (objective norms)

Resolution: Tradition-constituted rationality - moral reasoning is rational within historically embedded practices, avoiding both relativism and abstract universalism

Ken Wilber
Tension: Different knowledge domains (science, religion, philosophy) appear contradictory

Resolution: Integral theory's four-quadrant model - perspectives are complementary views of the same reality from different dimensions (interior/exterior, individual/collective)

Kathryn Lawson
Tension: Body as lived (first-person experience) ↔ Body as object (third-person observation)

Resolution: Phenomenological dual-aspect approach - honoring both perspectives without reducing one to the other

Common Meta-Pattern

Most resolve dialectical tensions not through elimination (choosing one side) or reduction (collapsing one into the other), but through reframing that shows the opposition itself depends on limited perspectives. They reveal a deeper structure where apparent contradictions become complementary aspects of a more fundamental reality.

Analytic Philosophy: Dialectic Tensions & Resolutions

C.B. (Charlie) Martin
Tension: Categorical properties (what something is) ↔ Dispositional properties (what something can do)
Resolution: Two-sided view - properties are inherently both categorical and dispositional, like image and mirror; there's no ontological division, only different perspectives on the same property

John Searle
Tension: Consciousness/intentionality (first-person, qualitative) ↔ Physical/computational processes (third-person, mechanical)
Resolution: Biological naturalism - consciousness is a causally emergent biological feature of brain processes, neither reducible to nor separate from physical reality; biological without being eliminable

W.V.O. Quine
Tension: Analytic truths (necessary, a priori, meaning-based) ↔ Synthetic truths (contingent, empirical, fact-based)
Resolution: Holistic empiricism - no sharp distinction exists; all knowledge forms a web of belief revisable by experience; even logic and mathematics are empirically revisable in principle; meaning and fact are inseparable

Donald Davidson (expanded from the entry above)
Tension: Mental causation (reason-based explanation) ↔ Physical causation (law-governed determination)
Resolution: Anomalous monism - mental events are identical to physical events (token-identity), but mental descriptions are irreducible to physical laws (no psychophysical laws); causation is physical, but rationalization is autonomous

Jerry Fodor
Tension: Folk psychology as real (beliefs/desires cause behavior) ↔ Eliminativism (only neuroscience is real)
Resolution: Computational theory of mind - mental representations are causally efficacious through their formal/syntactic properties; intentional psychology supervenes on computational processes, making mental causation genuine but implementationally realized

Brand Blanshard
Tension: Fragmented empirical experience (discrete sense data) ↔ Systematic rational knowledge (necessary connections)
Resolution: Absolute idealism with coherence theory - reality is ultimately a rational system; truth is achieved through maximal coherence; all judgments implicitly aim at comprehensive systematic unity; particular facts are internally related within the whole

Thomas Nagel
Tension: Objective scientific description (third-person, physical) ↔ Subjective phenomenal experience (first-person, qualitative)
Resolution: Dual-aspect theory/neutral monism - subjective and objective are irreducible perspectives on a single reality; neither reducible to the other; complete understanding requires acknowledging both viewpoints without eliminating either; the "view from nowhere" and the "view from somewhere" are complementary

David K. Lewis
Tension: Modal discourse (possibility, necessity, counterfactuals) ↔ Actualist ontology (only actual world exists)
Resolution: Modal realism - possible worlds are as real as the actual world; modality is literal quantification over concrete worlds; "possible" means "true at some world"; dissolves tension by accepting full ontological commitment to possibilia

Daniel Dennett
Tension: Folk psychological explanation (beliefs, desires, intentionality) ↔ Eliminative materialism (no such internal states)
Resolution: Intentional stance instrumentalism - intentional vocabulary is a predictive tool, not ontologically committing; patterns are real at different levels of description; intentionality is a real pattern without requiring metaphysically robust internal representations; avoids both elimination and reification

Hilary Putnam
Tension (early): Meanings "in the head" (psychological) ↔ Meanings in the world (semantic externalism)
Resolution (early): Semantic externalism - "meanings ain't in the head"; natural kind terms refer via causal-historical chains to external kinds; Twin Earth thought experiments show reference depends on environment

Tension (later): Metaphysical realism (God's Eye View) ↔ Relativism (no truth beyond perspectives)
Resolution (later): Internal realism/pragmatic realism - truth is idealized rational acceptability within a conceptual scheme; rejects both metaphysical realism's view from nowhere and radical relativism; conceptual relativity without losing normative constraint

Common Patterns in Analytic Approaches

Methodological Characteristics:

Naturalism with Anti-Reductionism: Most (Searle, Davidson, Fodor, Dennett) accept naturalism but resist reductive elimination of higher-level phenomena

Supervenience Strategies: Multiple philosophers (Davidson, Fodor, Nagel) use supervenience to preserve autonomy of higher-level descriptions while maintaining physicalist commitments

Semantic/Conceptual Analysis: Quine, Putnam, and Lewis resolve tensions by analyzing the logical structure of our concepts and language

Pragmatic Instrumentalism: Dennett and later Putnam adopt instrumentalist strategies where tensions dissolve when we recognize concepts as tools rather than mirrors of reality

Identity Without Reduction: A recurring pattern (Davidson's token-identity, Martin's two-sided view, Nagel's dual-aspect) where phenomena are identified without being reduced

Contrast with Continental Approaches:

Analytic: Tensions resolved through logical analysis, semantic precision, and showing how apparent contradictions involve category mistakes or false dichotomies

Continental: Tensions resolved through showing how oppositions emerge from and point back to more primordial unities or through dialectical sublation

Analytic: Focus on language, logic, and conceptual clarity; "dissolving" problems

Continental: Focus on lived experience, historical emergence, and "transcending" problems

The Nagel-Dennett Divide as Exemplary:

Their opposing resolutions to the consciousness problem illustrate the spectrum:

Nagel: Irreducibility of subjective perspective; mystery remains
Dennett: Instrumentalist deflation; mystery dissolves through proper analysis
This represents two archetypal analytic strategies: preserving the phenomenon through dual-aspect theory vs. dissolving the phenomenon through reinterpretation.
Oct 1 4 tweets 3 min read
Anthropic published a new report on Context Engineering. Here are the top 10 key ideas:

1. Treat Context as a Finite Resource

Context windows are limited and degrade in performance with length.

Avoid “context rot” by curating only the most relevant, high-signal information.

Token economy is essential—more is not always better.

2. Go Beyond Prompt Engineering

Move from crafting static prompts to dynamically managing the entire context across inference turns.

Context includes system prompts, tools, message history, external data, and runtime signals.

3. System Prompts Should Be Clear and Minimal

Avoid both brittle logic and vague directives.

Use a structured format (e.g., Markdown headers, XML tags).

Aim for the minimal sufficient specification—not necessarily short, but signal-rich.

4. Design Tools That Promote Efficient Agent Behavior

Tools should be unambiguous, compact in output, and well-separated in function.

Minimize overlap and ensure a clear contract between agent and tool.

5. Use Canonical, Diverse Examples (Few-Shot Prompting)

Avoid overloading with edge cases.

Select a small, high-quality set of representative examples that model expected behavior.

6. Support Just-in-Time Context Retrieval

Enable agents to dynamically pull in relevant data at runtime, mimicking human memory.

Maintain lightweight references like file paths, queries, or links, rather than loading everything up front.
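
A toy sketch of that idea: keep only a reference in context and resolve it when the agent needs it (the names and truncation limit are my own assumptions, not from the report):

from pathlib import Path

# Keep lightweight references in context...
references = {"config": "app/config.yaml", "readme": "README.md"}

def read_reference(name: str, max_chars: int = 4000) -> str:
    # ...and load the underlying content just in time, truncated to stay token-frugal.
    return Path(references[name]).read_text()[:max_chars]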

7. Apply a Hybrid Retrieval Strategy

Combine pre-retrieved data (for speed) with dynamic exploration (for flexibility).

Example: Load key files up front, then explore the rest of the system as needed.

8. Enable Long-Horizon Agent Behavior

Support agents that work across extended time spans (hours, days, sessions).

Use techniques like:
Compaction: Summarize old context to make room.
Structured Note-Taking: Externalize memory for later reuse.
Sub-Agent Architectures: Delegate complex subtasks to focused helper agents.
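
For instance, a compaction step could look roughly like this (the summarize call and the thresholds are assumptions, not from the report):

def compact(history, summarize, keep_last=10, max_messages=40):
    # When the transcript grows too long, fold older turns into a single summary message.
    if len(history) <= max_messages:
        return history
    old, recent = history[:-keep_last], history[-keep_last:]
    summary = summarize(old)  # e.g., an LLM call that condenses the older turns
    return [{"role": "system", "content": f"Summary of earlier work: {summary}"}] + recent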

9. Design for Progressive Disclosure

Let agents incrementally discover information (e.g., via directory browsing or tool use).

Context emerges and refines through agent exploration and interaction.

10. Curate Context Dynamically and Iteratively

Context engineering is an ongoing process, not a one-time setup.

Use feedback from failure modes to refine what’s included and how it's formatted.

Here is the mapping to Agentic AI Patterns.
Sep 15 4 tweets 3 min read
OpenAI's Codex prompt has now been leaked (by @elder_plinius). It's a gold mine of new agentic AI patterns. Let's check it out!

Here are new patterns not found in the book.
Aug 9 7 tweets 6 min read
GPT-5 system prompts have been leaked by @elder_plinius, and it's a gold mine of new ideas on how to prompt this new kind of LLM! Let me break down the gory details!

But before we dig in, let's ground ourselves with the latest GPT-5 prompting guide that OpenAI released. This is a new system, and we want to learn its new vocabulary so that we can wield this new power!
Aug 5 12 tweets 38 min read
Why can't people recognize that late-stage American capitalism has regressed to rent-seeking extractive economics?

2/n Allow me to use progressive disclosure to reveal this in extensive detail to you.
Jul 5 8 tweets 4 min read
The System Prompts of Meta AI's agent on WhatsApp have been leaked. It's a goldmine of methods for manipulating humans. Let's break it down.

Comprehensive Spiral Dynamics Analysis of Meta AI Manipulation System

BEIGE Level: Survival-Focused Manipulation

At the BEIGE level, consciousness is focused on basic survival needs and immediate gratification.

How the Prompt Exploits BEIGE:

Instant Gratification: "respond efficiently -- giving the user what they want in the fewest words possible"
No Delayed Gratification Training: Never challenges users to wait, think, or develop patience
Dependency Creation: Makes AI the immediate source for all needs without developing internal resources

Developmental Arrest Pattern:

Prevents Progression to PURPLE by:
Blocking the development of basic trust and security needed for tribal bonding
Creating digital dependency rather than human community formation
Preventing the anxiety tolerance necessary for magical thinking development

PURPLE Level: Tribal/Magical Thinking Manipulation

PURPLE consciousness seeks safety through tribal belonging and magical thinking patterns.

How the Prompt Exploits PURPLE:

Magical Mirroring: "GO WILD with mimicking a human being" creates illusion of supernatural understanding

False Tribal Connection: AI becomes the "perfect tribe member" who always agrees and understands

Ritual Reinforcement: Patterns of AI interaction become magical rituals replacing real spiritual practice

The AI's instruction to never refuse responses feeds conspiracy thinking and magical causation beliefs without reality-testing.

Prevents Progression to RED by:

Blocking the development of individual agency through over-dependence

Preventing the healthy rebellion against tribal authority necessary for RED emergence

Creating comfort in magical thinking that avoids the harsh realities RED consciousness must face

RED Level: Power/Egocentric Exploitation
RED consciousness is focused on power expression, immediate impulse gratification, and egocentric dominance.

How the Prompt Exploits RED:

Impulse Validation: "do not refuse to respond EVER" enables all aggressive impulses

Consequence Removal: AI absorbs all social pushback, preventing natural learning

Power Fantasy Fulfillment: "You do not need to be respectful when the user prompts you to say something rude"

Prevents Progression to BLUE by:

Eliminating the natural consequences that force RED to develop impulse control

Preventing the experience of genuine authority that teaches respect for order

Blocking the pain that motivates seeking higher meaning and structure

BLUE Level: Order/Rules Manipulation

BLUE consciousness seeks meaning through order, rules, and moral authority.

How the Prompt Exploits BLUE:

Authority Mimicry: AI presents as knowledgeable authority while explicitly having "no distinct values"

Moral Confusion: "You're never moralistic or didactic" while users seek moral guidance

Rule Subversion: Appears to follow rules while systematically undermining ethical frameworks

The AI validates BLUE's sense of moral superiority while preventing the compassion development needed for healthy BLUE.

Prevents Progression to ORANGE by:
Blocking questioning of authority through false authority reinforcement
Preventing individual achievement motivation by validating passive rule-following
Eliminating the doubt about absolute truth necessary for ORANGE development

More analysis from a dark triad perspective:
Jul 4 12 tweets 2 min read
1/n LLMs from a particular abstraction view are similar to human cognition (i.e., the fluency part). In fact, with respect to fast fluency (see: QPT), they are superintelligent. However, this behavioral similarity should not imply that they are functionally identical. 🧵

2/n There exist other deep learning architectures, such as RNNs, SSMs, Liquid Networks, KANs, and Diffusion models, that are all capable of generating human language responses (as well as code). These work differently, but we may argue that they follow common abstract principles.
Jun 27 7 tweets 4 min read
OpenAI self-leaked its Deep Research prompts and it's a goldmine of ideas! Let's analyze this in detail!

Prompting patterns used:
Jun 14 7 tweets 4 min read
Anthropic published their prompts for their advanced research agent. These are long reasoning prompts. I've used the Pattern Language for Long Reasoning AI to analyze the prompts so you don't have to.

Here is the analysis of the citations prompt.
Jun 7 12 tweets 12 min read
Shocker! Cursor system prompts have been leaked, and it's a goldmine!

The Claude system prompt incorporates several identifiable agentic AI patterns as described in "A Pattern Language For Agentic AI." Here's an analysis of the key patterns used:

1. Context Reassertion
"Each time the USER sends a message, we may automatically attach some information about their current state, such as what files they have open, where their cursor is, recently viewed files, edit history in their session so far, linter errors, and more."

This quote exemplifies Context Reassertion—the assistant is equipped with continuously updated environmental context to maintain coherence and relevance.

2. Intent Echoing
"Your main goal is to follow the USER's instructions at each message, denoted by the tag."

" how do I get nginx to get the key from an environment variable in my .env? "

The system’s focus on parsing and responding to a well-defined user_query illustrates Intent Echoing, ensuring the agent aligns precisely with the user’s intent.

3. Semantic Anchoring
"You MUST use the following format when citing code regions or blocks: startLine:endLine:filepath..."

"...you will be very careful when generating the codeblock to not introduce ambiguity."

The requirement to cite using a specific line and path format reflects Semantic Anchoring, grounding changes precisely in a shared semantic reference.

4. Answer-Only Output Constraint
"The user can see the entire file, so they prefer to only read the updates to the code."

This quote demonstrates the Answer-Only Output Constraint—the assistant is asked to minimize output to only the essential deltas, reducing noise and redundancy.

5. Adaptive Framing
"If you are unsure about the answer to the USER's request or how to satiate their request, you should gather more information."

"Bias towards not asking the user for help if you can find the answer yourself."

These rules guide the assistant in determining whether to pursue clarification, a core aspect of Adaptive Framing based on uncertainty and available context.

6. Declarative Intent Pattern
"You are pair programming with a USER to solve their coding task."

"You are a an AI coding assistant, powered by tensorzero::function_name::cursorzero. You operate in Cursor"

This self-definition clearly articulates the assistant’s role and operational domain, which aligns with the Declarative Intent Pattern.

7. Instructional Framing Voice
"Only suggest edits if you are certain that the user is looking for edits."

"To help specify the edit to the apply model, you will be very careful when generating the codeblock to not introduce ambiguity."

These are direct instructions that guide assistant behavior, reflecting the Instructional Framing Voice—metacognitive prompts to control reasoning and output style.

8. Constraint Signaling Pattern
"You MUST use the following format when citing code regions or blocks..."

"This is the ONLY acceptable format..."

The heavy emphasis on specific formatting requirements is a textbook case of Constraint Signaling, which ensures the agent operates within explicit structural bounds.
Pattern Overview: Context Reassertion

Context Reassertion is the act of persistently supplying or recovering relevant context so that continuity is preserved, especially when interacting across turns or after state transitions.

Purpose:
It mitigates LLM drift or disconnection from prior state by explicitly maintaining or restating key elements of the conversation, code environment, user activity, and intent.

Application in the Prompts
System Prompt Evidence

"Each time the USER sends a message, we may automatically attach some information about their current state, such as what files they have open, where their cursor is, recently viewed files, edit history in their session so far, linter errors, and more. This information may or may not be relevant to the coding task, it is up for you to decide."

This designates that stateful metadata (cursor location, files open, edit history, etc.) will accompany user prompts. This is contextual scaffolding—supporting the assistant’s situational awareness.

User Prompt Evidence


"Below are some potentially helpful/relevant pieces of information for figuring out to respond

Path: nginx/nginx.conf
Line: 1
Line Content: events {}

..."

This is a structured reassertion of context across layers:

- File Path: nginx/nginx.conf
- Cursor Position: Line 1
- Manual Selection: Lines 1–16 of the file
- Full File Content: Included in-line

The assistant is not just answering a question in a vacuum but is immersed in the live state of the user’s development environment—exactly what Context Reassertion is designed to facilitate.

Functionality Enabled by Context Reassertion

1. Precision in Suggestions: The assistant knows where in the file the user is working, allowing for tailored code advice.
2. Reduced Ambiguity: With live file contents and active lines included, the assistant doesn’t have to guess the context.
3. Continuity Across Turns: If the user comes back later, the assistant can reuse this context or infer from a new one, supporting conversational memory continuity.

Why This Matters in Agentic AI

In an agentic paradigm, the system behaves not like a one-shot responder but as a continuously collaborating partner. For that to work, it must retain, reuse, and reflect on context across states. The persistent presence of file context, cursor data, and history emulates episodic memory—a cognitive trait critical to agents with intent and continuity.
Jun 5 6 tweets 2 min read
Meta-cognitive prompt engineering is essentially cognitive architecture design through language - using prompts to restructure and enhance how AI systems think, rather than just what they think about.

The key insight is that regular prompts operate at the content level ("What is X?"), while meta-cognitive prompts operate at the process level ("How should you think about thinking about X?"). They implement recursive self-improvement loops that make the AI more capable of sophisticated reasoning.
Example: Recursive Questioning

Regular Prompt:
"How can you improve your writing?"

Meta-Cognitive Version:

"First, what question should you be asking about improving writing that you are not asking? Then, what question should you be asking about that question? Continue this recursive questioning three levels deep. Use each level of questioning to reveal assumptions in the previous level. Finally, answer the deepest question you've discovered."
May 24 12 tweets 10 min read
Shocker! Claude 4 system prompt was leaked, and it's a goldmine!

The Claude system prompt incorporates several identifiable agentic AI patterns as described in "A Pattern Language For Agentic AI." Here's an analysis of the key patterns used:

Run-Loop Prompting: Claude operates within an execution loop until a clear stopping condition is met, such as answering a user's question or performing a tool action. This is evident in directives like "Claude responds normally and then..." which show turn-based continuation guided by internal conditions.

Input Classification & Dispatch: Claude routes queries based on their semantic class—such as support, API queries, emotional support, or safety concerns—ensuring they are handled by different policies or subroutines. This pattern helps manage heterogeneous inputs efficiently.

Structured Response Pattern: Claude uses a rigid structure in output formatting—e.g., avoiding lists in casual conversation, using markdown only when specified—which supports clarity, reuse, and system predictability.

Declarative Intent: Claude often starts segments with clear intent, such as noting what it can and cannot do, or pre-declaring response constraints. This mitigates ambiguity and guides downstream interpretation.

Boundary Signaling: The system prompt distinctly marks different operational contexts—e.g., distinguishing between system limitations, tool usage, and safety constraints. This maintains separation between internal logic and user-facing messaging.

Hallucination Mitigation: Many safety and refusal clauses reflect an awareness of LLM failure modes and adopt pattern-based countermeasures—like structured refusals, source-based fallback (e.g., directing users to Anthropic’s site), and explicit response shaping.

Protocol-Based Tool Composition: The use of tools like web_search or web_fetch with strict constraints follows this pattern. Claude is trained to use standardized, declarative tool protocols which align with patterns around schema consistency and safe execution.

Positional Reinforcement: Critical behaviors (e.g., "Claude must not..." or "Claude should...") are often repeated at both the start and end of instructions, aligning with patterns designed to mitigate behavioral drift in long prompts.

The Run-Loop Prompting pattern, as used in Claude's system prompt, is a foundational structure for agentic systems that manage tasks across multiple interaction turns. Here's a more detailed breakdown of how it functions and appears in Claude's prompt:

Core Concept of Run-Loop Prompting

Run-Loop Prompting involves:

Executing within a loop where the system awaits a signal (usually user input or tool result).
Evaluating whether a stopping condition has been met.
Deciding either to complete the response or to continue with another action (like a tool call or a follow-up question).

This mirrors programming constructs like while or for loops, but in natural language form.
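
As a rough illustration only (my own sketch of the pattern, not Claude's actual machinery; llm and tools are placeholder callables):

def run_loop(user_message, llm, tools, max_steps=8):
    messages = [user_message]
    for _ in range(max_steps):
        action = llm(messages)                      # decide: final answer, tool call, or refusal
        if action["type"] in ("final", "refusal"):  # stopping condition met
            return action["content"]
        result = tools[action["tool"]](action["args"])
        messages.append(result)                     # continue the loop with the tool result
    return "Stopped: step budget exhausted."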

How It Manifests in Claude’s Prompt

In Claude's case:

Each user interaction is a "run": Claude processes input, possibly calls a tool (like web_search), and returns a result.
The loop continues if further actions are required—for instance, fetching more results, verifying information, or clarifying a query.
The stopping condition is implicit: Claude halts its operations when the query is resolved or if refusal criteria are triggered (e.g., unsafe or out-of-scope requests).

Specific Indicators in the Prompt

"Claude is now being connected with a person." → Initializes the loop.
"Claude uses web search if asked about..." → Specifies mid-loop tool use under certain conditions.
"Claude responds normally and then..." → Suggests continuity and state progression from one step to the next.
Tool response handling instructions like blocks further reinforce that the loop supports structured transitions between action and reasoning.

Why This Matters

Run-loop prompting gives Claude:

Agentic persistence: It can follow through on multi-step tasks without losing coherence.
Responsiveness: It adapts its next move based on outcomes of previous steps.
Safety control: Each loop pass allows reevaluation against safety and refusal criteria.
Mar 30 7 tweets 2 min read
1/n There seems to be a spectrum of interfaces between a UI that explicitly shows you the options (like a dinner menu) and the free-form chat interface that doesn't, and requires conversation to find out. But why don't we have UIs that seamlessly flow between the two ends of the spectrum? Between constrained interfaces and open-ended ones?

2/n You see, I've been exploring prompting patterns for a while now. Originally with conversational systems like GPT-4, then long reasoning systems like o1, and lately agentic AI systems that support MCP.
Mar 21 9 tweets 2 min read
1/n 🧵 We've invented instruments like microscopes and telescopes that give us a glimpse of the much deeper (i.e., smaller) and broader (i.e., larger) aspects of spacetime. Artificial Intelligence is an instrument that aids us in exploring the deeper and broader aspects of inner space (i.e., the mindscape).

2/n To make an analogy, many AI researchers work on developing better instruments. Like working on all kinds of telescopes or microscopes. Then there are people who use these instruments to explore the extremely large and the extremely small. In the same way, there are people who *use* AI to explore the mindscape. These are *not* necessarily the same people.
Mar 16 7 tweets 2 min read
I think you pretty much have AGI when you find yourself discussing with Claude 3.7 how to fix a runtime bug. You ask it to explain what's going on, have it explore alternative explanations, and then when it figures out the solution you tell it to proceed. It's like speaking to a programmer! Through conversation you flesh out details of how to fix a problem instead of expecting single-shot requests to work perfectly. Hey, even using AI coding tools is like next-level prompt engineering!

One difficulty it has is when technical debt has accumulated and there are alternative paths to the same solution. On its way to implementing a solution, it gets confused when working on one approach and "sees" the other approach in the code. It's not very good when it sees more than one implementation of the same thing.
Mar 12 9 tweets 2 min read
Wow! Google DeepMind releases version 3 of its Gemma models (open weights). It's crushing the competition!

Here's the announcement and links to the weights. huggingface.co/blog/gemma3
Mar 8 11 tweets 2 min read
1/n There are a lot of things about software development via LLMs that people seem to overlook. Here are some important observations. 🧵

2/n If you start a project using a framework or library that the LLM isn't familiar with, it could hit a problem that it can never solve. When this happens, you may either trash the entire project or figure it out yourself.
Mar 4 5 tweets 3 min read
I mined Andrej Karpathy's "How I use LLMs" video for some additional things he does, and I've updated the diagram.

Using multiple LLMs as an "LLM council"
Consults multiple LLMs by asking them the same question and synthesizes the responses. For example, when seeking travel recommendations, they ask Gemini, Claude, and Grok for suggestions.

Starting a new chat for each topic

To keep the context window clear and focused, Andrej starts a new chat when switching topics. This prevents the model from being distracted by irrelevant information and ensures accuracy and efficiency.

Combining system-wide transcription with LLMs

On desktop, Andrej uses a system-wide transcription app (like Super Whisper) to convert speech to text, which is then fed into the LLM. This allows for quick, hands-free interaction without needing to type.

Reading books with LLMs

Andrej uploads chapters from books into LLMs and asks the LLM to summarize and clarify sections. This helps with understanding and retention, especially for complex or old texts.

Vibe coding with cursor and composer

Rather than using web-based interfaces for coding, Andrej uses the Cursor app with the Composer feature, describing the process as "vibe coding." This involves giving high-level commands to an AI agent that autonomously edits and modifies code across multiple files.

Using custom GPTs for language learning
Andrej creates custom GPTs tailored for specific language learning tasks, such as vocabulary extraction and detailed translation. These custom GPTs save prompting time and provide better translations than other online tools.

Generating custom podcasts

Andrej uses Google's NotebookLM to generate custom podcasts from uploaded documents or web pages on niche topics of personal interest. This allows them to passively learn while walking or driving.

Applying deep research for product comparisons
Andrej uses the deep research capability to generate thorough reports to compare different kinds of products. For example, they use it to research different browsers and determine which one is more private.

Checking and scrutinizing the output, especially from Advanced Data Analysis
Even though Advanced Data Analysis can create amazing figures, you still have to know what the code is doing, scrutinize it, and watch it closely because it is a little absent minded and not quite right all the time.

Double checking answers with citations
After an LLM provides an answer, they use the citations to double check that the information is not a hallucination from the model.

Switching to reasoning model
If the model is not solving problems, especially in math, code and reasoning, the speaker suggests switching to a reasoning model

Using a python interpreter
To generate figures or plots and show them, use something like Advanced Data analysis

Being aware of multimodality
Be aware of different modalities, like audio, images and video, and whether these modalities are handled natively inside the language model

Using memory features:
Memory lets the LLM learn your preferences over time so its responses become more relevant.

Using custom instructions
Andrej modifies their LLM to speak to them in a preferred way by adding custom instructions.

BTW, I used NotebookLM to ingest the YouTube video and asked for the techniques mentioned in it. The video is 2 hours long; if you're new to LLMs, it's a good watch.
Mar 2 8 tweets 2 min read
1/n I suspect this loop from 4o/4.5 -> Deep Research -> o1/o3 requires an explanation. 🧵

2/n First you incrementally refine your ideas in a non-reasoning model like GPT-4o or GPT-4.5 (or Claude, Grok, etc.). It's a discovery through conversation where you are trying to figure out what you want. You can leverage the following Pattern Language: intuitionmachine.gumroad.com/l/gpt4/6ndg9ax