160+ pages covering the most important research around context engineering for LLMs.
This is a must-read!
Here are my notes:
The paper provides a taxonomy of context engineering in LLMs categorized into foundational components, system implementations, evaluation methodologies, and future directions.
The context engineering evolution timeline from 2020 to 2025 runs from foundational RAG systems to complex multi-agent architectures.
The work distinguishes prompt engineering from context engineering on dimensions like state, scalability, error analysis, complexity, etc.
Context engineering components include context retrieval and generation, context processing, and context management. The paper also covers how they are integrated into system implementations such as RAG, memory architectures, tool-integrated reasoning, and multi-agent coordination mechanisms.
One important aspect of context processing is contextual self-refinement, which aims to improve outputs through cyclical feedback mechanisms.
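To make the cyclical feedback idea concrete, here is a minimal sketch of a generate, critique, refine loop. It is not the paper's method, just one common shape for self-refinement. The names `self_refine`, `generate`, `critique`, `refine`, and the `"OK"` stop signal are all assumptions for illustration, and the toy functions stand in for real model calls.

```python
from typing import Callable


def self_refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 3,
    stop_signal: str = "OK",  # hypothetical convention: critic says "OK" when satisfied
) -> str:
    """Produce a draft, then alternate critique and revision until the critic is satisfied."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback.strip() == stop_signal:
            break  # critic has no further complaints
        draft = refine(task, draft, feedback)
    return draft


# Toy stand-ins for LLM calls so the loop can run without any API.
def toy_generate(task: str) -> str:
    return "context engineering"


def toy_critique(task: str, draft: str) -> str:
    if not draft[0].isupper():
        return "capitalize"
    if not draft.endswith("."):
        return "add a period"
    return "OK"


def toy_refine(task: str, draft: str, feedback: str) -> str:
    if feedback == "capitalize":
        return draft[0].upper() + draft[1:]
    return draft + "."


result = self_refine("Name the topic.", toy_generate, toy_critique, toy_refine)
print(result)  # prints "Context engineering."
```

In a real system the three callables would each be a prompt to a model, and `max_rounds` keeps the loop from running forever when the critic never signs off.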
An important aspect of context management is how to deal efficiently with long context and reasoning chains. The paper provides an overview of key methods for long-chain reasoning and their characteristics.
Memory is key to building complex agentic systems that can adapt, learn, and perform coherent long-term tasks.
There is also a nice overview of different memory implementation patterns.
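As a rough illustration of one such pattern, here is a sketch of a short-term buffer that spills into a long-term store, with a naive keyword-overlap recall. This is an assumed toy design, not one taken from the paper: the class name `AgentMemory`, its methods, and the scoring rule are all hypothetical, and a real system would likely use embeddings instead of word overlap.

```python
from collections import deque


class AgentMemory:
    """Toy two-tier memory: a bounded recent buffer plus an append-only archive."""

    def __init__(self, short_term_size: int = 3) -> None:
        self.short_term: deque = deque(maxlen=short_term_size)
        self.long_term: list = []

    def remember(self, text: str) -> None:
        # When the buffer is full, the oldest item is about to be evicted,
        # so archive it in long-term memory first.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(text)

    def recall(self, query: str, k: int = 2) -> list:
        # Score archived items by the number of words shared with the query.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(item.lower().split())), item)
            for item in self.long_term
        ]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in relevant[:k]]

    def build_context(self, query: str) -> list:
        # Relevant archived items first, then the recent conversation.
        return self.recall(query) + list(self.short_term)


memory = AgentMemory(short_term_size=2)
for note in [
    "user prefers dark mode",
    "project uses Postgres",
    "deadline is Friday",
    "team standup at nine",
]:
    memory.remember(note)

context = memory.build_context("which database does the project use")
```

Here `context` ends up holding the archived Postgres note followed by the two most recent notes, which is the kind of list you would paste into the prompt.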
Tool-calling capabilities are an area of continuous development. The paper provides an overview of tool-augmented language model architectures and how they compare across tool categories.
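For readers new to tool calling, here is a minimal sketch of the dispatch side: the model emits a JSON tool call, and the application looks up and runs the matching function. The JSON shape (`name`, `arguments`), the `tool` decorator, and the two example tools are assumptions for illustration, not any specific vendor's API.

```python
import json

TOOLS = {}  # hypothetical registry: tool name -> Python callable


def tool(fn):
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and return a JSON result to feed back to the model."""
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["name"]]
        result = fn(**call.get("arguments", {}))
        return json.dumps({"ok": True, "result": result})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed JSON, unknown tool, or wrong arguments: report instead of crashing.
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})


reply = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(reply)  # prints {"ok": true, "result": 5}
```

Returning errors as data matters here, since the model can often read the error and retry with a corrected call.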
Context engineering is going to evolve rapidly.
But this is a great overview to better map and keep track of this rapidly evolving landscape.
There is a lot more in the paper. Over 1,000 references are included.
This survey tries to capture the most common methods and biggest trends, but there is more on the horizon as models continue to improve in capability and new agent architectures emerge.
The spec-init slash command prompt, if you want to try it:
"Your task is to first help me build a spec for my new project in ARGUMENT.
Use the AskUserQuestion Tool to help build the spec in ARGUMENT by interviewing me and gathering requirements and details about the project implementation, UI & UX, tech stack, concerns, tradeoffs, etc.
Make sure questions are not obvious and probe deeper into the underlying needs and constraints.
Interview me continually and systematically until the spec is complete. Document all responses and insights to create a comprehensive and well-structured specification that serves as the foundation for the project."
Just built a new skill in Claude Code using Opus 4.5.
The skill uses Gemini 3 Pro (via API) for designing web pages.
Look at what it generated from one simple prompt.
If you have been designing websites with Claude Code, you already know how generic they turn out.
So I built a skill that uses Gemini 3 Pro to lead creative direction and generate designs. It is extremely good at this.
Opus 4.5 then integrates all that into our app.
The prompt I used: "I want to design the landing page for a new AI game. We want it to be futuristic and all that, and use animations as much as possible."
I will test with some other prompts and see how far I can push this. But the results are very exciting already.
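The two-model split described above (one model leads creative direction, another integrates the result into the app) can be sketched roughly like this. It is only an outline under stated assumptions: `designer` and `integrator` are hypothetical callables standing in for the Gemini 3 Pro and Opus 4.5 calls, no real API is used, and the actual skill may be structured quite differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DesignBrief:
    prompt: str       # the user's original request
    design_spec: str  # what the design model produced


def design_then_integrate(
    prompt: str,
    designer: Callable[[str], str],            # stand-in for the design model
    integrator: Callable[[DesignBrief], str],  # stand-in for the coding model
) -> str:
    """Stage 1: get a design spec. Stage 2: hand the spec to the integrator."""
    spec = designer(f"Act as creative director. Produce a design spec for: {prompt}")
    brief = DesignBrief(prompt=prompt, design_spec=spec)
    return integrator(brief)


# Fake models so the pipeline runs offline.
def fake_designer(request: str) -> str:
    return "neon palette, animated hero section"


def fake_integrator(brief: DesignBrief) -> str:
    return f"<!-- design: {brief.design_spec} -->\n<main>landing page</main>"


page = design_then_integrate("landing page for a new AI game", fake_designer, fake_integrator)
```

Keeping the two stages behind plain callables makes it easy to swap either model without touching the pipeline.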
This is one of the most insane things Nano Banana Pro 🍌 can do.
It can reproduce figures with mind-blowing precision.
No competition in this regard!
Prompt: "Please reproduce this chart in high quality and fidelity and offer annotated labels to better understand it."
When I tried this for the first time, I didn't expect that this was possible.
The level of understanding this requires is what's remarkable about it all.
The levels of personalization this unlocks are also impressive.
"Can you convert it into a cartoonish version?"
Just look at this 🤯
"Can you create a delightful cartoonish version of this table. And please put cute colors and icons along with interesting annotations to make it more readable."