Muratcan Koylan Profile picture
Context Engineer, Research @ https://t.co/ytKLwdts2F - ex AI Agent Systems Manager 99Ravens - HCI & Marketing background - Builder, creator, optimist and open source AI dev
Dec 27, 2025 4 tweets 3 min read
Two years ago, skeptics said AI images were easy to spot because the models couldn't generate hands. Now, it's impossible to tell.

The same is happening in AI Writing.

Fine-tuning on specific authors' datasets led experts to prefer AI writing over human writing.

This paper has three interesting insights:

1. Fine-tuned GPT-4o was ~8x more likely to be chosen as "authentic" than an expert writer.

2. Pangram (probably the best AI detector) flagged only 3% of SFT outputs, versus 97% of in-context prompting outputs.

3. How simple it is to create a fine-tuning dataset by reverse-engineering books.

They purchased legal ePub files of the complete bibliographies of 30 living authors and split the full books into chunks of 250–650 words: first on double newlines (paragraph breaks); if a chunk was still too long, they used GPT-4o to split it grammatically without deleting content.

They used the same model to generate the Instruction dataset: "Describe in detail what is happening in this excerpt. Mention the characters and whether the voice is in first or third person for majority of the excerpt. Maintain the order of sentences while describing."

And formatted the data into the final pairs:
Input
"Write a [Word Count] word excerpt about the content below emulating the style and voice of [Author Name][Content Description generated by GPT-4o in Step 2]"

Output
The original raw text excerpt from the book.
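Here's a minimal Python sketch of that pipeline as I read it from the paper. `describe_excerpt` is a hypothetical wrapper around the GPT-4o instruction call above, and the chunking is simplified (the paper used GPT-4o itself to split over-long chunks):

```python
# Sketch of the dataset pipeline: chunk the book, describe each chunk,
# format (instruction, output) pairs for supervised fine-tuning.
# describe_excerpt is a hypothetical wrapper around the GPT-4o call
# that uses the "Describe in detail what is happening..." instruction.

def chunk_book(text: str, min_words: int = 250, max_words: int = 650) -> list[str]:
    """Greedily merge paragraphs (double-newline splits) into 250-650 word chunks."""
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        current.append(para)
        count += len(para.split())
        if count >= min_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
    if current:
        chunks.append("\n\n".join(current))
    # The paper used GPT-4o to grammatically split chunks over max_words;
    # here we simply drop them to keep the sketch short.
    return [c for c in chunks if len(c.split()) <= max_words]

def build_pair(chunk: str, author: str, describe_excerpt) -> dict:
    """One training pair: style-emulation instruction in, original prose out."""
    description = describe_excerpt(chunk)  # GPT-4o content description (step 2)
    prompt = (f"Write a {len(chunk.split())} word excerpt about the content below "
              f"emulating the style and voice of {author}\n{description}")
    return {"input": prompt, "output": chunk}
```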

---

Base LLMs are RLHF-tuned to be safe and predictable, so they generate clichés. Fine-tuning on high-quality literature "unlearned" this behaviour.

Skillifying everything.

If this works, I'll share the results tonight.
Dec 24, 2025 5 tweets 4 min read
This is a tacit knowledge problem, reframed as an infrastructure opportunity.

All of the current AI implementations miss the reasoning behind decisions, the "why" that lives in experts' heads.

You cannot "invent" a digital expert persona using just prompt engineering. You have to extract the expert. The reasoning is the same.

When a senior strategist decides to approve a campaign, they use:
- Pattern recognition from similar cases
- Organizational context about relationships and history
- Judgment heuristics they couldn't articulate if you asked them directly

None of this is in the CRM. It's tribal knowledge passed through onboarding and side conversations: "exception logic that lives in people's heads."

The goal is to extract tacit knowledge before it's needed, turning implicit reasoning into explicit, reusable structure.

But Gupta's framing adds a second dimension I hadn't fully articulated: you also need to capture decision traces as they happen.
- Extraction gives you the knowledge.
- Traces give you the application of that knowledge across specific cases.

This connects directly to why I think Agent Skills matter more than people realize.

Agent skills are essentially structured knowledge that tells an agent how to operate: what context to gather, what frameworks to apply, when to escalate, how to reason about edge cases.

If skills capture how to decide, decision traces capture what was decided and why. Over time, those traces become a searchable precedent. The agent doesn't just follow rules, it can query what happened in similar cases, and start thinking like the embodied persona.
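As a rough sketch of what a searchable trace store could look like (the `embed` function here is a hypothetical text-to-vector stand-in, not any specific API):

```python
from dataclasses import dataclass

@dataclass
class DecisionTrace:
    """One recorded decision: the case, what was decided, and the why."""
    case: str
    decision: str
    rationale: str

class TraceStore:
    """Searchable precedent: retrieve past decisions made on similar cases."""
    def __init__(self, embed):
        self.embed = embed  # hypothetical text -> vector function
        self._store: list[tuple[list[float], DecisionTrace]] = []

    def record(self, trace: DecisionTrace) -> None:
        self._store.append((self.embed(trace.case), trace))

    def similar(self, case: str, k: int = 3) -> list[DecisionTrace]:
        """Rank stored traces by cosine similarity to the new case."""
        q = self.embed(case)
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
            return dot / (norm or 1.0)
        ranked = sorted(self._store, key=lambda p: cos(q, p[0]), reverse=True)
        return [t for _, t in ranked[:k]]
```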

This is the difference between an agent that executes and an agent that learns.

The broader implication is for context engineering itself, but the purpose is different. Context graphs are about organizational memory that compounds over time.

- Tacit knowledge extraction captures what experts know before it's needed
- Agent skills encode how to apply that knowledge, structured reasoning frameworks
- Decision traces persist what actually happened, searchable precedent
- Context engineering manages all of this, both for the current window and for organizational memory

x.com/koylanai/statu…
Dec 21, 2025 4 tweets 3 min read
I’m excited to share a new repo: Agent Skills for Context Engineering

Instead of just offering a library of black-box tools, it acts as a "Meta-Agent" knowledge base. It provides a standard set of skills, written in markdown and code, that you can feed to an agent so it understands how to manage its own cognitive resources.

github.com/muratcankoylan…

Most agent failures are not model failures; they are context failures. This is still an experimental project. The goal is to establish a platform-agnostic standard for context engineering that can be used in Cursor, Claude Code, Copilot or Codex.

skills/
context-fundamentals: What context is, why it matters
context-degradation: How context fails (lost-in-middle, poisoning)
context-optimization: Compaction, masking, caching
multi-agent-patterns: Orchestrator, swarm, hierarchical
memory-systems: Vector RAG, knowledge graphs, Zep
tool-design: Building tools agents can use
evaluation: Testing and measuring agent systems
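A minimal sketch of how you might feed these skills to an agent; the `skills/<name>/SKILL.md` layout follows Anthropic's skills template convention, but adjust the paths to however the repo actually organizes them:

```python
from pathlib import Path

def load_skill(name: str, skills_dir: str = "skills") -> str:
    """Read one skill's markdown so it can be injected into the agent's context."""
    # Assumes a skills/<name>/SKILL.md layout (Anthropic's skills template
    # convention); adjust the path to match the repo's actual structure.
    return Path(skills_dir, name, "SKILL.md").read_text()

def build_system_prompt(task: str, skill_names: list[str]) -> str:
    """Prepend the relevant skills so the agent knows how to manage its
    own context before it starts working on the task."""
    skills = "\n\n---\n\n".join(load_skill(n) for n in skill_names)
    return f"{skills}\n\n# Task\n{task}"

prompt = build_system_prompt(
    "Summarize this 200-page report without losing key details.",
    ["context-fundamentals", "context-degradation", "context-optimization"],
)
```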

I believe this is a good start, showing developers how to approach context engineering rather than relying on ready-made tools.

You will also find the aggregated research documents I used to build these skills in the repo. The skills are synthesized from technical blogs on context and prompt engineering that I bookmarked, AI Labs' documentations, and Anthropic Skills examples.

Try the 7 skills, created using Anthropic's Skills template format. Experiment with the provided scripts and references, and feel free to contribute to the repo.

Most of the reference documents I used come from these or similar context engineering resources.
Dec 5, 2025 4 tweets 3 min read
Your best people can't document their expertise because they don't know what they know until they're asked.

We built an interviewer that achieves peer status, so experts reveal the judgment patterns they'd only share with a colleague.

I wrote a blog about how we architected the multi-agent system behind this, how we extract expert thinking, and build digital personas that feel like talking to a peer.

99ravens.agency/resources/blog…
Congrats on the launch, @saffronhuang @AmandaAskell @alexalbert__ @mikeyk 👋

I've been working on the interviewer agent system and the prompt & context details for some time. We're happy to test your version and share our learnings.
Nov 6, 2025 4 tweets 2 min read
Kimi K2 Thinking has a genuine literary intelligence.

Creative writing, taste, structural ambition, metaphorical control, restraint under extreme constraints...

This model actually accomplishes nearly impossible writing tasks.

This is incredible, flawless.
Sep 30, 2024 5 tweets 5 min read
Comparison of 3 advanced prompting techniques:
(Logic & Graph & Tree)-of-Thought

Tree-of-Thought (ToT)
ToT enhances the basic Chain-of-Thought (CoT) prompting by structuring the reasoning process as a tree.

Each node in this tree symbolizes a "thought" representing a partial solution to the problem.

ToT enables LLMs to explore multiple reasoning paths, systematically evaluate their progress, and backtrack from unpromising branches.

ToT's strengths lie in its capacity to move beyond linear reasoning. It offers a more flexible and exploratory approach to problem-solving.

For example, you want an LLM to solve the following problem:
"If John is taller than Mary, and Mary is taller than Sarah, who is the shortest?"

Using ToT, the LLM might generate a tree-like structure:
Thought 1 (Root): Who is the shortest?
Thought 2 (Branch from Thought 1): John is taller than Mary.
Thought 3 (Branch from Thought 1): Mary is taller than Sarah.
Thought 4 (Branch from Thought 2 & 3): If John is taller than Mary, and Mary is taller than Sarah, then Sarah is the shortest.

This tree structure allows the LLM to break down the problem into smaller, more manageable steps and explore different reasoning paths to arrive at the solution.
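For the curious, a minimal sketch of what a ToT-style search loop can look like in code. `propose` and `score` are hypothetical LLM-call wrappers, and real implementations layer BFS/DFS strategies on top:

```python
# Minimal Tree-of-Thought beam search. propose() and score() are hypothetical
# LLM-call wrappers: propose() suggests candidate next thoughts for a partial
# reasoning path, score() rates how promising a partial path looks.

def tree_of_thought(problem: str, propose, score, depth: int = 3, beam: int = 2):
    """Grow the tree level by level, keeping only the `beam` best paths;
    dropping the rest is the 'backtracking from unpromising branches'."""
    frontier = [[]]  # each path is a list of thoughts (root = empty path)
    for _ in range(depth):
        candidates = [path + [t] for path in frontier
                      for t in propose(problem, path)]
        candidates.sort(key=lambda p: score(problem, p), reverse=True)
        frontier = candidates[:beam] or frontier  # keep old paths if no proposals
    return frontier[0]  # best reasoning path found
```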

Graph-of-Thought (GoT)
GoT employs a graph structure, a more flexible and expressive representation compared to a tree.

In a GoT, vertices represent individual "LLM thoughts" (units of information generated by the LLM) and edges signify dependencies between these thoughts.

This structure enables complex thought manipulations, such as "aggregation," where the system can combine multiple promising thoughts into a new, potentially superior thought.

GoT's key strength is its enhanced flexibility in representing relationships between thoughts.

The research paper demonstrates that this flexibility, combined with thought transformations like aggregation, leads to significant performance gains over ToT in specific tasks, such as sorting.

Imagine you're using an LLM for a creative writing task, and the prompt is "Write a short story about a time traveler."

Using GoT, the LLM might generate different "thought" nodes like "time traveler," "ancient civilization," "paradox," "love story," etc.

The LLM can then establish edges between these nodes based on their relationships (e.g., the time traveler visits the ancient civilization, leading to a paradox and a love story).

This graph structure allows for a more flexible and interconnected representation of thoughts, enabling the LLM to explore different creative pathways and potentially generate a more nuanced and engaging story.
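A toy sketch of the GoT data structure with the aggregation transform (the `merge` function is a hypothetical LLM call that fuses thoughts):

```python
# Toy Graph-of-Thoughts structure. Vertices are LLM thoughts, edges are
# dependencies between them; aggregate() is the "combine several promising
# thoughts into a new one" transformation, with merge() as a stub LLM call.

class ThoughtGraph:
    def __init__(self):
        self.thoughts: dict[int, str] = {}      # vertex id -> thought text
        self.edges: list[tuple[int, int]] = []  # (parent, child) dependency
        self._next_id = 0

    def add(self, text: str, parents: tuple[int, ...] = ()) -> int:
        vid = self._next_id
        self._next_id += 1
        self.thoughts[vid] = text
        self.edges.extend((p, vid) for p in parents)
        return vid

    def aggregate(self, ids: list[int], merge) -> int:
        """Fuse several thoughts into one new vertex depending on all of them."""
        return self.add(merge([self.thoughts[i] for i in ids]), parents=tuple(ids))

g = ThoughtGraph()
a = g.add("time traveler")
b = g.add("ancient civilization")
c = g.aggregate([a, b], merge=lambda ts: " visits ".join(ts))  # stub merge
```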

Logic-of-Thought (LoT)
The most recent prompting technique, LoT, tackles a significant challenge faced by neuro-symbolic reasoning methods: information loss during the conversion of natural language into logical expressions.

LoT aims to mitigate this loss by augmenting the original prompts with expanded logical information derived from the input context.

This is achieved through a three-phase process:

Logic Extraction: The LLM extracts propositions and logical relationships from the input context, forming logical expressions.

Logic Extension: The extracted logical expressions are expanded using predefined logical rules.

Logic Translation: The expanded logical expressions are converted back into natural language descriptions, enriching the original prompt with this additional logical information.

LoT's primary strength lies in its ability to bridge the gap between symbolic logic and natural language, reducing information loss and enhancing the LLM's reasoning accuracy.

It is designed to be compatible with existing prompting techniques, allowing for seamless integration with methods like CoT, ToT, and potentially GoT.

The effectiveness of LoT heavily depends on the accuracy of the logic extraction phase. LLMs, while powerful, are still prone to errors in understanding and representing complex logical relationships, which can lead to inaccurate logical expressions and, consequently, flawed reasoning.

For example, given the premises:
1. Increased immigration diversifies the workforce.
2. When a workforce is more diverse, it becomes more innovative.
3. If a country is more innovative, its economy grows faster.
4. Country X allowed 1,000 skilled workers to immigrate last year.

Logic Extension chains premises 1–3 into a new expression:
5. If a country increases immigration, that country's economy grows faster.

The question is whether this inference is correct:
Country X's economy is growing faster than before.

This expanded logical information is integrated back into the original prompt, making it easier for the LLM to infer the correct answer: "Correct."
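A toy sketch of how phases 2 and 3 could work over implications like those above; phase 1 (Logic Extraction) would be an LLM call that produces the implication pairs, assumed done here:

```python
# Toy Logic-of-Thought sketch. Phase 1 (Logic Extraction) is assumed: an LLM
# call has turned the context into pairs ("A", "B") meaning A -> B.
# Phase 2 expands them with a rule; phase 3 translates back to language.

def logic_extension(implications: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Phase 2: apply the transitivity rule (A->B and B->C gives A->C)
    until no new implications appear (a transitive closure)."""
    closure = set(implications)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for b2, c in list(closure):
                if b == b2 and (a, c) not in closure:
                    closure.add((a, c))
                    changed = True
    return closure

def logic_translation(implications: set[tuple[str, str]]) -> str:
    """Phase 3: convert the expanded expressions back into natural language
    so they can be appended to the original prompt."""
    return "\n".join(f"If {a}, then {b}." for a, b in sorted(implications))

extracted = {("a country increases immigration", "its workforce is more diverse"),
             ("its workforce is more diverse", "it is more innovative"),
             ("it is more innovative", "its economy grows faster")}
print(logic_translation(logic_extension(extracted)))
# Among the output: "If a country increases immigration, then its economy
# grows faster." -- the derived premise 5 from the example above.
```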

While all three techniques aim to enhance LLM reasoning, their core focuses differ.

ToT primarily targets exploring multiple reasoning pathways.

GoT emphasizes the flexible representation and manipulation of thoughts using a graph structure.

LoT addresses the information loss problem in logical reasoning tasks by enriching the prompt with expanded logical information.

ToT and GoT primarily focus on the structural organization of thoughts.

In contrast, LoT emphasizes augmenting the semantic content of the prompt with explicit logical information.

Despite their differences, these techniques are not mutually exclusive.

Integrating LoT's logic enrichment with the structural frameworks of ToT or GoT could lead to even more powerful and accurate reasoning capabilities (an ~8% increase according to the paper "Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models").
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
arxiv.org/abs/2308.09687
Jul 7, 2024 6 tweets 3 min read
I replaced a $2000/month predictive analysis software with Julius AI. In this video, I'll show you how to create a complex B2B marketing dataset using GPT, then identify companies with a high likelihood of churning and determine the reasons why. Julius AI achieves this with a chain of deterministic actions that leverage various machine learning models.

This tool is incredibly powerful, and I'm excited to share my process with you. Whether you do data analysis or not, I highly recommend Rahul's conversation with @trychroma here to understand how @JuliusAI_ works.

@0interestrates

Mar 5, 2024 13 tweets 11 min read
Anthropic has a very detailed prompting cookbook.

If you're using the world's most powerful LLM, Anthropic's Claude-3, with the same prompts as OpenAI's GPT models, you're not achieving its full potential. Their in-depth prompting cookbook will help you unlock incredible results.

Here's what you need to know:

Yesterday, we woke up to a new era.

Anthropic released the world’s most powerful large language model (LLM), Claude 3.

This release introduced a family of three models: Claude-3 Haiku, Claude-3 Sonnet, and Claude-3 Opus.

> Claude 3 Opus surpasses GPT-4 in common benchmarks like MMLU and HumanEval.
> It boasts strong capabilities in analysis, forecasting, content creation, code generation, and languages such as Spanish, Japanese, and French.
> The models support a 200K context window, expandable to 1M tokens for select customers.
> They have robust vision capabilities for processing images, charts, graphs, and diagrams.
> Anthropic claims that Claude 3 has a more nuanced understanding of requests and exhibits fewer refusals compared to previous models.
> Claude 3 Opus achieved around 60% accuracy on GPQA, a challenging dataset created by domain experts.
> The training of Claude 3 included synthetic data generated by AI language models, broadening its training data scope.
> Claude 3 has improved visual capabilities, offering new use cases in industries like legal, finance, healthcare, and customer service.
> The model demonstrates self-awareness and the ability to recognize when it is being evaluated or tested.

If, like me, you follow almost every new LLM release and have grown slightly disappointed with each one that claims to surpass GPT-4 but doesn't live up to the hype in testing, you might wonder if we're nearing the peak of AI hype.

However, Dario Amodei's recent work proves otherwise.

Claude 3's potential is clear.

Now, let's unlock that power with the perfect prompts by revealing must-know techniques and a 'cookbook' of ready-to-use examples.
Sep 26, 2023 25 tweets 7 min read
ChatGPT image recognition is here and it is magical!

This image is still a mystery...
Sep 25, 2023 16 tweets 6 min read
Automate ChatGPT to create highly detailed Midjourney / DALL-E / StabilityAI prompts.

Without any Plugin!

Try MidjourneyQuest, a 10-step creative journey 🧵

I've created over 1,000 images with Midjourney, not only for art purposes but also for professional use, including blog & ebook images, social media posts and much more.

What I've noticed is that if you want to bring what's in your mind to life, you should be knowledgeable in:
Apr 29, 2023 7 tweets 3 min read
🔻My First GPT-4 Project 🔻

I can chat with thousands of pages fast and affordably.

How? Let's try it with the latest @wef report (April 2023)

Built with @OpenAI, @pinecone and @LangChainAI

1. Upload the PDF and divide its text into chunks.
2. Store the embeddings in your…

A big shoutout to @mayowaoshin for providing excellent support to the open-source AI community!
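The thread text is truncated above, but here's a rough sketch of the pipeline it describes (chunk, embed, store, retrieve), with `embed` as a hypothetical stand-in for the OpenAI embeddings API and plain cosine similarity in place of Pinecone:

```python
# Minimal chat-with-PDF retrieval sketch. embed() is a hypothetical
# text -> vector function (e.g. an embeddings API); a real build would
# store vectors in Pinecone and chain the LLM call via LangChain.

def chunk(text: str, size: int = 1000) -> list[str]:
    """Split the extracted PDF text into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(text: str, embed) -> list[tuple[list[float], str]]:
    """Embed every chunk and keep (vector, chunk) pairs as the store."""
    return [(embed(c), c) for c in chunk(text)]

def retrieve(question: str, index, embed, k: int = 4) -> list[str]:
    """Return the k chunks most similar to the question (cosine similarity),
    which are then passed to the LLM as context for the answer."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
        return dot / (norm or 1.0)
    q = embed(question)
    return [c for _, c in sorted(index, key=lambda p: cos(q, p[0]), reverse=True)[:k]]
```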
Apr 14, 2023 4 tweets 1 min read
🤯AI can "see" your thoughts! What's next?! 🧠👀

Osaka University researchers decode brain activity, sparking privacy debates!

How will this power be used? 🚨👇

Researchers have used a deep learning AI model #StableDiffusion to decode brain activity, generating images of what test subjects were seeing while inside an MRI.

This breakthrough does not represent mind-reading, as it can only produce images a person has viewed.
For now 🙃