ContextChef (1): Why "Compile" Your Context
5 Mar 2026
6 min read
Your agent runs flawlessly for 50 turns. On turn 51 it forgets a constraint the user stated 30 messages ago and starts producing clearly wrong results. You raise the temperature, switch models, rewrite the prompt. The problem comes and goes.
It’s not the model. It’s the context.
The Industry Arrived at the Same Place
In 2025, several serious agent builders published technical write-ups on lessons learned, and the topic converged on context management with striking consistency.
Manus wrote in Context Engineering for AI Agents: Lessons from Building Manus that they rewrote their agent framework four times before finding their current local optimum. Their core finding: KV-cache hit rate is the single most important metric for a production agent — more important than model capability — because a cache miss means 10× the cost and double the latency.

Anthropic formally named this discipline context engineering in Effective Context Engineering for AI Agents, distinguishing it from prompt engineering, and introduced the concept of context rot: as token count grows, the model’s ability to accurately recall information degrades systematically — well before the context window is full.

Letta (the successor to MemGPT) uses an OS analogy to decompose the context window into kernel context and user context — the former being managed structures like system prompts, memory blocks, and tool schemas; the latter being the flowing message buffer.
Different starting points, same underlying judgment: context isn’t something you stuff in — it needs to be designed, compressed, and structured. Before every LLM call, a layer of orchestration work has to happen. Right now that layer lives in each project’s own glue code.
ContextChef is an attempt to distill those engineering practices into a reusable TypeScript compilation pipeline.
Four Problems That Keep Coming Back
The glue code varies by project, but the problems it solves are remarkably consistent:
Conversations grow too long and the model forgets. A 128k context window sounds large, but a tool-heavy agent can fill it in 20 minutes. Worse is context rot — attention dilutes in very long contexts, early constraints get “buried,” and the model starts drifting.
Too many tools and the model hallucinates. Fifty tool schemas inject roughly 5,000 tokens. More importantly, semantically similar tools compete for the model’s attention, leading to wrong calls or fabricated parameter structures. As Anthropic’s team put it: “If a human engineer can’t definitively say which tool should be used in a given situation, an AI agent can’t be expected to do better.”
Switching providers means rewriting prompts. Anthropic supports prefill and cache breakpoints; OpenAI doesn’t. Gemini’s tool call format is entirely different. A carefully tuned prompt architecture has to be rebuilt from scratch when you switch.
Long-running tasks drift off course. The system prompt is static; task state is dynamic. By step 8, the model may have forgotten the constraints established in step 1. Manus’s solution is to rewrite a todo.md at every step — using recitation to pull the goal back into the model’s recent attention span.
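The recitation idea is easy to approximate by hand. Here is a minimal sketch — the `renderTodo` and `withRecitation` helpers and the message shapes are illustrative, not part of Manus or ContextChef:

```typescript
type TodoItem = { text: string; done: boolean };

// Re-render the full plan as markdown every turn, so the goal and the
// remaining steps land near the end of the context, where attention is strongest.
function renderTodo(goal: string, items: TodoItem[]): string {
  const lines = items.map((i) => `- [${i.done ? "x" : " "}] ${i.text}`);
  return `## Goal\n${goal}\n\n## Plan\n${lines.join("\n")}`;
}

// Append the recited plan to the newest user message before each LLM call,
// keeping the role structure of the history intact.
function withRecitation(
  messages: { role: string; content: string }[],
  todo: string
): { role: string; content: string }[] {
  const last = messages[messages.length - 1];
  return [
    ...messages.slice(0, -1),
    { ...last, content: `${last.content}\n\n${todo}` },
  ];
}
```

The cost is a few hundred tokens per turn; the payoff is that step-1 constraints never fall out of the model’s recent attention span.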
A Compiler, Not a Framework
ContextChef’s positioning is context compiler, not agent framework.
It doesn’t take over control flow. It doesn’t decide when to call the model, when to execute tools, or how to handle retries. It intervenes at exactly one moment: when you’ve assembled all your raw state — history, tool list, task state — and you’re about to make an LLM call. At that moment, it compiles your state into an optimized payload for the target provider.
```typescript
const payload = await chef
  .setTopLayer([systemPrompt])            // static prefix (cache anchor)
  .useRollingHistory(conversationHistory) // history (auto-compressed by Janitor)
  .setDynamicState(TaskSchema, state)     // task state (Zod-typed injection)
  .compile({ target: "anthropic" });      // compile to target provider
```

The core design principle is mechanism over policy: ContextChef provides the compression pipeline, truncation hooks, and format adapters — but you decide when to compress, which model to use for summarization, and how much history to keep. Policy stays in your business logic; mechanism lives in the library.
The Sandwich Model: Context’s Physical Structure
Before diving into individual modules, there’s a more fundamental question: once you’ve prepared your system prompt, conversation history, task state, and memory blocks, what order should they be assembled into the message array?
This looks trivial. It isn’t. There are two competing goals:
KV-cache stability demands that the context prefix change as little as possible. Manus noted that even adding a timestamp to the system prompt invalidates the cache for every token that follows — effectively a full re-prefill. Static content should go first and stay there.
Recency bias demands that dynamic task state be as close to the generation point as possible. LLMs pay the most attention to content at the end of the message array, and the least to content in the middle — the well-known “Lost in the Middle” problem. If you put current task state in the system prompt, it gets buried under dozens of turns of history in long conversations, and the model’s behavior drifts away from that state.
These two goals point in opposite directions.
The Sandwich Model is ContextChef’s resolution: separate “static” from “dynamic” and satisfy each on its own terms.
```
┌───────────────────────────────────┐
│ Top Layer (static system prompt)  │ ← never changes — KV-cache anchor
│ Core Memory (persistent memory)   │ ← also relatively stable
├───────────────────────────────────┤
│ Rolling History (compressed)      │ ← appended each turn, managed by Janitor
├───────────────────────────────────┤
│ Dynamic State (injected into the  │ ← fresh every turn, right next to
│ last user message)                │   the generation point
└───────────────────────────────────┘
```

The key is in the last layer: dynamic state isn’t appended as a standalone system message at the end — it’s injected into the content of the last user message. This keeps it at the tail of the message array (optimal recency) without breaking the role structure of the conversation history. The Assembler module handles this physical assembly, plus one small extra: sorting JSON keys in lexicographic order to guarantee byte-identical output for identical logical input, which further improves cache hit rates.
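The key-sorting trick fits in a few lines. This is a hypothetical helper, not the Assembler’s actual code, but it shows why sorting matters: two logically identical state objects built in different key orders serialize to the same bytes, so the cache prefix stays stable:

```typescript
// Byte-identical serialization for logically identical input: recursively
// sort object keys so that insertion order can't perturb the output.
function canonicalStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}
```

Plain `JSON.stringify` would emit keys in insertion order, so `{ b, a }` and `{ a, b }` would produce different bytes and silently break the cache prefix.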
This is why ContextChef’s API distinguishes setTopLayer(), useRollingHistory(), and setDynamicState() — they correspond to the three layers of the sandwich, each with different processing logic and lifecycle.
Six Modules, Six Problem Classes
| Module | Problem | Inspiration |
|---|---|---|
| Janitor | History compression, avoid overflow | Anthropic’s compaction practice |
| Pruner | Tool pruning, eliminate hallucinations | Manus’s Mask Don’t Remove principle |
| Assembler | Dynamic state injection, prevent drift | Manus’s recitation / todo pattern |
| Offloader/VFS | Large output offloading, URI pointers | Manus’s File System as Context |
| Core Memory | Cross-session persistent memory | Letta/MemGPT memory blocks |
| Adapters | One codebase, multiple providers | Escape vendor lock-in |
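Of these, the Adapters row is the easiest to sketch. Assuming a canonical internal message shape (illustrative types, not ContextChef’s), an adapter reduces to one mapping from canonical messages to a provider-specific request — for example, Anthropic’s API takes the system prompt as a top-level field, while OpenAI keeps system messages inline in the array:

```typescript
type CanonicalMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// One adapter per provider: same input, different request shape.
interface ProviderAdapter<Req> {
  target: string;
  toRequest(messages: CanonicalMessage[]): Req;
}

// Anthropic-style: system prompt travels as a top-level field.
const anthropicLike: ProviderAdapter<{
  system: string;
  messages: CanonicalMessage[];
}> = {
  target: "anthropic",
  toRequest(messages) {
    const system = messages
      .filter((m) => m.role === "system")
      .map((m) => m.content)
      .join("\n");
    return { system, messages: messages.filter((m) => m.role !== "system") };
  },
};

// OpenAI-style: system messages stay inline in the messages array.
const openaiLike: ProviderAdapter<{ messages: CanonicalMessage[] }> = {
  target: "openai",
  toRequest(messages) {
    return { messages };
  },
};
```

Everything upstream of the adapter — compression, pruning, assembly — operates on the canonical shape, which is what makes `compile({ target: "..." })` a one-line provider switch.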
Each subsequent post in this series dives into one module, focusing on design trade-offs rather than API usage — the API docs in the README cover that already.