ContextChef (1): Why "Compile" Your Context
5 Mar 2026
6 min read
Your agent runs flawlessly for 50 turns. On turn 51 it forgets a constraint the user stated 30 messages ago and starts producing clearly wrong results. You raise the temperature, switch models, rewrite the prompt. The problem comes and goes.
It’s not the model. It’s the context.
The Industry Arrived at the Same Place
In 2025, several serious agent builders published technical write-ups on lessons learned, and the topic converged on context management with striking consistency.
Manus wrote in Context Engineering for AI Agents: Lessons from Building Manus that they rewrote their agent framework four times before finding their current local optimum. Their core finding: KV-cache hit rate is the single most important metric for a production agent — more important than model capability — because a cache miss means 10× the cost and double the latency.

Anthropic formally named this discipline context engineering in Effective Context Engineering for AI Agents, distinguishing it from prompt engineering, and introduced the concept of context rot: as token count grows, the model’s ability to accurately recall information degrades systematically — well before the context window is full.

Letta (the successor to MemGPT) uses an OS analogy to decompose the context window into kernel context and user context — the former being managed structures like system prompts, memory blocks, and tool schemas; the latter being the flowing message buffer.
Different starting points, same underlying judgment: context isn’t something you stuff in — it needs to be designed, compressed, and structured. Before every LLM call, a layer of orchestration work has to happen. Right now that layer lives in each project’s own glue code.
ContextChef is an attempt to distill those engineering practices into a reusable TypeScript compilation pipeline.
Four Problems That Keep Coming Back
The glue code varies by project, but the problems it solves are remarkably consistent:
Conversations grow too long and the model forgets. A 128k context window sounds large, but a tool-heavy agent can fill it in 20 minutes. Worse is context rot — attention dilutes in very long contexts, early constraints get “buried,” and the model starts drifting.
Too many tools and the model hallucinates. Fifty tool schemas inject roughly 5,000 tokens. More importantly, semantically similar tools compete for the model’s attention, leading to wrong calls or fabricated parameter structures. As Anthropic’s team put it: “If a human engineer can’t definitively say which tool should be used in a given situation, an AI agent can’t be expected to do better.”
Switching providers means rewriting prompts. Anthropic supports prefill and cache breakpoints; OpenAI doesn’t. Gemini’s tool call format is entirely different. A carefully tuned prompt architecture has to be rebuilt from scratch when you switch.
Long-running tasks drift off course. The system prompt is static; task state is dynamic. By step 8, the model may have forgotten the constraints established in step 1. Manus’s solution is to rewrite a todo.md at every step — using recitation to pull the goal back into the model’s recent attention span.
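The recitation idea is easy to approximate by hand. Here is a minimal sketch — the `renderTodo` and `withRecitation` helpers and the message shapes are illustrative, not part of Manus or ContextChef:

```typescript
type TodoItem = { text: string; done: boolean };

// Re-render the full plan as markdown every turn, so the goal and the
// remaining steps land near the end of the context, where attention is strongest.
function renderTodo(goal: string, items: TodoItem[]): string {
  const lines = items.map((i) => `- [${i.done ? "x" : " "}] ${i.text}`);
  return `## Goal\n${goal}\n\n## Plan\n${lines.join("\n")}`;
}

// Append the recited plan to the newest user message before each LLM call,
// keeping the role structure of the history intact.
function withRecitation(
  messages: { role: string; content: string }[],
  todo: string
): { role: string; content: string }[] {
  const last = messages[messages.length - 1];
  return [
    ...messages.slice(0, -1),
    { ...last, content: `${last.content}\n\n${todo}` },
  ];
}
```

The cost is a few hundred tokens per turn; the payoff is that step-1 constraints never fall out of the model’s recent attention span.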
A Compiler, Not a Framework
ContextChef’s positioning is context compiler, not agent framework.
It doesn’t take over control flow. It doesn’t decide when to call the model, when to execute tools, or how to handle retries. It intervenes at exactly one moment: when you’ve assembled all your raw state — history, tool list, task state — and you’re about to make an LLM call. At that moment, it compiles your state into an optimized payload for the target provider.
```typescript
const payload = await chef
  .setTopLayer([systemPrompt])            // static prefix (cache anchor)
  .useRollingHistory(conversationHistory) // history (auto-compressed by Janitor)
  .setDynamicState(TaskSchema, state)     // task state (Zod-typed injection)
  .compile({ target: "anthropic" });      // compile to target provider
```

The core design principle is mechanism over policy: ContextChef provides the compression pipeline, truncation hooks, and format adapters — but you decide when to compress, which model to use for summarization, and how much history to keep. Policy stays in your business logic; mechanism lives in the library.
The Sandwich Model: Context’s Physical Structure
Before diving into individual modules, there’s a more fundamental question: once you’ve prepared your system prompt, conversation history, task state, and memory blocks, what order should they be assembled into the message array?
This looks trivial. It isn’t. There are two competing goals:
KV-cache stability demands that the context prefix change as little as possible. Manus noted that even adding a timestamp to the system prompt invalidates the cache for every token that follows — effectively a full re-prefill. Static content should go first and stay there.
Recency bias demands that dynamic task state be as close to the generation point as possible. LLMs pay the most attention to content at the end of the message array, and the least to content in the middle — the well-known “Lost in the Middle” problem. If you put current task state in the system prompt, it gets buried under dozens of turns of history in long conversations, and the model’s behavior drifts away from that state.
These two goals point in opposite directions.
The Sandwich Model is ContextChef’s resolution: separate “static” from “dynamic” and satisfy each on its own terms.
```
┌───────────────────────────────────┐
│ Top Layer (static system prompt)  │ ← never changes — KV-cache anchor
│ Core Memory (persistent memory)   │ ← also relatively stable
├───────────────────────────────────┤
│ Rolling History (compressed)      │ ← appended each turn, managed by Janitor
├───────────────────────────────────┤
│ Dynamic State (injected into the  │ ← fresh every turn, right next to
│ last user message)                │   the generation point
└───────────────────────────────────┘
```

The key is in the last layer: dynamic state isn’t appended as a standalone system message at the end — it’s injected into the content of the last user message. This keeps it at the tail of the message array (optimal recency) without breaking the role structure of the conversation history. The Assembler module handles this physical assembly, plus one small extra: sorting JSON keys in lexicographic order to guarantee byte-identical output for identical logical input, which further improves cache hit rates.
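The key-sorting trick fits in a few lines. This is a hypothetical helper, not the Assembler’s actual code, but it shows why sorting matters: two logically identical state objects built in different key orders serialize to the same bytes, so the cache prefix stays stable:

```typescript
// Byte-identical serialization for logically identical input: recursively
// sort object keys so that insertion order can't perturb the output.
function canonicalStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}
```

Plain `JSON.stringify` would emit keys in insertion order, so `{ b, a }` and `{ a, b }` would produce different bytes and silently break the cache prefix.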
This is why ContextChef’s API distinguishes setTopLayer(), useRollingHistory(), and setDynamicState() — they correspond to the three layers of the sandwich, each with different processing logic and lifecycle.
Six Modules, Six Problem Classes
| Module | Problem | Inspiration |
|---|---|---|
| Janitor | History compression, avoid overflow | Anthropic’s compaction practice |
| Pruner | Tool pruning, eliminate hallucinations | Manus’s Mask Don’t Remove principle |
| Assembler | Dynamic state injection, prevent drift | Manus’s recitation / todo pattern |
| Offloader/VFS | Large output offloading, URI pointers | Manus’s File System as Context |
| Core Memory | Cross-session persistent memory | Letta/MemGPT memory blocks |
| Adapters | One codebase, multiple providers | Escape vendor lock-in |
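Of these, the Adapters row is the easiest to sketch. Assuming a canonical internal message shape (illustrative types, not ContextChef’s), an adapter reduces to one mapping from canonical messages to a provider-specific request — for example, Anthropic’s API takes the system prompt as a top-level field, while OpenAI keeps system messages inline in the array:

```typescript
type CanonicalMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// One adapter per provider: same input, different request shape.
interface ProviderAdapter<Req> {
  target: string;
  toRequest(messages: CanonicalMessage[]): Req;
}

// Anthropic-style: system prompt travels as a top-level field.
const anthropicLike: ProviderAdapter<{
  system: string;
  messages: CanonicalMessage[];
}> = {
  target: "anthropic",
  toRequest(messages) {
    const system = messages
      .filter((m) => m.role === "system")
      .map((m) => m.content)
      .join("\n");
    return { system, messages: messages.filter((m) => m.role !== "system") };
  },
};

// OpenAI-style: system messages stay inline in the messages array.
const openaiLike: ProviderAdapter<{ messages: CanonicalMessage[] }> = {
  target: "openai",
  toRequest(messages) {
    return { messages };
  },
};
```

Everything upstream of the adapter — compression, pruning, assembly — operates on the canonical shape, which is what makes `compile({ target: "..." })` a one-line provider switch.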
Each subsequent post in this series dives into one module, focusing on design trade-offs rather than API usage — the API docs in the README cover that already.