
ContextChef (2): Janitor — Separating Trigger Logic from Compression Policy

Chinese version: ContextChef (2): Janitor — Completely Separating Trigger Logic from Compression Policy

History compression is a problem every long-running agent has to solve. Most frameworks solve it by doing it for you — they bake in a compression strategy, expose a few parameters, and you accept their behavior.

Janitor’s design angle is different. Its core division of responsibility is: “when to trigger” is infrastructure, owned by the library; “how to compress” is a business decision, owned by the developer. These two concerns are completely separated.

Anthropic called this operation compaction in Effective Context Engineering for AI Agents, and emphasized that the hard part isn’t compression itself — it’s deciding what to keep: “overly aggressive compaction can result in the loss of subtle but critical context whose importance only becomes apparent later.” That’s exactly why compression policy can’t be baked into a library: what counts as “detail that turns out to matter later” is something only the developer who understands the business logic can judge.

Design Angle: Mechanism, Not Decisions

What Janitor does is deliberately narrow: monitor token usage, trigger a callback you provide at the right moment, and write the compressed history back. Trigger logic belongs to the library; compression logic belongs to you.

This division delivers several concrete benefits:

You choose the model, you control the cost. compressionModel is an async function slot that receives the messages to be compressed and returns a string summary. You decide whether to use gpt-4o-mini or claude-haiku, you write the prompt, you control the level of detail. Janitor doesn’t know or care how you implement it — it just calls it at the right moment.
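
As a sketch of what that slot can look like — the Message shape and the callLLM helper are assumptions standing in for your provider SDK, not part of Janitor's API:

```typescript
// Minimal sketch of a compressionModel slot. The contract is just
// "messages in, summary string out" -- everything else is yours.

type Message = { role: string; content: string };

// Hypothetical helper: wraps e.g. gpt-4o-mini or claude-haiku
// behind a single call. Stubbed here for illustration.
async function callLLM(prompt: string): Promise<string> {
  return `SUMMARY(${prompt.length} chars of history)`;
}

// You own the prompt, so you decide what detail survives compression.
function buildCompressionPrompt(messages: Message[]): string {
  const transcript = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return [
    "Summarize the conversation below for an agent's working memory.",
    "Keep: decisions made, open tasks, file paths, error messages.",
    "Drop: pleasantries, superseded attempts.",
    "",
    transcript,
  ].join("\n");
}

// The async function you hand to Janitor.
async function compressionModel(messages: Message[]): Promise<string> {
  return callLLM(buildCompressionPrompt(messages));
}
```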

You hold the intervention point. The onBudgetExceeded hook fires before automatic compression, giving you a chance to do lossless pre-processing first — for example, offloading large tool results to VFS to see if you can bring token count down without summarizing. Return a modified history and Janitor uses it; return null and Janitor proceeds with the default flow. This hook’s existence means summarization is the last resort, not the first response.
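
A sketch of that lossless pre-processing step — the message shape and the vfsWrite helper are assumptions for illustration, and the 2000-character threshold is arbitrary:

```typescript
// Sketch of an onBudgetExceeded hook: try lossless shrinking first,
// fall back to Janitor's default flow by returning null.

type Msg = { role: string; content: string; toolCallId?: string };

const offloaded = new Map<string, string>(); // stand-in for a real VFS

// Hypothetical: stores content and returns a reference the agent
// can use to re-read it later.
function vfsWrite(key: string, content: string): string {
  offloaded.set(key, content);
  return `vfs://${key}`;
}

function onBudgetExceeded(history: Msg[]): Msg[] | null {
  let changed = false;
  const next = history.map((m) => {
    // Offload only bulky tool results; leave normal turns untouched.
    if (m.role === "tool" && m.content.length > 2000) {
      changed = true;
      const ref = vfsWrite(m.toolCallId ?? "result", m.content);
      return { ...m, content: `[offloaded to ${ref}]` };
    }
    return m;
  });
  // Modified history => Janitor uses it; null => default compression.
  return changed ? next : null;
}
```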

You define the preservation boundary. preserveRatio and preserveRecentMessages control how much recent history is preserved during compression, but these are just boundary parameters — what gets kept within that boundary, and how the rest gets summarized, is still determined by your compressionModel.
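
To make the boundary concrete, here is an illustration of what preserveRecentMessages implies (this is not Janitor's internal code, just the partition it describes):

```typescript
// preserveRecentMessages keeps the newest N messages verbatim;
// everything older is what your compressionModel receives.

function partitionHistory<T>(
  history: T[],
  preserveRecentMessages: number
): { toCompress: T[]; preserved: T[] } {
  const cut = Math.max(0, history.length - preserveRecentMessages);
  return {
    toCompress: history.slice(0, cut), // goes to compressionModel
    preserved: history.slice(cut), // kept verbatim after compression
  };
}
```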

Two Paths: Designed for Different Project Stages

Janitor has two trigger paths — not because there are two distinct use cases, but because different project stages have different tolerances for onboarding cost.

The feedTokenUsage path is the starting point. LLM API responses typically include usage.prompt_tokens directly. Feed that value to Janitor: zero extra dependencies, three lines of code. The trade-off is a one-turn lag — compression triggers on the next compile() after exceeding the limit. This is fine for most cases, because models generally tolerate token counts near the limit; one turn over won’t cause an immediate failure.
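
A toy model of this path, to make the one-turn lag concrete — this mimics the contract described above and is not Janitor's code:

```typescript
// The only signal on this path is last turn's reported usage,
// so an overshoot can only be detected one compile() later.

class UsageTracker {
  private lastReported = 0;
  constructor(private readonly maxTokens: number) {}

  // Called with usage.prompt_tokens from the previous API response.
  feedTokenUsage(promptTokens: number): void {
    this.lastReported = promptTokens;
  }

  // Consulted on the next compile(): hence the one-turn lag.
  shouldCompress(): boolean {
    return this.lastReported > this.maxTokens;
  }
}
```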

The Tokenizer path is the precision upgrade. When the one-turn lag of feedTokenUsage occasionally causes requests to fail at the limit, or when you want more accurate pre-compression budget estimation, switch to the tokenizer path — pass in a token counting function, and Janitor will pre-calculate on every compile() call and intervene before the limit is hit.
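
A sketch of a token-counting function you might hand to Janitor — a real project would wrap a proper tokenizer such as tiktoken; the chars/4 heuristic here is only a rough stand-in:

```typescript
// Rough token estimator: ~4 characters per token is a common
// back-of-envelope figure for English text.

type ChatMessage = { role: string; content: string };

function countTokens(messages: ChatMessage[]): number {
  return messages.reduce(
    (sum, m) => sum + Math.ceil((m.role.length + m.content.length) / 4),
    0
  );
}
```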

Both paths can coexist: if you provide both a tokenizer and feedTokenUsage, Janitor takes the larger of the two values and triggers conservatively. In practice, many projects start with feedTokenUsage and add the tokenizer only when they hit edge cases — both changes are configuration-only and don’t touch the compression logic itself.

What Happens Without compressionModel

Without a compressionModel, old messages are silently dropped when the limit is hit — no summary. ContextChef prints a warning at construction time but doesn’t prevent it.

This is acceptable at the prototype stage, but in production, silent dropping means the model will suddenly “forget” everything before a certain point. The API cost of a low-cost summarization model (gpt-4o-mini, claude-haiku) is almost always less than the cost of an agent that loses its memory mid-task and starts contradicting its own earlier work.

Anthropic also mentions a lighter alternative: tool result clearing — instead of summarizing, just clear the raw output of tool calls deep in history, keeping only the tool name and call ID. Once a tool has been executed, its raw output has minimal ongoing value; the model only needs to know “this tool was called.” This can be implemented via the onBudgetExceeded hook — attempt this lossless cleanup before falling back to summarization.
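
The clearing step above could be sketched as an onBudgetExceeded hook like this — the message shape and the "keep the last 10" cutoff are assumptions for illustration:

```typescript
// Tool result clearing: blank out tool outputs deep in history,
// keeping only enough to show the tool was called. Lossless-ish:
// no summarization model involved.

type ToolMsg = {
  role: string;
  content: string;
  toolName?: string;
  toolCallId?: string;
};

function clearOldToolResults(
  history: ToolMsg[],
  keepRecent = 10
): ToolMsg[] | null {
  const cutoff = history.length - keepRecent;
  let changed = false;
  const next = history.map((m, i) => {
    if (i < cutoff && m.role === "tool" && m.content.length > 0) {
      changed = true;
      // The model only needs to know this tool was called.
      return {
        ...m,
        content: `[cleared: ${m.toolName ?? "tool"} ${m.toolCallId ?? ""}]`,
      };
    }
    return m;
  });
  return changed ? next : null; // null -> fall through to summarization
}
```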


Next: Pruner and the root cause of tool hallucinations — why Manus says “don’t remove tools at runtime.”