
ContextChef (2): Janitor — Separating Trigger Logic from Compression Policy

Chinese version: ContextChef (2): Janitor — Completely Separating Trigger Logic from Compression Policy

History compression is a problem every long-running agent has to solve. Most frameworks solve it by doing it for you — they bake in a compression strategy, expose a few parameters, and you accept their behavior.

Janitor’s design angle is different. Its core division of responsibility is: “when to trigger” is infrastructure, owned by the library; “how to compress” is a business decision, owned by the developer. These two concerns are completely separated.

Anthropic called this operation compaction in Effective Context Engineering for AI Agents, and emphasized that the hard part isn’t compression itself — it’s deciding what to keep: “overly aggressive compaction can result in the loss of subtle but critical context whose importance only becomes apparent later.” That’s exactly why compression policy can’t be baked into a library: what counts as “detail that turns out to matter later” is something only the developer who understands the business logic can judge.

Design Angle: Mechanism, Not Decisions

What Janitor does is deliberately narrow: monitor token usage, trigger a callback you provide at the right moment, and write the compressed history back. Trigger logic belongs to the library; compression logic belongs to you.

This division delivers several concrete benefits:

You choose the model, you control the cost. compressionModel is an async function slot that receives the messages to be compressed and returns a string summary. You decide whether to use gpt-4o-mini or claude-haiku, you write the prompt, you control the level of detail. Janitor doesn’t know or care how you implement it — it just calls it at the right moment.
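
As a sketch of what that slot can look like — the Message shape and the callLLM helper are assumptions standing in for your provider SDK, not part of Janitor's API:

```typescript
// Minimal sketch of a compressionModel slot. The contract is just
// "messages in, summary string out" -- everything else is yours.

type Message = { role: string; content: string };

// Hypothetical helper: wraps e.g. gpt-4o-mini or claude-haiku
// behind a single call. Stubbed here for illustration.
async function callLLM(prompt: string): Promise<string> {
  return `SUMMARY(${prompt.length} chars of history)`;
}

// You own the prompt, so you decide what detail survives compression.
function buildCompressionPrompt(messages: Message[]): string {
  const transcript = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return [
    "Summarize the conversation below for an agent's working memory.",
    "Keep: decisions made, open tasks, file paths, error messages.",
    "Drop: pleasantries, superseded attempts.",
    "",
    transcript,
  ].join("\n");
}

// The async function you hand to Janitor.
async function compressionModel(messages: Message[]): Promise<string> {
  return callLLM(buildCompressionPrompt(messages));
}
```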

You hold the intervention point. The onBudgetExceeded hook fires before automatic compression, giving you a chance to do lossless pre-processing first — for example, offloading large tool results to VFS to see if you can bring token count down without summarizing. Return a modified history and Janitor uses it; return null and Janitor proceeds with the default flow. This hook’s existence means summarization is the last resort, not the first response.
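
A sketch of that lossless pre-processing step — the message shape and the vfsWrite helper are assumptions for illustration, and the 2000-character threshold is arbitrary:

```typescript
// Sketch of an onBudgetExceeded hook: try lossless shrinking first,
// fall back to Janitor's default flow by returning null.

type Msg = { role: string; content: string; toolCallId?: string };

const offloaded = new Map<string, string>(); // stand-in for a real VFS

// Hypothetical: stores content and returns a reference the agent
// can use to re-read it later.
function vfsWrite(key: string, content: string): string {
  offloaded.set(key, content);
  return `vfs://${key}`;
}

function onBudgetExceeded(history: Msg[]): Msg[] | null {
  let changed = false;
  const next = history.map((m) => {
    // Offload only bulky tool results; leave normal turns untouched.
    if (m.role === "tool" && m.content.length > 2000) {
      changed = true;
      const ref = vfsWrite(m.toolCallId ?? "result", m.content);
      return { ...m, content: `[offloaded to ${ref}]` };
    }
    return m;
  });
  // Modified history => Janitor uses it; null => default compression.
  return changed ? next : null;
}
```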

You define the preservation boundary. preserveRatio and preserveRecentMessages control how much recent history is preserved during compression, but these are just boundary parameters — what gets kept within that boundary, and how the rest gets summarized, is still determined by your compressionModel.
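
To make the boundary concrete, here is an illustration of what preserveRecentMessages implies (this is not Janitor's internal code, just the partition it describes):

```typescript
// preserveRecentMessages keeps the newest N messages verbatim;
// everything older is what your compressionModel receives.

function partitionHistory<T>(
  history: T[],
  preserveRecentMessages: number
): { toCompress: T[]; preserved: T[] } {
  const cut = Math.max(0, history.length - preserveRecentMessages);
  return {
    toCompress: history.slice(0, cut), // goes to compressionModel
    preserved: history.slice(cut), // kept verbatim after compression
  };
}
```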

Two Paths: Designed for Different Project Stages

Janitor has two trigger paths — not because there are two distinct use cases, but because different project stages have different tolerances for onboarding cost.

The feedTokenUsage path is the starting point. LLM API responses typically include usage.prompt_tokens directly. Feed that value to Janitor: zero extra dependencies, three lines of code. The trade-off is a one-turn lag — compression triggers on the next compile() after exceeding the limit. This is fine for most cases, because models generally tolerate token counts near the limit; one turn over won’t cause an immediate failure.
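
A toy model of this path, to make the one-turn lag concrete — this mimics the contract described above and is not Janitor's code:

```typescript
// The only signal on this path is last turn's reported usage,
// so an overshoot can only be detected one compile() later.

class UsageTracker {
  private lastReported = 0;
  constructor(private readonly maxTokens: number) {}

  // Called with usage.prompt_tokens from the previous API response.
  feedTokenUsage(promptTokens: number): void {
    this.lastReported = promptTokens;
  }

  // Consulted on the next compile(): hence the one-turn lag.
  shouldCompress(): boolean {
    return this.lastReported > this.maxTokens;
  }
}
```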

The Tokenizer path is the precision upgrade. When the one-turn lag of feedTokenUsage occasionally causes requests to fail at the limit, or when you want more accurate pre-compression budget estimation, switch to the tokenizer path — pass in a token counting function, and Janitor will pre-calculate on every compile() call and intervene before the limit is hit.
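
A sketch of a token-counting function you might hand to Janitor — a real project would wrap a proper tokenizer such as tiktoken; the chars/4 heuristic here is only a rough stand-in:

```typescript
// Rough token estimator: ~4 characters per token is a common
// back-of-envelope figure for English text.

type ChatMessage = { role: string; content: string };

function countTokens(messages: ChatMessage[]): number {
  return messages.reduce(
    (sum, m) => sum + Math.ceil((m.role.length + m.content.length) / 4),
    0
  );
}
```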

Both paths can coexist: if you provide both a tokenizer and feedTokenUsage, Janitor takes the larger of the two values and triggers conservatively. In practice, many projects start with feedTokenUsage and add the tokenizer only when they hit edge cases — both changes are configuration-only and don’t touch the compression logic itself.

What Happens Without compressionModel

Without a compressionModel, old messages are silently dropped when the limit is hit — no summary. ContextChef prints a warning at construction time but doesn’t prevent it.

This is acceptable at the prototype stage, but in production, silent dropping means the model will suddenly “forget” everything before a certain point. The API cost of a low-cost summarization model (gpt-4o-mini, claude-haiku) is almost always less than the cost of an agent that loses its memory mid-task and starts contradicting its own earlier work.

Anthropic also mentions a lighter alternative: tool result clearing — instead of summarizing, just clear the raw output of tool calls deep in history, keeping only the tool name and call ID. Once a tool has been executed, its raw output has minimal ongoing value; the model only needs to know “this tool was called.” This can be implemented via the onBudgetExceeded hook — attempt this lossless cleanup before falling back to summarization.
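
The clearing step above could be sketched as an onBudgetExceeded hook like this — the message shape and the "keep the last 10" cutoff are assumptions for illustration:

```typescript
// Tool result clearing: blank out tool outputs deep in history,
// keeping only enough to show the tool was called. Lossless-ish:
// no summarization model involved.

type ToolMsg = {
  role: string;
  content: string;
  toolName?: string;
  toolCallId?: string;
};

function clearOldToolResults(
  history: ToolMsg[],
  keepRecent = 10
): ToolMsg[] | null {
  const cutoff = history.length - keepRecent;
  let changed = false;
  const next = history.map((m, i) => {
    if (i < cutoff && m.role === "tool" && m.content.length > 0) {
      changed = true;
      // The model only needs to know this tool was called.
      return {
        ...m,
        content: `[cleared: ${m.toolName ?? "tool"} ${m.toolCallId ?? ""}]`,
      };
    }
    return m;
  });
  return changed ? next : null; // null -> fall through to summarization
}
```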


Next: Pruner and the root cause of tool hallucinations — why Manus says “don’t remove tools at runtime.”