
ContextChef (4): Offloader/VFS — Relocate Information, Don't Destroy It

Chinese version: ContextChef (4): Offloader/VFS — Relocate Information, Don't Destroy It

A tool returns 15,000 characters of webpack build logs. You push them into history as-is. Next turn, token count spikes, the model’s attention gets swamped by this wall of noise, and the actual error lines at the end get buried.

The intuitive response is truncation: once output passes a threshold, keep only the last N lines. That works, but truncation means information loss. At the moment you truncate, you don’t know whether line 3,000 contains a warning that will matter ten steps from now.

Design Angle: Relocate, Don’t Delete

Manus’s principle in Context Engineering for AI Agents is: compression strategies must be restorable. They treat the file system as the agent’s ultimate context store — content can be removed from the context window, but as long as the file path remains, the model can retrieve it at any time.

Anthropic’s Claude Code uses the same pattern, calling it just-in-time context: the agent maintains lightweight identifiers (file paths, URLs, queries) and only loads complete data when it’s actually needed, rather than pre-loading everything upfront.

Offloader’s design angle aligns with both: relocate information, don’t destroy it. When output exceeds the threshold, the complete content is written to local VFS, and the context entry is replaced with a truncated summary plus a context://vfs/<hash> URI. The information isn’t gone — it’s just stored somewhere else, with a pointer still in the context.
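A minimal sketch of that relocation step. Only the `context://vfs/<hash>` URI shape and the tail-preserving behavior come from the article; the character threshold, the in-memory store standing in for the VFS, and the hashing choice are illustrative assumptions:

```typescript
import { createHash } from "crypto";

const THRESHOLD = 4_000;               // assumed character threshold
const vfs = new Map<string, string>(); // stand-in for the local VFS

function offload(raw: string, tailLines = 20): string {
  if (raw.length <= THRESHOLD) return raw; // small outputs pass through untouched

  // Relocate: the full content goes to the VFS, keyed by a content hash.
  const hash = createHash("sha256").update(raw).digest("hex").slice(0, 12);
  vfs.set(hash, raw);

  // Replace: the context entry becomes a pointer plus an optional tail.
  const tail = tailLines > 0 ? raw.split("\n").slice(-tailLines).join("\n") : "";
  return [
    `[output offloaded: ${raw.length} chars -> context://vfs/${hash}]`,
    tail,
  ].join("\n");
}
```

Note that nothing is deleted: the summary line carries the URI, so the model can always decide to go back for the full content.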

The advantage of this angle is honesty toward the model. The model sees the URI, knows full content is available on demand, and won’t misinterpret a truncated output as the tool’s complete result. This is far less risky than silent truncation — you’re not deciding “this content doesn’t matter” on the model’s behalf; you’re saying “the full content is here, come get it when you need it.”

The URI Is the Core Design, Not an Implementation Detail

context://vfs/<hash> isn’t just a file path. It’s a protocol contract: any string starting with context://vfs/ signals to the model that it can retrieve the complete content using the read_vfs tool.

The benefit of this design is consistency: regardless of whether what was offloaded is terminal logs, file content, or an API response, the retrieval mechanism is the same. The model doesn’t need to distinguish between sources. Same on the developer side — register one read_vfs tool and all large output retrieval routes through it, with no per-content-type handling.
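The retrieval side of that contract can be sketched as follows. The `read_vfs` name and the `context://vfs/` prefix come from the article; the store lookup and error handling are illustrative assumptions:

```typescript
const VFS_PREFIX = "context://vfs/";
const vfs = new Map<string, string>(); // shared with the offload side

// One tool for every offloaded source: logs, files, API responses all
// route through the same prefix check and the same lookup.
function readVfs(uri: string): string {
  if (!uri.startsWith(VFS_PREFIX)) {
    throw new Error(`not a VFS URI: ${uri}`);
  }
  const hash = uri.slice(VFS_PREFIX.length);
  const content = vfs.get(hash);
  if (content === undefined) {
    throw new Error(`no VFS entry for ${uri}`);
  }
  return content;
}
```

The prefix check is what makes the URI a protocol rather than a path: anything that starts with `context://vfs/` is retrievable, and anything else is rejected loudly instead of silently returning garbage.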

The tailLines parameter reflects an awareness that different content types have different tail importance: terminal logs preserve the last 20 lines (errors are usually at the end); static documents use 0 (any line might be relevant — offload everything, let the model fetch what it needs). The default is tuned for the most common case — tool output — so you don’t need to specify it explicitly most of the time.

const safeLog = chef.offload(rawTerminalOutput);           // logs: keep last 20 lines
const safeDoc = chef.offload(largeDoc, { tailLines: 0 }); // docs: offload everything

Working with Janitor: Lossless First

The natural division between Offloader and Janitor is: Offloader does lossless relocation; Janitor does lossy summarization. When the token budget is exceeded, try the lossless option first — offload large tool results to VFS and see if the token count comes down without summarizing. If offloading isn’t enough, then fall back to summarization.

The reason for this order: summarization is a lossy operation. You never know which “seemed unimportant at the time” details it dropped. If you can solve it losslessly, don’t take the lossy path. Janitor’s onBudgetExceeded hook lets you implement this two-stage strategy explicitly, without manually coordinating the two modules from outside.
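The two-stage ordering can be sketched as a hook body. `onBudgetExceeded` is named in the article; the entry shape and the `countTokens`, `offloadEntry`, and `summarize` helpers are hypothetical stand-ins for whatever the host framework provides:

```typescript
interface Entry { text: string; offloadable: boolean }

function onBudgetExceeded(
  history: Entry[],
  budget: number,
  countTokens: (entries: Entry[]) => number,
  offloadEntry: (e: Entry) => Entry,   // lossless: relocate to VFS, keep a URI
  summarize: (entries: Entry[]) => Entry[], // lossy: last resort
): Entry[] {
  // Stage 1 (lossless): offload every large relocatable entry first.
  const relocated = history.map(e => (e.offloadable ? offloadEntry(e) : e));
  if (countTokens(relocated) <= budget) return relocated;

  // Stage 2 (lossy): only summarize if relocation alone wasn't enough.
  return summarize(relocated);
}
```

The key property is that stage 2 runs on the already-offloaded history, so even the summarizer sees URIs instead of raw walls of text, and the full content stays recoverable from the VFS.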


Next: Core Memory — why “let the LLM decide what to remember” is a dangerous design.