MyPrototypeWhat

ContextChef (6): Snapshot & Restore — Capture Everything That Determines the Next Compile

Chinese version: ContextChef (6):Snapshot & Restore——捕获决定下次编译的一切

Manus makes a counterintuitive recommendation in their blog: keep error records, and don’t clean up failed tool calls. When the model sees a failed operation and its error output, it implicitly updates its internal judgment, reducing the chance of repeating the same mistake. This is the foundation of error recovery, which Manus considers one of the clearest indicators of genuinely agentic behavior.

That’s correct, but it solves only one class of problem: lightweight failures, where a tool returned an error and the model needs to see it to adjust its strategy.

There’s another class: destructive operations — the tool executed, the side effects are real, but the outcome is wrong. “Show the model the error” doesn’t fix this. You need to roll back the entire context state and restart from a stable point.

Design Angle: Capture Everything That Determines the Next compile(), Nothing More

Snapshot’s design angle is that chef.snapshot() should capture all dynamic state that determines what the next compile() will produce — no more, no less.

What it captures: conversation history, dynamic state, Janitor’s token count cursor, Core Memory’s current key-value pairs. Together, these fully determine what compile() will produce.

What it doesn’t capture: tool registration info, static module configuration (Janitor/VFS config, etc.), VFS content. These are the agent’s static skeleton, and they should remain unchanged after restore(). If tool definitions were rolled back along with the history, the restored agent could lose the very tools it needs to keep running.

The value of this boundary is predictability: after restore(), you know exactly what went back to its previous state (history, task state, memory) and what didn’t change (tool definitions, module configuration). No ambiguity, no hidden side effects.
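The boundary can be sketched with a toy model. This is not ContextChef's real implementation: `ToyChef`, `ToySnapshot`, and every field name below are hypothetical stand-ins, chosen only to show which state a snapshot might copy and which state it deliberately leaves alone.

```typescript
// Toy stand-in for illustration only; not ContextChef's real internals.
interface Message { role: "user" | "assistant" | "tool"; content: string }

interface ToySnapshot {
  readonly label: string;
  readonly history: readonly Message[];
  readonly dynamicState: Readonly<Record<string, string>>;
  readonly janitorCursor: number; // token count cursor
  readonly coreMemory: Readonly<Record<string, string>>;
}

class ToyChef {
  // Dynamic state: captured by snapshot(), rolled back by restore().
  history: Message[] = [];
  dynamicState: Record<string, string> = {};
  janitorCursor = 0;
  coreMemory: Record<string, string> = {};

  // Static skeleton: never captured, so restore() cannot touch it.
  readonly tools = new Map<string, (args: string) => string>();
  readonly janitorConfig = { maxTokens: 8000 };

  snapshot(label: string): ToySnapshot {
    return {
      label,
      history: this.history.map((m) => ({ ...m })),
      dynamicState: { ...this.dynamicState },
      janitorCursor: this.janitorCursor,
      coreMemory: { ...this.coreMemory },
    };
  }

  restore(snap: ToySnapshot): void {
    // Copy out of the snapshot so later mutations never reach it.
    this.history = snap.history.map((m) => ({ ...m }));
    this.dynamicState = { ...snap.dynamicState };
    this.janitorCursor = snap.janitorCursor;
    this.coreMemory = { ...snap.coreMemory };
  }
}

const boundaryChef = new ToyChef();
boundaryChef.tools.set("search", (q) => `results for ${q}`);
boundaryChef.history.push({ role: "user", content: "start" });
const stable = boundaryChef.snapshot("before risky step");

// After the snapshot: one static change and two dynamic changes.
boundaryChef.tools.set("deploy", () => "deployed");
boundaryChef.history.push({ role: "tool", content: "wrong output" });
boundaryChef.coreMemory["goal"] = "corrupted";

boundaryChef.restore(stable);
// History and memory are back to the snapshot; the "deploy" tool is still registered.
```

In this sketch the tool registry survives the restore because it was never part of the snapshot, which is the predictability the boundary is meant to give.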

The Value of Immutable Snapshots

ChefSnapshot is a read-only object. This design choice lets you safely keep multiple snapshots without them interfering with each other:

const snap1 = chef.snapshot("phase 1 complete");
// ... execute phase 2 ...
const snap2 = chef.snapshot("phase 2 complete");
// ... phase 3 fails ...
chef.restore(snap2); // back to phase 2
// or
chef.restore(snap1); // back to phase 1, retry all of phase 2

If snapshots were mutable, you’d need to constantly worry about accidentally modifying one after a restore. Immutable snapshots eliminate that concern entirely — a snapshot is always the state at the moment it was created; restore just brings the instance back to that state without touching the snapshot itself.
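One way to get this guarantee, shown here as a minimal sketch rather than as ContextChef's actual mechanism, is to deep-freeze the snapshot on creation and have restore copy data out of it. `deepFreeze`, `FrozenChef`, and `Snap` are hypothetical names introduced for the example.

```typescript
// Hypothetical sketch: deepFreeze, FrozenChef and Snap are illustrative names.
function deepFreeze<T>(value: T): Readonly<T> {
  if (typeof value === "object" && value !== null && !Object.isFrozen(value)) {
    Object.freeze(value);
    for (const child of Object.values(value)) deepFreeze(child);
  }
  return value;
}

interface Snap { readonly label: string; readonly history: readonly string[] }

class FrozenChef {
  private history: string[] = [];

  add(entry: string): void { this.history.push(entry); }
  view(): readonly string[] { return [...this.history]; }

  snapshot(label: string): Snap {
    // Copy first, then freeze: the snapshot shares nothing with the live state.
    return deepFreeze({ label, history: [...this.history] });
  }

  restore(snap: Snap): void {
    // Copy out; the snapshot itself is never modified.
    this.history = [...snap.history];
  }
}

const frozenChef = new FrozenChef();
frozenChef.add("phase 1 done");
const frozenSnap1 = frozenChef.snapshot("phase 1 complete");
frozenChef.add("phase 2 done");
const frozenSnap2 = frozenChef.snapshot("phase 2 complete");
frozenChef.add("phase 3 failed");

frozenChef.restore(frozenSnap2);  // back to phase 2
frozenChef.add("phase 3 retry");  // mutates the live instance only
frozenChef.restore(frozenSnap1);  // the older snapshot is still usable
```

Because restore copies rather than aliases, the retry after the first restore cannot leak into `frozenSnap2`, and both snapshots stay valid restore targets for as long as they are kept.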

Branch Exploration: Testing Two Paths with One Instance

Another high-value use of Snapshot is strategy comparison. When you need to compare two prompt strategies or processing paths, you don’t need two ContextChef instances — a single instance can repeatedly return to the same starting point via snapshot/restore:

Take a snapshot at the stable point, run strategy A to completion, record results; restore to the stable point, run strategy B to completion, record results; pick the winner and continue. After each restore, the instance is at a perfectly consistent starting point — the variable between the two tests is controlled, and context differences can’t contaminate the comparison.
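The loop described above could look roughly like the following. Everything here is an assumption made for illustration: `BranchChef` is a toy stand-in for a ContextChef instance, `Strategy` and its numeric score are invented, and the scores are hard-coded where a real comparison would evaluate model output.

```typescript
// Hypothetical stand-ins: BranchChef, Strategy and the scores are illustrative only.
type Strategy = { name: string; run: (chef: BranchChef) => number };

class BranchChef {
  private history: string[] = [];
  add(entry: string): void { this.history.push(entry); }
  compile(): string { return this.history.join("\n"); }
  snapshot(): readonly string[] { return Object.freeze([...this.history]); }
  restore(snap: readonly string[]): void { this.history = [...snap]; }
}

function compareStrategies(chef: BranchChef, strategies: Strategy[]): string {
  const start = chef.snapshot(); // the stable point shared by every branch
  let best: { name: string; score: number; end: readonly string[] } | undefined;

  for (const strategy of strategies) {
    chef.restore(start); // identical starting context for each run
    const score = strategy.run(chef);
    if (best === undefined || score > best.score) {
      // Keep the branch's end state so the winner can be resumed later.
      best = { name: strategy.name, score, end: chef.snapshot() };
    }
  }

  if (best === undefined) throw new Error("no strategies given");
  chef.restore(best.end); // continue from the winning branch
  return best.name;
}

const branchChef = new BranchChef();
branchChef.add("system: summarize the report");

const winner = compareStrategies(branchChef, [
  { name: "A", run: (c) => { c.add("strategy A: bullet summary"); return 0.6; } },
  { name: "B", run: (c) => { c.add("strategy B: one-line summary"); return 0.8; } },
]);
```

Snapshotting each branch's end state, instead of re-running the winner, is a choice made in this sketch: a second run of a model-driven strategy is not guaranteed to reproduce the result that won.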

This pattern complements Anthropic’s sub-agent architecture: sub-agents suit large-scale parallel exploration with clean context windows, while Snapshot suits lightweight branch comparison within a single agent instance.

Relationship with Keeping Error Records

The two approaches target different failure categories and don’t conflict:

  • Tool call returned an error, agent state wasn’t damaged → keep the error record, let the model learn from failure
  • Tool call produced a destructive side effect, state needs to be reset → restore() to a stable snapshot

The criterion: can the model adjust on its own by seeing the error? If yes, keep it. If no, restore. In practice, read-only operations and retryable queries don’t need snapshots; data writes, environment mutations, and other high-risk operations warrant a snapshot before execution.
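As a rough sketch of how that rule might be wired into a tool runner: snapshot only before calls flagged as destructive, keep the error record when a call simply fails, and restore when a destructive call finishes with a bad outcome. `GuardedChef`, `ToolCall`, the `destructive` flag, and the `outcomeOk` check are all hypothetical names, not part of ContextChef.

```typescript
// Hypothetical sketch: GuardedChef, ToolCall and its fields are illustrative names.
interface ToolCall {
  name: string;
  destructive: boolean;                    // data writes, environment mutations, ...
  execute: () => string;                   // throws on a lightweight failure
  outcomeOk?: (result: string) => boolean; // optional check on the result
}

class GuardedChef {
  history: string[] = [];
  snapshot(): readonly string[] { return Object.freeze([...this.history]); }
  restore(snap: readonly string[]): void { this.history = [...snap]; }
}

function runTool(chef: GuardedChef, call: ToolCall): "ok" | "error-kept" | "restored" {
  // Read-only and retryable calls skip the snapshot entirely.
  const before = call.destructive ? chef.snapshot() : undefined;
  try {
    const result = call.execute();
    chef.history.push(`tool ${call.name}: ${result}`);
    if (before !== undefined && call.outcomeOk !== undefined && !call.outcomeOk(result)) {
      chef.restore(before); // the call ran but the outcome is wrong: reset the context
      return "restored";
    }
    return "ok";
  } catch (err) {
    // Lightweight failure: keep the error record so the model can see it.
    const message = err instanceof Error ? err.message : String(err);
    chef.history.push(`tool ${call.name} failed: ${message}`);
    return "error-kept";
  }
}

const guardedChef = new GuardedChef();
guardedChef.history.push("user: migrate the table");

const queryResult = runTool(guardedChef, {
  name: "query",
  destructive: false,
  execute: () => { throw new Error("timeout"); },
});

const writeResult = runTool(guardedChef, {
  name: "write",
  destructive: true,
  execute: () => "wrote 0 rows",
  outcomeOk: (r) => !r.includes("0 rows"),
});
```

Note that the error record from the failed query survives the later restore, because the snapshot was taken after it was written; the two mechanisms layer rather than compete.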


Final post: The Provider Adapter layer — why “write three sets of prompts” is a trap you should escape.