ContextChef (5): Core Memory — Zero-Cost Reads, Structured Writes
9 Mar 2026
5 min read
The agent tells the user “I remember you mentioned you prefer TypeScript strict mode.” The user thinks this product actually remembers things. Next session, the agent asks again.
This isn’t a memory problem — it’s a persistence problem. The information existed; it just didn’t survive the session boundary.
Memory Is Not RAG
The first instinct for solving memory is a vector database: store conversations, retrieve relevant chunks, inject them into context. Letta (the successor to MemGPT) specifically wrote "RAG is not Agent Memory" in their Context Engineering guide to push back on this thinking.
The distinction is access pattern: RAG access is probabilistic — the current query must be semantically close enough to the stored memory to retrieve it. But a user’s programming language preferences, project conventions, and the AI’s persona shouldn’t depend on “this turn’s conversation happening to be semantically similar to them.” These things should be present every time, unconditionally injected.
Letta calls this class of information memory blocks: reserved portions of the context window with fixed size limits, automatically injected by the system rather than retrieved on demand. Anthropic’s Claude Code uses structured note-taking — the agent maintains a persistent notes file, reads it after every context reset, and continues where it left off. Their example is Claude playing Pokémon: after thousands of game steps, the agent maintained maps of explored regions, level-up progress, and effective combat strategies through its notes. Without them, it couldn’t sustain any long-horizon strategy.
Design Angle: Zero-Cost Reads, Structured Writes
Core Memory’s design is built around one principle: reads should be zero-cost; writes should be structured.
Zero-cost reads means core memory content is automatically injected into context on every compile() call. Developers don't need to manually fetch memories or concatenate them into messages in every agent loop; the library handles it. Two benefits follow: you can't cause amnesia by forgetting to inject memory, and the injection position is fixed (after the system prompt, before conversation history), so it doesn't vary with how different developers assemble their messages.
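The injection step can be pictured as a small sketch. The function and type names below are illustrative assumptions, not ContextChef's actual API; only the XML entry format and the fixed injection position come from the article.

```typescript
// Hypothetical sketch of zero-cost reads: every compile() call injects
// core memory at a fixed position, after the system prompt and before
// the conversation history. Names and signatures are assumptions.

type Message = { role: "system" | "user" | "assistant"; content: string };

// Render key-value memory entries into the XML format the article describes.
function renderCoreMemory(entries: Record<string, string>): string {
  const body = Object.entries(entries)
    .map(([key, value]) => `  <entry key="${key}">${value}</entry>`)
    .join("\n");
  return `<core_memory>\n${body}\n</core_memory>`;
}

function compile(
  systemPrompt: string,
  memory: Record<string, string>,
  history: Message[]
): Message[] {
  // Fixed injection position: the developer never decides where memory goes.
  return [
    { role: "system", content: systemPrompt },
    { role: "system", content: renderCoreMemory(memory) },
    ...history,
  ];
}
```

Because the position is hard-coded inside compile(), two developers assembling messages differently still produce the same memory placement.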
Structured writes means memory isn’t stored as freeform text appended to messages — it’s stored as key-value pairs with clear semantics. Two benefits: memory entries can be precisely manipulated programmatically (overwrite, delete, query) without parsing from long text; the injected XML format (<core_memory><entry key="...">...</entry></core_memory>) is more reliably parsed by LLMs than freeform text.
Pluggable storage backends come as an additional benefit of this design. InMemoryStore is for testing and rapid prototyping; VFSMemoryStore is for production persistence; you can implement a custom MemoryStore interface to connect to Redis or a database. Switching backends requires no changes to business logic — the read/write interface is uniform, and the storage implementation is isolated behind it.
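The backend-swapping idea can be sketched as follows. The MemoryStore, InMemoryStore, and VFSMemoryStore names come from the article; the method set and its signatures are assumptions I'm making for illustration.

```typescript
// Sketch of the pluggable-backend design: business logic depends only
// on a narrow interface, and each backend lives behind it. The method
// shapes here are illustrative, not ContextChef's real contract.

interface MemoryStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
  entries(): Promise<Record<string, string>>;
}

// Testing / prototyping backend: a plain in-process Map.
class InMemoryStore implements MemoryStore {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key); }
  async set(key: string, value: string) { this.data.set(key, value); }
  async delete(key: string) { this.data.delete(key); }
  async entries() { return Object.fromEntries(this.data); }
}

// Business logic never names a concrete store, so swapping
// InMemoryStore for a persistent backend changes one constructor call.
async function rememberPreference(store: MemoryStore) {
  await store.set("preference", "TypeScript strict mode");
  return store.entries();
}
```

A Redis or database backend would implement the same four methods; rememberPreference stays untouched.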
Narrowing the Model’s Decision Surface, Not Eliminating It
One of the easiest traps in memory system design is letting the model decide whether information should go into “core memory” versus “archival memory.”
MemGPT addressed this by exposing the two tiers as differently-named tools — core_memory_replace and archival_memory_insert — and using tool descriptions to guide the model toward the right classification. It also set a hard character limit on core memory: once full, the model can’t write more, forcing it to think about whether each piece of information truly needs to be always-present. This is an effective mitigation, but it’s still fundamentally using prompt design and tool descriptions to guide LLM classification. The model can still pick the wrong tier, and these errors are extremely difficult to reproduce and debug.
To be fair, ContextChef doesn't fundamentally solve this problem either: in both inline mode and tool mode, the model still decides whether and what to write. What ContextChef does is narrow the model's decision surface: the tier is fixed by the call path, so the only decision left to the model is whether and what to write, never which tier. This turns a hard-to-observe classification decision ("core or archive?") into predictable behavior: whatever gets written in always goes into core. The trade-off is flexibility: if you need the model to autonomously decide between tiers, this design isn't for you.
Concretely: whatever the model writes via <update_core_memory> XML tags or the core_memory_update tool always goes to core tier. Whatever a developer writes via chef.memory().set() defaults to core as well. The interface design locks in the tier, leaving the model no choice in the matter.
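The "tier fixed by the call path" idea can be made concrete with a sketch. The tool and method names mirror the article; the Write shape and these function signatures are hypothetical.

```typescript
// Illustrative sketch: every write surface hard-codes the core tier,
// so neither the developer path nor the model path ever passes a tier.

type Tier = "core" | "archival";

interface Write { tier: Tier; key: string; value: string }

// Developer path, mirroring chef.memory().set(): tier is not a parameter.
function developerSet(key: string, value: string): Write {
  return { tier: "core", key, value };
}

// Model path (tool mode), mirroring core_memory_update: the tier lives
// in the tool's name, not in the arguments the model produces.
function coreMemoryUpdate(args: { key: string; value: string }): Write {
  return { tier: "core", key: args.key, value: args.value };
}
```

The model cannot mis-classify a write into the wrong tier because no write surface accepts a tier argument in the first place.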
Inline vs. Tool: Two Write Protocols for Different Integration Needs
Core Memory has two write modes — not to provide options for their own sake, but because two different integration scenarios have genuinely different constraints.
Inline mode is for latency-sensitive scenarios: the model embeds XML tags in its normal response text to update memory, requiring no extra tool call round-trips. The developer calls extractAndApply() after receiving the response to parse and write, then strips the tags before displaying to the user. Simple flow, but the model’s output mixes content and operations, requiring post-processing.
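A minimal sketch of that post-processing step, assuming a key attribute on the tag (the article names the tag and the extractAndApply() call; the exact attribute format is my assumption):

```typescript
// Hypothetical inline-mode post-processor: parse <update_core_memory>
// tags out of the model's response, apply the writes, then strip the
// tags so the user never sees them.

function extractAndApply(
  response: string,
  memory: Map<string, string>
): string {
  const tag =
    /<update_core_memory key="([^"]+)">([\s\S]*?)<\/update_core_memory>/g;
  // Apply every embedded write to the store...
  for (const m of response.matchAll(tag)) {
    memory.set(m[1], m[2].trim());
  }
  // ...then return the user-visible text with the tags removed.
  return response.replace(tag, "").trim();
}
```

One response can carry both the reply and the memory write, which is exactly why no extra round-trip is needed.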
Tool mode is for tool-call-heavy scenarios with strict response format requirements: memory updates happen through dedicated tools (core_memory_update, core_memory_delete), structurally identical to regular tool calls, requiring no post-processing — but each update costs an extra round-trip.
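The tool-mode path might look like the sketch below. The tool names come from the article; the argument schema and the dispatcher are illustrative assumptions.

```typescript
// Sketch of tool-mode writes: memory updates arrive as ordinary tool
// calls, dispatched like any other tool. Argument shapes are assumed.

type ToolCall =
  | { name: "core_memory_update"; args: { key: string; value: string } }
  | { name: "core_memory_delete"; args: { key: string } };

function dispatch(call: ToolCall, memory: Map<string, string>): string {
  switch (call.name) {
    case "core_memory_update":
      memory.set(call.args.key, call.args.value);
      return `updated ${call.args.key}`;
    case "core_memory_delete":
      memory.delete(call.args.key);
      return `deleted ${call.args.key}`;
  }
}
```

The structural sameness with regular tool calls is the point: no response post-processing, at the cost of one extra model round-trip per update.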
In both modes, reading is identical — compile() auto-injects, the model sees it directly, no tool call required. Choosing a write mode doesn’t affect reading, and you can switch based on your project’s latency and format requirements.
Next: Snapshot & Restore — Manus says keep error records, but sometimes you genuinely need to roll back.