Blog · arXiv Analysis · Published: June 25, 2026

The Ghost Memory Becomes the State Receipt

A long-term memory system does not only need to remember a fact. It needs to remember whether the fact is current, historical, transitional, or no longer fit for a given answer.

The Paper

The paper is A-TMA: Decoupling State-Aware Memory Failures in Long-Term Agent Memory, arXiv:2607.01935 [cs.AI]. The arXiv record lists Zitong Shi, Yixuan Tang, and Anthony Kum Hoe Tung as authors and records version 1 as submitted on July 2, 2026. The PDF lists the authors' affiliation as the National University of Singapore.

The site already has essays on stale facts, superseded memories, prompt caches, and agent memory databases. This paper adds a narrower object: ghost memory. Its claim is not simply that agents forget or hallucinate. It is that old facts, current facts, and transition facts can all be present in the memory system, all look relevant, and still be routed to the wrong answer state.

The Failure

Long-term agent memory is attractive because users change slowly while tasks recur. A user moves, changes medication, leaves a job, updates a preference, or corrects a prior statement. A persistent assistant needs the old record for historical questions and the new record for present-tense help. Deleting the old fact is too blunt. Keeping every fact equally live is also unsafe.

Shi, Tang, and Tung call the failure ghost memory and frame it as a state coordination problem across three layers. The memory bank may preserve old and new records without clear state roles. Retrieval may surface the wrong state view for the question. The answer model may receive both states and merge them into one fluent but wrong response. A single end-to-end accuracy score can hide which layer failed.
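
To see why one accuracy number hides the failing layer, it helps to attribute each wrong answer to the first layer that went wrong. The sketch below is illustrative only and is not the paper's diagnostic procedure. The `Probe` fields and the `attribute_failure` function are hypothetical names, and the sketch assumes each probe has already been annotated by hand or by a judge.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical per-question annotation; field names are assumptions."""
    bank_has_state_role: bool  # target record was stored with a clear state role
    retrieved_target: bool     # target record reached the evidence packet
    answer_correct: bool       # final answer matched the requested state


def attribute_failure(p: Probe) -> str:
    """Return the first layer at which the probe went wrong, or 'ok'."""
    if p.answer_correct:
        return "ok"
    if not p.bank_has_state_role:
        return "bank"
    if not p.retrieved_target:
        return "retrieval"
    return "answer"


# Four made-up probes: one success and one failure per layer.
probes = [
    Probe(True, True, True),
    Probe(False, False, False),
    Probe(True, False, False),
    Probe(True, True, False),
]

breakdown = Counter(attribute_failure(p) for p in probes)
accuracy = breakdown["ok"] / len(probes)
print(accuracy, dict(breakdown))
```

In this toy set the accuracy is a single number, while the breakdown shows three different failures that would each need a different fix.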

The Overlay

The paper proposes A-TMA, expanded in the PDF as Adaptive Truth Maintenance Auditing, as a state-aware overlay for existing memory systems. It does not replace the host memory substrate. It wraps the host's storage, retrieval, and answer pathway with explicit state roles.

At the bank layer, A-TMA keeps superseded and transition records but marks their relation to the active record. At retrieval time, it infers whether the query asks for a current, historical, or transition view and builds a state-aligned evidence packet. At answer time, it serializes retrieved evidence with labels so the model does not have to infer state from raw prose alone. The governance move is small but important: the system should expose why a remembered fact was allowed to answer this question now.
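
The three steps can be sketched in a few lines. This is a minimal illustration of the idea as described above, not A-TMA's implementation: the `Record` type, the keyword heuristic in `infer_view`, the `ROLE_FOR_VIEW` mapping, and the label format are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    slot: str   # which user attribute the record describes, e.g. "home_city"
    value: str
    role: str   # "active", "superseded", or "transition"


# Assumed mapping from the requested state view to the record role that serves it.
ROLE_FOR_VIEW = {
    "current": "active",
    "historical": "superseded",
    "transition": "transition",
}


def infer_view(query: str) -> str:
    """Toy keyword heuristic standing in for the real query-state inference."""
    q = query.lower()
    if any(w in q for w in ("when did", "change", "switch", "moved from")):
        return "transition"
    if any(w in q for w in ("used to", "previously", "before", "formerly")):
        return "historical"
    return "current"


def build_packet(bank: List[Record], slot: str, view: str) -> List[Record]:
    """Place records matching the requested view first; keep the rest as context."""
    wanted = ROLE_FOR_VIEW[view]
    candidates = [r for r in bank if r.slot == slot]
    # sorted() is stable, so non-matching records keep their bank order.
    return sorted(candidates, key=lambda r: r.role != wanted)


def serialize(packet: List[Record]) -> str:
    """Attach an explicit state label to every line shown to the answer model."""
    return "\n".join(f"[{r.role.upper()}] {r.slot} = {r.value}" for r in packet)


bank = [
    Record("home_city", "Boston", "superseded"),
    Record("home_city", "Denver", "active"),
    Record("home_city", "moved from Boston to Denver", "transition"),
]

view = infer_view("Where does the user live now?")
print(serialize(build_packet(bank, "home_city", view)))
```

The old record is never deleted. It is demoted and labeled, so a historical question can still foreground it.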

The Benchmark

To make the failure measurable, the authors build LTP, short for LoCoMo Temporal Plus, as a conflict-heavy benchmark for ghost memory. The paper reports that LTP contains 10 user profiles and 800 judged probes. It also evaluates on LoCoMo for broader long-conversation generalization, using 10 samples and 1,986 question-answer pairs.

The headline result is bounded rather than sweeping. On LTP, Graphiti/Zep plus A-TMA improves conflict accuracy by 0.240 absolute, from 0.480 to 0.720. On LoCoMo, Graphiti/Zep plus A-TMA raises temporal F1 from 0.0295 to 0.1705 and average F1 from 0.0809 to 0.1556. The authors also state that the gains are host-dependent. A-TMA helps most when the host has useful evidence but does not consistently preserve state roles, foreground the requested state during retrieval, or make the answer model follow that state.
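
A quick arithmetic check makes the "bounded" reading concrete. The snippet below only recomputes differences from the figures quoted above. The `gains` helper is a hypothetical name, and the relative gains are derived here rather than reported by the paper.

```python
from typing import Tuple

# (before, after) scores for Graphiti/Zep without and with A-TMA, as quoted above.
reported = {
    "LTP conflict accuracy": (0.480, 0.720),
    "LoCoMo temporal F1": (0.0295, 0.1705),
    "LoCoMo average F1": (0.0809, 0.1556),
}


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain as a fraction of the baseline)."""
    absolute = round(after - before, 4)
    relative = round((after - before) / before, 3)
    return absolute, relative


for name, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.4f} absolute, {relative:.1%} relative to baseline")
```

The LoCoMo temporal F1 gain looks large in relative terms only because the baseline is very low. The post-overlay score of 0.1705 is still small in absolute terms, which fits the bounded framing.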

State Receipt

An agent memory receipt should therefore include more than the final answer and a few retrieved snippets. It should record the memory host, write-time event, affected state slot, active record, superseded record, transition record, query state view, retrieval budget, candidate pool, selected evidence packet, state labels shown to the answer model, model versions, judge surface if any, and whether the answer was correct for current, historical, or transition intent.
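
As a data structure, such a receipt might look like the sketch below. The field list follows the paragraph above, but the class name, field names, types, and the sample values are all hypothetical. Neither the paper nor any memory host defines this schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class StateReceipt:
    """Hypothetical schema; one receipt per answered question."""
    memory_host: str
    write_event: str                  # the write that last changed the slot
    state_slot: str
    active_record: str
    superseded_record: Optional[str]
    transition_record: Optional[str]
    query_state_view: str             # "current", "historical", or "transition"
    retrieval_budget: int             # maximum records retrieval may return
    candidate_pool: List[str]
    evidence_packet: List[str]
    state_labels: Dict[str, str]      # record id -> label shown to the answer model
    model_versions: Dict[str, str]
    judge_surface: Optional[str] = None
    correct_for_intent: Optional[bool] = None

    def unlabeled_evidence(self) -> List[str]:
        """Evidence that reached the answer model without a state label."""
        return [e for e in self.evidence_packet if e not in self.state_labels]


# Made-up values for illustration only.
receipt = StateReceipt(
    memory_host="example-host",
    write_event="user said they moved to Denver",
    state_slot="home_city",
    active_record="rec-2",
    superseded_record="rec-1",
    transition_record="rec-3",
    query_state_view="current",
    retrieval_budget=5,
    candidate_pool=["rec-1", "rec-2", "rec-3"],
    evidence_packet=["rec-2", "rec-1"],
    state_labels={"rec-2": "ACTIVE", "rec-1": "SUPERSEDED"},
    model_versions={"answer_model": "example-model-v1"},
)

print(json.dumps(asdict(receipt), indent=2))
print("unlabeled evidence:", receipt.unlabeled_evidence())
```

A check like `unlabeled_evidence` is one cheap audit a receipt allows: any record that reached the answer model without a label is a place where the model had to guess state from raw prose.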

That receipt changes how memory governance is argued. A chatbot transcript says what was said. A memory store says what was retained. A state receipt says which retained record was authorized to speak for the present. Without that third record, a persistent agent can be wrong in a way that looks informed.

Limits

The paper is careful about scope. A-TMA is an overlay, not a replacement memory architecture. It targets cases where useful evidence exists but is mixed across bank, retrieval, and answer-time resolution. It does not recover target records that the host never stored or never retrieved. The authors describe their conclusion as bounded, and they note that host-dependent gains are evidence for state coordination, not a universal improvement claim.

That limit is exactly why the paper is useful. The answer is not to trust a memory benchmark because one score moved. The answer is to ask which layer made the answer possible, which layer failed, and whether the agent can show the state view it used before acting on remembered user facts.
