MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

Body preview

arXiv:2606.17328v1

Abstract: LLM agents increasingly maintain long-term memory of user facts across sessions. Yet such memory is usually evaluated by aggregating accuracy over question rows or episodes. Because this approach scores question rows independently, even when several questions probe the same fact, it cannot show how that fact behaves as conditions change. We introduce MemTrace, a benchmark whose unit of measurement is the knowledge point: a single typed fact about
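The contrast the abstract draws, between averaging over question rows and following one fact across conditions, can be illustrated with a small sketch. This is not MemTrace's scoring procedure, which the preview does not describe. The fact IDs, condition names, graded outcomes, and the "consistent fact rate" summary below are all hypothetical, chosen only to show how a row-level average can hide a fact that fails under one condition.

```python
from collections import defaultdict

# Hypothetical graded question rows: (fact_id, condition, correct).
# Several rows probe the same fact under different conditions.
rows = [
    ("user.city", "same_session", True),
    ("user.city", "later_session", True),
    ("user.city", "after_update", False),
    ("user.pet", "same_session", True),
    ("user.pet", "later_session", True),
    ("user.pet", "after_update", True),
]


def row_accuracy(rows):
    """Score every question row independently and average."""
    return sum(correct for _, _, correct in rows) / len(rows)


def per_fact_outcomes(rows):
    """Group rows by fact so each fact's outcome per condition is visible."""
    by_fact = defaultdict(dict)
    for fact_id, condition, correct in rows:
        by_fact[fact_id][condition] = correct
    return dict(by_fact)


def consistent_fact_rate(rows):
    """Illustrative summary: share of facts answered correctly in every condition."""
    by_fact = per_fact_outcomes(rows)
    return sum(all(outcomes.values()) for outcomes in by_fact.values()) / len(by_fact)


if __name__ == "__main__":
    print(f"row-level accuracy:   {row_accuracy(rows):.3f}")
    print(f"consistent fact rate: {consistent_fact_rate(rows):.3f}")
    for fact_id, outcomes in per_fact_outcomes(rows).items():
        print(fact_id, outcomes)
```

In this toy data, five of six rows are correct, yet only one of the two facts holds up under every condition. The per-fact view also shows which condition the failing fact breaks under, which the row-level average cannot.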

