BayesBench: Evaluating LLM Belief Trajectories Under Multi-Turn Evidence Accumulation
arXiv:2606.30850v1 (announce type: new)

Abstract: Large language models (LLMs) are typically deployed in multi-turn conversations, where each turn provides new evidence that should reduce epistemic uncertainty about their environment. Acting rationally then requires inferring the unobserved quantities that govern that environment and updating beliefs about them as evidence accumulates. Yet most evaluations score only the model's final-turn answer in a single-turn format, leaving this process unexamined. We ask