Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty

Preview

arXiv:2606.17312v1 (announce type: new)

Abstract: Large language models can arrive at the same answer through reasoning paths that are unstable, contradictory, or difficult to rank consistently -- a failure mode especially prevalent in multi-step deductive reasoning. Existing methods assess reliability primarily through output dispersion -- measuring how much sampled answers differ -- but this discards a complementary signal: whether the model can consistently rank competing reasoning candidates.
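To make the contrast between the two signals concrete, here is a minimal Python sketch. It is not the paper's method, which the preview does not describe. It assumes output dispersion is measured as the Shannon entropy of sampled final answers, and ranking consistency as the mean pairwise Kendall tau across repeated rankings of the same candidates. The function names, the candidate labels (`path_1`, ...), and the data are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def answer_entropy(answers):
    """Shannon entropy (nats) of the empirical distribution of sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two tie-free rankings (dict: candidate -> position)."""
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / pairs


def ranking_consistency(rankings):
    """Mean pairwise Kendall tau over repeated rankings of the same candidates."""
    taus = [kendall_tau(a, b) for a, b in combinations(rankings, 2)]
    return sum(taus) / len(taus)


# Hypothetical data: four samples agree on the final answer ...
answers = ["True", "True", "True", "True"]
# ... but three ranking passes over the same reasoning paths disagree.
rankings = [
    {"path_1": 0, "path_2": 1, "path_3": 2},
    {"path_1": 2, "path_2": 1, "path_3": 0},
    {"path_1": 1, "path_2": 0, "path_3": 2},
]

dispersion = answer_entropy(answers)
consistency = ranking_consistency(rankings)
print(f"answer entropy: {dispersion:.3f}, ranking consistency: {consistency:.3f}")
```

In this toy case the answers show zero dispersion, so a dispersion-only measure would report no uncertainty, while the negative mean tau shows that the rankings of the competing paths are unstable.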

Related posts

SEAGym: An Evaluation Environment for Self-Evolving LLM Agents

DeepInsight: A Unified Evaluation Infrastructure Across the Physical AI Stack

Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning

Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes