RoPoLL: Robust Panel of LLM Judges

Preview

arXiv:2606.30931v1 (announce type: new)

Abstract: The LLM Jury, a Panel of LLM Evaluators (PoLL) reporting consensus scores, has become a practical alternative to single-judge LLM evaluation, yet its statistical behavior remains poorly understood. We formalize the LLM Jury under the Huber contamination model and show that PoLL incurs unbounded bias under any positive contamination, regardless of jury size, whenever a single judge fails in a biased, LLM-typical way (mode collapse, sycophancy, sa
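The bias claim in the preview can be illustrated with a toy calculation. This is a minimal sketch, not the paper's method: it assumes the panel's consensus score is the plain mean of the judges' scores, that the score scale is unbounded, and that exactly one judge is contaminated. The function names (`contaminated_panel`, `mean_bias`, `median_bias`) are hypothetical, and the median is used only as a familiar robust baseline for comparison, since the preview does not say which estimator RoPoLL uses.

```python
import statistics


def contaminated_panel(k, true_score, bad_score):
    """Build k scores: k - 1 honest judges plus one contaminated judge.

    Assumption: honest judges report the true score exactly, so any
    deviation in the aggregate comes from the single contaminated judge.
    """
    if k < 1:
        raise ValueError("panel needs at least one judge")
    return [true_score] * (k - 1) + [bad_score]


def mean_bias(k, true_score, bad_score):
    """Bias of the mean-aggregated panel score (assumed PoLL-style consensus)."""
    scores = contaminated_panel(k, true_score, bad_score)
    return sum(scores) / len(scores) - true_score


def median_bias(k, true_score, bad_score):
    """Bias of the median-aggregated score (comparison baseline only)."""
    scores = contaminated_panel(k, true_score, bad_score)
    return statistics.median(scores) - true_score


if __name__ == "__main__":
    true_score = 5.0
    for k in (3, 9, 27):
        for bad_score in (10.0, 1_000.0, 1_000_000.0):
            print(
                f"k={k:2d} bad={bad_score:>11.1f} "
                f"mean_bias={mean_bias(k, true_score, bad_score):.3f} "
                f"median_bias={median_bias(k, true_score, bad_score):.3f}"
            )
```

In this toy setting the mean's bias equals `(bad_score - true_score) / k`, so for any fixed panel size it grows without limit as the contaminated judge's score moves further from the truth. The median stays at the true score as long as honest judges form a majority.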

Related posts

When Regulation Has Memory: Hysteresis and Control Burden in Artificial Agency

DDIAgents: Mechanism-Conditioned Context Flow for Drug-Drug Interaction Prediction

Beyond Compilation: Evaluating Faithful Natural-Language-to-Lean Statement Formalization

A Three-Phase Foundation Model for Tax-Aware Personalized Portfolio Management