AI | July 1, 2026
RoPoLL: Robust Panel of LLM Judges
Source: arXiv cs.AI
Preview
arXiv:2606.30931v1 (announce type: new)
Abstract: The LLM Jury, a Panel of LLM Evaluators (PoLL) reporting consensus scores, has become a practical alternative to single-judge LLM evaluation, yet its statistical behavior remains poorly understood. We formalize the LLM Jury under the Huber contamination model and show that PoLL incurs unbounded bias under any positive contamination, regardless of jury size, whenever a single judge fails in a biased, LLM-typical way (mode collapse, sycophancy, sa
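The bias claim in the abstract can be illustrated with a small sketch. It assumes the panel's consensus score is the arithmetic mean of the judges' scores and that scores are not clipped to a fixed range; both are assumptions for illustration, as are the names `panel_mean` and `contaminated_panel` and the score values. The median is shown only as a familiar robust baseline and is not presented as the paper's RoPoLL method.

```python
import statistics


def panel_mean(scores):
    """Consensus score as the plain average of the judges (assumed aggregation)."""
    return sum(scores) / len(scores)


def contaminated_panel(clean_scores, bad_score, n_bad=1):
    """Return a panel in which the first n_bad judges report bad_score."""
    return [bad_score] * n_bad + list(clean_scores[n_bad:])


# Hypothetical scores from five well-behaved judges.
clean = [7.0, 7.5, 6.5, 7.0, 7.0]

# One failing judge reports an increasingly extreme score.
for bad in (10.0, 100.0, 1000.0):
    panel = contaminated_panel(clean, bad)
    print(bad, panel_mean(panel), statistics.median(panel))
```

In this sketch the mean moves by `(bad_score - clean_score) / n` for a single failing judge, so it grows without limit as the failing judge's score becomes more extreme, while the median stays at the clean value as long as fewer than half of the judges fail. Under the Huber model the contaminated fraction is fixed rather than a single judge, so enlarging the panel does not shrink the shift of the mean.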