Estimating Uncertainty in Classifier Performance with Applications to Large Language Models and Nested Data | AIChainDay

AI · June 26, 2026 · AI score: 98%

Source: arXiv cs.AI

✨ AI Insights

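1. Wald and basic percentile bootstrap confidence intervals fall well short of the nominal 95% coverage level and are the least accurate.
2. Agresti-Coull, Wilson, and Clopper-Pearson intervals, along with a new pseudocount-regularized bootstrap, improve accuracy.
3. When texts are nested within individuals, the paper shows that adjustments to the effective N and the degrees of freedom are needed.
4. The hierarchical bootstrap is more accurate than the cluster bootstrap when the number of texts per individual is moderate.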
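
To make the contrast in points 1 and 2 concrete, here is a minimal sketch of the textbook Wald and Wilson score intervals for a single proportion such as recall. It is not taken from the paper; the function names, the fixed z value for a 95% interval, and the 29-of-30 example counts are all assumptions made for illustration.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_interval(successes, n, z=Z_95):
    """Wald (normal-approximation) interval, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval, clipped to [0, 1]."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 29 of 30 true positives were recovered.
print("Wald:  ", wald_interval(29, 30))
print("Wilson:", wilson_interval(29, 30))

# With a perfect score the Wald interval collapses to a single point,
# while the Wilson interval still has a lower bound below 1.
print("Wald (30/30):  ", wald_interval(30, 30))
print("Wilson (30/30):", wilson_interval(30, 30))
```

The perfect-score case shows one way a small, high-performing sample can go wrong: the Wald interval reports zero width, which suggests no uncertainty at all.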

💡 Why does it matter?

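By showing that the default confidence intervals commonly used with small samples, high-performing classifiers, and nested data underestimate the true uncertainty, the paper offers direct guidance for how validity and reproducibility are reported when ML text classification is used for measurement in the social sciences.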
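
As a rough illustration of what accounting for nesting can look like, the sketch below resamples whole individuals (clusters) rather than single texts when bootstrapping recall. It is a generic cluster bootstrap with a plain percentile interval, chosen for brevity; it is not the paper's procedure, and the summary above notes that basic percentile intervals tend to undercover. The function names, the `toy` data, and the person identifiers are hypothetical.

```python
import random


def recall(pairs):
    """Recall from (true_label, predicted_label) pairs; None if no positives."""
    preds_for_positives = [pred for true, pred in pairs if true == 1]
    if not preds_for_positives:
        return None
    return sum(preds_for_positives) / len(preds_for_positives)


def cluster_bootstrap_ci(data_by_person, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for recall, resampling persons with replacement."""
    rng = random.Random(seed)
    persons = list(data_by_person)
    stats = []
    for _ in range(n_boot):
        # Draw whole persons so that texts from one person stay together.
        sample = rng.choices(persons, k=len(persons))
        pooled = [pair for person in sample for pair in data_by_person[person]]
        value = recall(pooled)
        if value is not None:  # skip resamples with no positive texts
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical toy data: each person contributes several labeled texts,
# stored as (true_label, predicted_label) pairs.
toy = {
    "p1": [(1, 1), (1, 1), (0, 0)],
    "p2": [(1, 0), (1, 0), (1, 1)],
    "p3": [(1, 1), (0, 1), (1, 1)],
    "p4": [(1, 1), (1, 0), (0, 0)],
}
low, high = cluster_bootstrap_ci(toy)
print(f"cluster bootstrap 95% interval for recall: ({low:.3f}, {high:.3f})")
```

Resampling at the person level keeps correlated texts together, so the interval reflects the number of individuals rather than the number of texts.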

Abstract preview

arXiv:2606.26422v1 Announce Type: new Abstract: Researchers increasingly use text classification--supervised models or large language models--to measure constructs from natural language, providing metrics such as recall and precision as evidence of their validity. Yet, though these metrics are point estimates subject to sampling variation, measures of uncertainty are inconsistently reported alongside them. Further, when they are reported, they are often estimated with methods that are not appro

#LLM #TextClassification #ModelPerformance #UncertaintyEstimation #AIResearch

Related posts

COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

Accelerating Returns and the Qualitative Engine for Science

AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs

Agentic Analysis for Agentic Infrastructure: An LLM-Powered Pipeline for Comparative Governance of DAO and Corporate AI Protocols