Adversarial Concept Search: Predicting Compositional Errors From Feature Geometry | AIChainDay

Summary by Claude · June 15, 2026

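The researchers present a method that uses an LLM's representational geometry to predict which concept combinations the model will fail on, without directly evaluating inputs. They attribute compositional failure to interference between salient features. On tasks that require systematic composition, such as toy programming, multi-hop reasoning, and multilingual factual recall, the model composes reliably when two concepts are encoded near-orthogonally, but fails to compose when their linear encodings lie close together and interfere. The method reliably predicts failure patterns across diverse compositional tasks without evaluating specific inputs. Representational geometry can be used to identify high-risk examples and construct targeted stress tests, and it offers a scalable foundation for active learning in real deployments.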

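•Uses an LLM's representational geometry to predict which concept combinations will fail, without directly evaluating inputs
•Explains compositional failure as interference between salient features
•Concepts encoded near-orthogonally compose reliably; concepts whose linear encodings are close interfere and fail to compose
•Predicts failure patterns across diverse compositional tasks, including programming, multi-hop reasoning, and multilingual factual recall
•Offers a scalable foundation for identifying high-risk examples, targeted stress testing, and active learning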
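
The geometric intuition in the summary above can be illustrated with a small sketch. It is not the paper's method, which the preview does not describe in detail. It assumes each concept already has a single linear direction available as a vector, and it uses absolute cosine similarity as a stand-in for interference. The function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pairs_by_overlap(directions):
    """Return (score, concept_a, concept_b) tuples, highest overlap first.

    `directions` maps a concept name to its (hypothetical) feature vector.
    The score is |cos|: close to 0 means near-orthogonal encodings,
    close to 1 means the two encodings are almost collinear.
    """
    scored = [
        (abs(cosine(directions[a], directions[b])), a, b)
        for a, b in combinations(sorted(directions), 2)
    ]
    return sorted(scored, reverse=True)


# Made-up directions, purely for illustration (assumption, not real data).
toy_directions = {
    "loop": [1.0, 0.0, 0.0],
    "recursion": [0.9, 0.1, 0.0],
    "french": [0.0, 0.0, 1.0],
}

ranking = rank_pairs_by_overlap(toy_directions)
for score, a, b in ranking:
    print(f"{a} + {b}: |cos| = {score:.3f}")
```

Under this assumption, pairs at the top of the ranking would be the first candidates for a targeted stress test, while pairs near zero would be expected to compose reliably.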

AI · June 15, 2026 · AI score: 91%

Adversarial Concept Search: Predicting Compositional Errors From Feature Geometry

Source: arXiv cs.AI

Article preview

arXiv:2606.13934v1 Announce Type: new Abstract: Humans cannot always intuit what scenarios are most challenging to LLMs. Hoping to capture challenging edge cases, developers either design problems to be difficult for humans or curate extensive benchmarks. What if we could instead anticipate which scenarios a model will fail on? In this paper, we use an LLM's representational geometry to predict which concept combinations it will fail on. We attribute this compositional failure to interference b

#LLM #Adversarial search #Representational geometry #Model error prediction

3 hours ago

When Sample Selection Bias Precipitates Model Collapse

arXiv:2606.13732v1 Announce Type: new Abstract: The proliferation of recursive training on synthetic data can alleviate data scarcity but risks model collapse, where repeated training erodes distributional tails and homogenizes outputs. Data selection is widely viewed as a remedy, yet its reliabili

#Model collapse #Synthetic data #Data selection

Related posts

When Sample Selection Bias Precipitates Model Collapse

Hyperdimensional computing for structured querying on tabular data embeddings

AI Receptivity or AI Adoption Breadth? A Tool-Specific Reanalysis of the Lower-Literacy/Higher-Usage Link

Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization