Capability Minimization as a Safety Primitive: Risk-Aware Causal Gating for Least-Privilege LLM Agents | AIChainDay

🇰🇷 Korean summary by Claude · June 15, 2026

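RACG (Risk-Aware Causal Gating) is a framework that decides whether to act on, defer, or abstain from a model's prediction by combining causal effect estimation with calibrated risk control. It models the causal pathway from candidate actions to outcomes and gates each decision on the estimated counterfactual risk rather than on raw predictive confidence. The authors derive a distribution-free upper bound on the probability of acting under high-risk conditions and convert it into an operating threshold that satisfies a user-specified safety constraint. They also propose an adaptive policy that monitors discrepancies between predicted and realized outcomes and tightens the gate when the causal assumptions break down. On simulated interventions and real-world decision benchmarks, RACG substantially reduced high-cost errors while preserving most of the utility of the ungated policy, and it outperformed confidence-based and selective-prediction baselines at the same abstention rate. The results suggest that explicitly separating causal risk from predictive uncertainty yields safer and more transparent decision systems.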
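
To make the thresholding step concrete, here is a minimal Python sketch of turning a distribution-free bound into an operating threshold. It is not the paper's method: the summary does not state which bound RACG derives, so this sketch substitutes Hoeffding's inequality with a union bound over a finite grid of candidate thresholds. The names `calibrate_threshold`, `gate`, and `defer_margin` are hypothetical, and the risk scores are assumed to come from some upstream counterfactual risk estimator that is not shown.

```python
import math
from typing import Optional, Sequence


def hoeffding_upper_bound(p_hat: float, n: int, delta: float) -> float:
    """Upper confidence bound on a Bernoulli mean, valid with prob. >= 1 - delta."""
    return min(1.0, p_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))


def calibrate_threshold(
    scores: Sequence[float],      # estimated risk per calibration example (assumed given)
    high_risk: Sequence[bool],    # whether each example was actually high-risk
    candidates: Sequence[float],  # finite grid of thresholds to consider
    alpha: float,                 # user-specified cap on P(act and high-risk)
    delta: float,                 # allowed failure probability of the guarantee
) -> Optional[float]:
    """Return the largest candidate whose bounded bad-action rate is <= alpha."""
    n = len(scores)
    per_candidate_delta = delta / len(candidates)  # union bound over the grid
    best = None
    for t in sorted(candidates):
        # "Acting" means score <= t; a bad action is acting on a high-risk case.
        bad = sum(1 for s, h in zip(scores, high_risk) if s <= t and h)
        if hoeffding_upper_bound(bad / n, n, per_candidate_delta) <= alpha:
            best = t
        else:
            break  # the bad-action rate is monotone in t, so larger t also fails
    return best


def gate(risk_score: float, threshold: Optional[float], defer_margin: float = 0.0) -> str:
    """Three-way decision; the defer band is an illustrative assumption."""
    if threshold is None:
        return "abstain"  # no threshold met the safety constraint
    if risk_score <= threshold:
        return "act"
    if risk_score <= threshold + defer_margin:
        return "defer"
    return "abstain"
```

Returning `None` when no candidate satisfies the constraint keeps the failure mode conservative: the gate then abstains on every input instead of falling back to an unverified threshold.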

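• The RACG framework decides whether to act, defer, or abstain by combining causal effect estimation with calibrated risk control
• Each decision is gated on the estimated counterfactual risk, not on raw predictive confidence
• A distribution-free upper bound on the probability of acting under high risk is derived and converted into an operating threshold that satisfies the safety constraint
• An adaptive policy monitors mismatches between predicted and realized outcomes and tightens the gate when causal assumptions are violated
• RACG outperforms confidence-based and selective-prediction baselines at the same abstention rate and substantially reduces high-cost errors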
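
The adaptive policy in the fourth point can be sketched as a small monitor that tightens the threshold when realized harm runs ahead of predicted risk. This is an illustration under stated assumptions, not the update rule from the paper: the class `AdaptiveGate`, the rolling window, the `tolerance` on the calibration gap, and the multiplicative `shrink` factor are all hypothetical choices made for this example.

```python
from collections import deque


class AdaptiveGate:
    """Tightens a risk threshold when outcomes are worse than predicted.

    Hypothetical sketch: the mismatch signal is the mean realized harm rate
    minus the mean predicted risk over a rolling window.
    """

    def __init__(self, threshold: float, window: int = 50,
                 tolerance: float = 0.1, shrink: float = 0.8) -> None:
        self.threshold = threshold
        self.tolerance = tolerance
        self.shrink = shrink
        self._history = deque(maxlen=window)  # (predicted_risk, realized_harm)

    def allows(self, risk_score: float) -> bool:
        """Act only when the estimated risk is at or below the current threshold."""
        return risk_score <= self.threshold

    def observe(self, predicted_risk: float, realized_harm: bool) -> None:
        """Record one acted-on decision and tighten the gate if needed."""
        self._history.append((predicted_risk, 1.0 if realized_harm else 0.0))
        if len(self._history) < self._history.maxlen:
            return  # wait for a full window before judging the estimator
        n = len(self._history)
        mean_predicted = sum(p for p, _ in self._history) / n
        mean_realized = sum(r for _, r in self._history) / n
        if mean_realized - mean_predicted > self.tolerance:
            # Harm is more frequent than predicted: shrink the threshold
            # and start a fresh window under the stricter gate.
            self.threshold *= self.shrink
            self._history.clear()
```

The gate in this sketch only ever tightens. Whether and how RACG relaxes the threshold again once predictions and outcomes realign is not stated in the summary, so that behavior is left out.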

AI · June 15, 2026 · AI score: 90%

Capability Minimization as a Safety Primitive: Risk-Aware Causal Gating for Least-Privilege LLM Agents

Source: arXiv cs.AI

Article preview

arXiv:2606.13884v1 Announce Type: new Abstract: Modern decision systems increasingly rely on learned components whose outputs may be confident yet wrong, exposing downstream actions to costly errors. We introduce Risk-Aware Causal Gating (RACG), a framework that decides whether to act on, defer, or abstain from a model's prediction by combining causal effect estimation with calibrated risk control. RACG models the causal pathway from candidate actions to outcomes and gates each decision accordi


#AI safety #LLM agents #least privilege #risk gating

3 hours ago

When Sample Selection Bias Precipitates Model Collapse

arXiv:2606.13732v1 Announce Type: new Abstract: The proliferation of recursive training on synthetic data can alleviate data scarcity but risks model collapse, where repeated training erodes distributional tails and homogenizes outputs. Data selection is widely viewed as a remedy, yet its reliabili

#model collapse #synthetic data #data selection


Related posts

When Sample Selection Bias Precipitates Model Collapse

Hyperdimensional computing for structured querying on tabular data embeddings

AI Receptivity or AI Adoption Breadth? A Tool-Specific Reanalysis of the Lower-Literacy/Higher-Usage Link

Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization