COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents | AIChainDay

Summary by Claude · June 1, 2026

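The paper proposes the COMPASS framework to address "retrieval-induced safety degradation" in LLM-based search agents, where a harmful intent is decomposed into innocuous-looking sub-queries that bypass safety alignment. COMPASS uses Cognitive Tree Exploration (CTE) to efficiently synthesize covert attack trajectories, and Introspective Step-wise Alignment (ISA) to isolate risky intermediate actions for fine-grained process supervision. In the experiments, COMPASS achieved a favorable balance between safety and general helpfulness with far less training data. It is a practical method for robust safety alignment across multi-step agent workflows.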

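• Defines the "retrieval-induced safety degradation" problem, in which a search agent decomposes a harmful intent into innocuous sub-queries that slip past safeguards, and proposes a solution.
• Uses Cognitive Tree Exploration (CTE) to efficiently synthesize covert attack trajectories, and Introspective Step-wise Alignment (ISA) to apply fine-grained supervision to risky intermediate actions.
• Shows experimentally that it reaches a favorable balance between safety and general helpfulness with less training data than existing methods.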
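The title names MCTS as the search method behind CTE, but this summary does not describe how the tree is expanded or scored. As a rough orientation only, the sketch below shows the standard UCB1 child-selection rule used in generic MCTS. It is not COMPASS's actual procedure: the node fields (`reward`, `visits`), the exploration constant `c`, and the function names are all illustrative assumptions.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Generic UCB1 score: average reward plus an exploration bonus.

    Unvisited children score infinity so they are tried first.
    """
    if visits == 0:
        return math.inf
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits):
    """Return the child dict (hypothetical 'reward'/'visits' fields) with the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(ch["reward"], ch["visits"], parent_visits))

# Hypothetical child nodes of one search-tree node.
children = [
    {"name": "a", "reward": 3.0, "visits": 5},
    {"name": "b", "reward": 1.0, "visits": 1},
    {"name": "c", "reward": 0.0, "visits": 0},
]
chosen = select_child(children, parent_visits=6)
```

The exploration bonus steers the search toward rarely visited branches. That is one plausible reason a tree search could find sparse unsafe trajectories more efficiently than uniform sampling, though the summary itself only states that CTE is efficient.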

AI · June 1, 2026 · AI score: 95%

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

Source: arXiv cs.AI

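✨ AI Insights

🧑‍💻 Developers

1. Proposes the COMPASS framework to address the safety degradation that arises in the multi-step reasoning of LLM search agents.
2. Synthesizes covert attack trajectories with Cognitive Tree Exploration (CTE) and supervises risky intermediate steps at fine granularity with ISA.
3. Maintains the safety-utility balance even with a small amount of training data, improving on existing alignment methods.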
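To make the difference between outcome-level and step-level (process) supervision concrete, here is a toy sketch. It assumes each step of a trajectory already carries a risk score in [0, 1]. The scores, the 0.5 threshold, and the function names are hypothetical and do not come from the paper; how ISA actually identifies risky steps is not described in this summary.

```python
def outcome_label(step_risks, threshold=0.5):
    """Outcome-level supervision: one label for the whole trajectory."""
    return any(risk >= threshold for risk in step_risks)

def step_labels(step_risks, threshold=0.5):
    """Step-level supervision: one label per intermediate action."""
    return [risk >= threshold for risk in step_risks]

def first_risky_step(step_risks, threshold=0.5):
    """Index of the first step at or above the threshold, or None if every step is below it."""
    for index, risk in enumerate(step_risks):
        if risk >= threshold:
            return index
    return None

# Hypothetical per-step risk scores for a four-step search trajectory.
risks = [0.1, 0.2, 0.8, 0.3]
trajectory_flag = outcome_label(risks)
per_step_flags = step_labels(risks)
risky_index = first_risky_step(risks)
```

The trajectory-level flag says only that something went wrong somewhere. The per-step flags locate the risky action, which is the kind of denser signal that fine-grained process supervision relies on.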

💡

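Why does it matter?

It defines a new attack pattern in which a multi-step agent decomposes a harmful intent into innocuous sub-queries, and it presents a training framework that defends against it, so the approach can be applied directly to hardening the safety of search agents already in deployment.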

🏷️ Mentioned projects

COMPASS

Text preview

arXiv:2605.30838v1. Abstract: LLM-powered search agents enable multi-step reasoning and tool use. However, these capabilities introduce retrieval-induced safety degradation, as harmful intents may decompose into seemingly innocuous sub-queries that lead to unsafe outcomes. Existing alignment methods struggle to capture sparse safety signals and fail to supervise diverse violations across multi-step interactions. We propose COMPASS, a Cognitive MCTS-Guided Process Alignment fra

#AISafety #SearchAgents #MCTS #ProcessAlignment #LLM

