Safe and Generalizable Hierarchical Multi-Agent RL via Constraint Manifold Control | AIChainDay

Summary by Claude · June 24, 2026

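This paper proposes a hierarchical reinforcement learning framework for multi-agent systems that must coordinate under safety constraints. Existing approaches face a trade-off: learning-based methods perform well but offer no theoretical safety guarantee, while control-theoretic methods are safe but overly conservative and inefficient. The proposed method enforces hard safety constraints at the lower level through a constraint manifold, under only mild assumptions, and learns a higher-level policy to achieve effective coordination. It provides theoretical safety guarantees in multi-agent settings together with stationary learning dynamics, which allows stable and efficient training. In experiments it maintained a near-perfect safety rate with competitive performance, and it generalized well to changes in the number of agents and obstacles.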

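• Hard safety constraints are enforced at the lower level through a constraint manifold, under only mild assumptions
• Higher-level policy learning enables effective multi-agent coordination
• Theoretical safety guarantees and stationary learning dynamics in multi-agent settings
• Competitive performance while maintaining a near-perfect safety rate
• Effective generalization to different numbers of agents and obstacles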
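
The summary does not describe how the constraint manifold is used, and the paper's actual formulation is not reproduced here. As a rough illustration only, the sketch below shows one generic way a low-level layer can keep a high-level command on a constraint set: it projects a commanded velocity onto the tangent space of a single equality constraint c(q) = 0 and adds a correction term that pulls the state back toward the manifold. The names `project_to_tangent`, `circle`, `circle_grad`, and `gain`, as well as the unit-circle constraint, are hypothetical choices for this example and do not come from the paper.

```python
def project_to_tangent(q, v, c, grad_c, gain=1.0):
    """Return a velocity v' such that grad_c(q) . v' = -gain * c(q).

    On the manifold (c(q) == 0) this removes the component of v that
    would leave the constraint set. Off the manifold, the extra term
    drives c(q) back toward zero. Illustrative sketch, not the paper's method.
    """
    g = grad_c(q)
    gg = sum(gi * gi for gi in g)          # squared norm of the gradient
    if gg == 0.0:
        return list(v)                      # degenerate gradient: nothing to project
    gv = sum(gi * vi for gi, vi in zip(g, v))
    scale = (gv + gain * c(q)) / gg
    return [vi - scale * gi for vi, gi in zip(v, g)]


# Hypothetical constraint: stay on the unit circle, c(q) = x^2 + y^2 - 1.
def circle(q):
    return q[0] ** 2 + q[1] ** 2 - 1.0


def circle_grad(q):
    return [2.0 * q[0], 2.0 * q[1]]


# A high-level policy asks for velocity (1, 1) at q = (1, 0);
# the radial part is removed and only the tangential part survives.
print(project_to_tangent([1.0, 0.0], [1.0, 1.0], circle, circle_grad))  # [0.0, 1.0]
```

In a hierarchical setup of this general kind, the learned policy would output `v` and a projection step would run at every control tick, so the policy never needs to learn the constraint itself. Whether the paper structures its lower level this way is not stated in the summary.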

AI · June 24, 2026

Safe and Generalizable Hierarchical Multi-Agent RL via Constraint Manifold Control

Source: arXiv cs.AI

Article preview

arXiv:2606.24010v1 Announce Type: new Abstract: Multi-agent systems are widely used in safety-critical applications that require coordinated behavior under strict safety constraints. Existing approaches face a fundamental trade-off: learning-based methods achieve strong empirical performance but lack theoretical safety guarantees, while control-theoretic methods enforce safety but often lead to overly conservative and inefficient behaviors. We propose a hierarchical multi-agent reinforcement le



Critique of Agent Model

arXiv:2606.23991v1 Announce Type: new Abstract: What is an agent? What constitutes agency? With the rise of Large Language Model (LLM) systems marketed as "coding agents", "AI co-scientists", and other "agentic" tools that promise to drive up productivity, and at the same time, "existential"


Related posts

Critique of Agent Model

Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?

Beyond Trajectory Imitation: Strategy-Guided Policy Optimization for LLM Reasoning

Exploring Academic Influence of Algorithms by Co-occurrence Network Based on Full-text of Academic Papers