UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems | AIChainDay

🇰🇷 Summary by Claude (originally in Korean) · June 15, 2026

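UP-NRPA is an online framework that uses large language models to customize goal-oriented dialogue policies in real time, without offline reinforcement learning. It maps a user's personality, preferences, and goals from a user portrait, combines them with real-time feedback, and adjusts the dialogue strategy dynamically through Nested Rollout Policy Adaptation (NRPA). On cooperative and non-cooperative dialogue benchmarks it reached a 100% success rate on several tasks, and on the negotiation task its sale-to-list ratio (SL) rose by 56.41%. Because it adapts to diverse user characteristics without a separate training mechanism, it removes the burden, common to existing approaches, of preparing reinforcement learning policy models for each user group in advance.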

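• An online Nested Rollout Policy Adaptation (NRPA) framework that combines user portraits (personality, preferences, goals) with real-time feedback
• Customizes dialogue strategy dynamically, without offline reinforcement learning or per-user-group policy models
• Achieves a 100% success rate on multiple tasks across cooperative and non-cooperative dialogue benchmarks
• Raises the sale-to-list ratio (SL) by 56.41% on the negotiation task
• Demonstrates adaptation to diverse user characteristics without any training mechanism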
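The summary does not say how UP-NRPA feeds the user portrait or the LLM into its search, so the sketch below shows only the generic Nested Rollout Policy Adaptation loop (Rosin, 2011) that the framework is named after. It runs on a hypothetical toy task: at each turn the agent picks one dialogue act, and a simulated user rewards turns that match its hidden preferred strategy. The names `ACTS`, `PREFERRED`, `reward`, and `code` are illustrative assumptions, not the paper's API; a real system would score rollouts with an LLM-driven user simulator instead.

```python
import math
import random

# Hypothetical toy task (not from the paper): choose one dialogue act per turn.
ACTS = ["greet", "ask_need", "propose", "concede", "close"]
# Hypothetical hidden preference of a simulated user.
PREFERRED = ["greet", "ask_need", "propose", "concede", "close"]
HORIZON = len(PREFERRED)


def reward(seq):
    """Score a full dialogue: number of turns whose act matches the user's preference."""
    return sum(a == b for a, b in zip(seq, PREFERRED))


def code(turn, act):
    """Policy key for choosing `act` at `turn` (moves are position-dependent)."""
    return (turn, act)


def playout(policy, rng):
    """Level-0 rollout: sample each act with probability proportional to exp(weight)."""
    seq = []
    for turn in range(HORIZON):
        weights = [math.exp(policy.get(code(turn, a), 0.0)) for a in ACTS]
        seq.append(rng.choices(ACTS, weights=weights)[0])
    return reward(seq), seq


def adapt(policy, seq, alpha=1.0):
    """Softmax-gradient step that shifts the policy toward the best sequence found."""
    new = dict(policy)
    for turn, chosen in enumerate(seq):
        z = sum(math.exp(policy.get(code(turn, a), 0.0)) for a in ACTS)
        new[code(turn, chosen)] = new.get(code(turn, chosen), 0.0) + alpha
        for a in ACTS:
            p = math.exp(policy.get(code(turn, a), 0.0)) / z
            new[code(turn, a)] = new.get(code(turn, a), 0.0) - alpha * p
    return new


def nrpa(level, policy, rng, iterations=20):
    """Generic NRPA: each level repeatedly runs the level below and adapts toward its best result."""
    if level == 0:
        return playout(policy, rng)
    best_score, best_seq = -math.inf, []
    for _ in range(iterations):
        # The lower level works on its own copy of the current policy.
        score, seq = nrpa(level - 1, dict(policy), rng, iterations)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq


if __name__ == "__main__":
    best, plan = nrpa(level=2, policy={}, rng=random.Random(0))
    print(best, plan)
```

Nesting is what makes the method "online": no policy is trained ahead of time; each search starts from an empty policy and adapts it only from the rollouts of the current problem, which is the property the summary credits for avoiding per-user-group RL models.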

AI · June 15, 2026

UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems

Source: arXiv cs.AI

Abstract preview

arXiv:2606.13683v1 · Announce Type: new

Abstract: To address the challenge that current dialogue policy planning methods struggle to dynamically adapt to diverse user characteristics, this paper proposes a User Portrait based Nested Rollout Policy Adaptation (UP-NRPA) online framework with Large Language Models. In contrast to conventional approaches dependent on model training and require offline reinforcement learning policy models for user groups, UP-NRPA enables dynamic customization of dialo…

