Exploring Agentic Tool-Calling Decisions via Uncertainty-Aligned Reinforcement Learning
arXiv:2606.06976v1 Announce Type: new
Abstract: Large language model (LLM)-based agents often make suboptimal tool-use decisions, including unsupported tool invocation and hallucinated direct responses, which may accumulate errors throughout multi-step interactions. Existing approaches mainly improve these behaviors through inference-time correction or coarse-grained reward signals based on decision outcomes and structured checklists, leaving the uncertainty characteristics of agent decisions u