In LLM Reasoning, there is Irrationality on top of Value Misalignment | AIChainDay

Summary by Claude (originally in Korean) · June 23, 2026

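The paper points out that even when an LLM is well aligned with a target value function, it may still fail to maximise the aligned value during reasoning, and it mathematically formalises this gap as "rational value risk". The risk is defined as the utility difference between the deployed reasoning strategy and its rational counterpart, which maximises expected utility. The estimation error decomposes into three components: finite candidates, finite prompts, and an imperfect verifier. Experiments with Llama-3.1, Qwen-2.5, Tülu-3 (7B to 72B), GPT-5.2/5.5, and DeepSeek-V4 across several benchmarks confirm that (1) rational value risk is widespread, (2) value alignment reduces but does not eliminate it, (3) it is highly sensitive to the reasoning strategy, and (4) longer reasoning improves rationality with diminishing returns.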

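• Formalises as "rational value risk" the gap by which even an aligned LLM fails to maximise value during reasoning.
• Decomposes the estimation error into finite candidates, finite prompts, and an imperfect verifier.
• Value alignment reduces the risk but does not eliminate it, and the risk is highly sensitive to the reasoning strategy.
• Longer reasoning improves rationality but shows diminishing returns.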
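The gap described above can be illustrated with a small Python sketch. This is not the paper's estimator: the function `empirical_value_gap`, its parameters, and the toy utilities are all hypothetical, chosen only to show where the three error sources named in the summary would enter. The pool of candidates is finite, so the best candidate may understate the true optimum; the prompts are a finite sample; and `utility` stands in for a verifier that may be imperfect.

```python
from statistics import mean
from typing import Callable, Sequence


def empirical_value_gap(
    prompts: Sequence[str],
    deployed: Callable[[str], str],
    candidates: Callable[[str], Sequence[str]],
    utility: Callable[[str, str], float],
) -> float:
    """Mean over prompts of (best candidate utility - deployed utility).

    Hypothetical illustration only; all names here are assumptions.
    """
    gaps = []
    for prompt in prompts:
        response = deployed(prompt)
        deployed_u = utility(prompt, response)
        # Include the deployed response in the pool so the gap is never negative.
        pool = list(candidates(prompt)) + [response]
        best_u = max(utility(prompt, r) for r in pool)
        gaps.append(best_u - deployed_u)
    return mean(gaps)


# Toy data (invented): utilities of responses "a", "b", "c" for two prompts.
TOY_UTILITY = {
    "p1": {"a": 0.5, "b": 1.0, "c": 0.2},
    "p2": {"a": 0.8, "b": 0.3, "c": 0.8},
}

toy_gap = empirical_value_gap(
    prompts=["p1", "p2"],
    deployed=lambda prompt: "a",                 # the strategy always answers "a"
    candidates=lambda prompt: ["a", "b", "c"],   # finite candidate pool
    utility=lambda prompt, r: TOY_UTILITY[prompt][r],  # stand-in verifier
)
print(toy_gap)
```

In this toy setting the deployed answer falls short of the best candidate on `p1` and matches it on `p2`, so the average gap is positive but small. Enlarging the candidate pool, sampling more prompts, or improving the verifier would each tighten a different part of the estimate.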

AI · June 23, 2026

In LLM Reasoning, there is Irrationality on top of Value Misalignment

Source: arXiv cs.AI

Article preview

arXiv:2606.20624v1 Announce Type: new Abstract: Significant progress has been made in aligning LLMs with target value functions. We argue that, even when an LLM has been well aligned in (post-)training, it may still fail to maximise the aligned value in reasoning. We mathematically formalise this gap as rational value risk: the utility discrepancy between a model's deployed reasoning strategy and its rational counterpart, which is defined to be the responses that maximise expected utility in th

