Improving Multimodal Reasoning via Worst Dimension Optimization
arXiv:2606.07801v1 (announce type: new)
Abstract: Multimodal reasoning requires a reasoning path that stays sound under a wide range of constraints, from visual grounding to logical consistency. However, current Process Reward Models rely on heuristically defined rewards that weigh these factors equally. As a result, dominant factors can mask failures in individual dimensions, and the validity of the reasoning process as a whole is not guaranteed.
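The masking effect described in the abstract can be illustrated with a small arithmetic sketch. This is not the paper's method: the dimension names, the scores, and the helper functions `equal_weight_reward`, `worst_dimension_reward`, and `weakest_dimension` are all hypothetical, and taking the minimum is only one plausible reading of "worst dimension" in the title.

```python
from statistics import mean

# Hypothetical per-dimension scores for one reasoning step, each in [0, 1].
# The dimension names and values are illustrative assumptions, not data
# from the paper.
step_scores = {
    "visual_grounding": 0.1,      # the step misreads the image
    "logical_consistency": 0.9,
    "fluency": 0.95,
    "relevance": 0.9,
}


def equal_weight_reward(scores):
    """Average all dimensions with equal weight."""
    return mean(scores.values())


def worst_dimension_reward(scores):
    """Score the step by its weakest dimension (assumed min aggregation)."""
    return min(scores.values())


def weakest_dimension(scores):
    """Return the name of the lowest-scoring dimension."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    print("equal-weight reward:", equal_weight_reward(step_scores))
    print("worst-dimension reward:", worst_dimension_reward(step_scores))
    print("weakest dimension:", weakest_dimension(step_scores))
```

With these made-up numbers, the equal-weight average stays fairly high even though visual grounding has nearly failed, while the minimum exposes the failure directly.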