VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
arXiv:2606.04244v1 Announce Type: new
Abstract: Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we intr