CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions
arXiv:2606.06526v1 Announce Type: new
Abstract: Large language models have made substantial progress on mathematical reasoning, but existing benchmarks typically evaluate well-specified problems with final answers, step-by-step solutions, or complete proofs. They do not capture collaborative open-problem solving: a setting in which participants propose partial arguments, identify gaps or errors in prior steps, repair flawed reasoning, and gradually synthesize incremental contributions into a pr