CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

Preview

arXiv:2606.06526v1 (announce type: new)

Abstract: Large language models have made substantial progress on mathematical reasoning, but existing benchmarks typically evaluate well-specified problems with final answers, step-by-step solutions, or complete proofs. They do not capture collaborative open-problem solving: a setting in which participants propose partial arguments, identify gaps or errors in prior steps, repair flawed reasoning, and gradually synthesize incremental contributions into a pr
