Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR
arXiv:2606.25178v1 (Announce Type: new)
Abstract: Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer unevenly across domains. Existing learnability-based curricula adapt to where the policy is currently improving, but are blind to