Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

Preview

arXiv:2606.25178v1 (announce type: new)

Abstract: Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer unevenly across domains. Existing learnability-based curricula adapt to where the policy is currently improving, but are blind to
