SafeGene: Reusable Adapters for Transferable Safety Alignment

Body preview

arXiv:2606.06519v1 Announce Type: new Abstract: Open-weight LLMs are increasingly fine-tuned into customized assistants, but downstream fine-tuning can weaken safety alignment and make models more vulnerable to malicious prompts, even when the training data is not intentionally harmful. This creates a recurring safety recovery problem as target models are repeatedly updated with new task data or user interactions. We propose SafeGene, a reusable safety-adapter module designed for cross-task reu

Related posts

Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation

DiBS: Diffusion-Informed Branch Selection

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions