A Geometric Account of Activation Steering through Angle-Norm Decomposition

Preview

arXiv:2606.06735v1 (announce type: new)

Abstract: Linear activation steering has gained popularity as a simple and empirically effective way to control language model behavior. More recently, spherical steering paradigms have been proposed to address limitations of additive interventions, often motivated by the assumption that hidden-state norm does not carry concept-relevant information. In this work, we revisit this assumption through a controlled empirical study designed to disentangle the rol

Related posts

Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation

DiBS: Diffusion-Informed Branch Selection

SafeGene: Reusable Adapters for Transferable Safety Alignment

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory