A Geometric Account of Activation Steering through Angle-Norm Decomposition
arXiv:2606.06735v1 (announce type: new)

Abstract: Linear activation steering has gained popularity as a simple and empirically effective way to control language model behavior. More recently, spherical steering paradigms have been proposed to address limitations of additive interventions, often motivated by the assumption that hidden-state norm does not carry concept-relevant information. In this work, we revisit this assumption through a controlled empirical study designed to disentangle the rol
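To make the contrast in the abstract concrete, here is a minimal NumPy sketch of splitting a hidden state into a norm and a unit direction, and of the difference between an additive intervention and a norm-preserving rotation. It is a generic illustration, not the paper's method: the function names (`decompose`, `additive_steer`, `spherical_steer`), the strength parameters `alpha` and `t`, and the use of spherical linear interpolation for the rotation are all assumptions introduced here.

```python
import numpy as np

def decompose(h):
    """Split a vector into (norm, unit direction). Assumes h is nonzero."""
    norm = np.linalg.norm(h)
    return norm, h / norm

def angle_between(a, b):
    """Angle in radians between two nonzero vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def additive_steer(h, v, alpha):
    """Additive intervention: shifts h along v, changing norm and angle together."""
    return h + alpha * v

def spherical_steer(h, v, t):
    """Hypothetical norm-preserving intervention: rotate the direction of h
    toward the direction of v by a fraction t of the angle between them
    (spherical linear interpolation), then restore the original norm."""
    norm, u = decompose(h)
    _, w = decompose(v)
    omega = angle_between(u, w)
    # Parallel or antiparallel directions: the rotation plane is undefined,
    # so this sketch leaves h unchanged.
    if np.sin(omega) < 1e-8:
        return h.copy()
    direction = (np.sin((1 - t) * omega) * u + np.sin(t * omega) * w) / np.sin(omega)
    return norm * direction

# Toy example: a 3-dimensional "hidden state" and an orthogonal steering vector.
h = np.array([3.0, 4.0, 0.0])
v = np.array([0.0, 0.0, 2.0])

h_add = additive_steer(h, v, alpha=1.0)
h_sph = spherical_steer(h, v, t=0.5)

print("original norm:", np.linalg.norm(h))
print("additive norm:", np.linalg.norm(h_add), "angle to v:", angle_between(h_add, v))
print("spherical norm:", np.linalg.norm(h_sph), "angle to v:", angle_between(h_sph, v))
```

In this toy setup the additive update changes the norm and the angle at once, while the rotation changes only the angle. Separating the two quantities in this way is what lets one ask whether the norm carries information on its own.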