Beyond Trajectory Imitation: Strategy-Guided Policy Optimization for LLM Reasoning | AIChainDay