Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents | AIChainDay