What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents | AIChainDay