AI: AgingBench — A Benchmark for Measuring AI Agent 'Aging' in Real-World Deployments
A new study introduces AgingBench, a benchmark for evaluating the long-term reliability of AI agents. It shows that agents "age" too: their performance degrades over time after deployment.
Sources: arxiv.org