Abacus.ai CEO Bindu Reddy recently shared a controversial observation: next-generation AI models are no longer guaranteed to be better than their predecessors.
Developments
Reddy cited three examples: in her assessment, Opus 4.7 is rated worse than Opus 4.6, Gemini 3.1 is inferior to Gemini 2.5, and Sonnet 4.6 has more bugs than Sonnet 4.5. She suggests that state-of-the-art (SOTA) models seem to be "going in circles" without achieving any real breakthrough in quality.
Why it matters
This matters for AI teams in Vietnam planning to upgrade their systems to the latest models. If new models can regress, as Reddy claims, teams need a more rigorous evaluation (eval) process before production deployment: compare the candidate model against the one already in use on the team's own tasks, and do not assume that a higher version number yields better results.
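One way a team might put this into practice is a small regression check that runs the current model and the candidate over the same test cases before any upgrade. The sketch below is illustrative only: `Model`, `Comparison`, `passes`, and `compare` are hypothetical names, the models are stand-in functions rather than real API clients, and exact-match scoring is an assumption that many real tasks would replace with a task-specific metric.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: a "model" is any function that maps a prompt to a response string.
Model = Callable[[str], str]
# Assumption: each eval case is a (prompt, expected_answer) pair.
Case = Tuple[str, str]


@dataclass
class Comparison:
    current_score: float     # fraction of cases the current model passes
    candidate_score: float   # fraction of cases the candidate model passes
    regressions: List[str]   # prompts the current model passes but the candidate fails

    @property
    def approve(self) -> bool:
        # Approve only if the candidate is at least as accurate overall
        # and breaks nothing that works today.
        return self.candidate_score >= self.current_score and not self.regressions


def passes(model: Model, case: Case) -> bool:
    """Exact-match check (an assumption; swap in a metric that fits the task)."""
    prompt, expected = case
    return model(prompt).strip() == expected


def compare(current: Model, candidate: Model, cases: Sequence[Case]) -> Comparison:
    """Run both models over the same cases and collect per-case regressions."""
    if not cases:
        raise ValueError("eval set must not be empty")
    current_ok = [passes(current, c) for c in cases]
    candidate_ok = [passes(candidate, c) for c in cases]
    regressions = [
        case[0]
        for case, was_ok, now_ok in zip(cases, current_ok, candidate_ok)
        if was_ok and not now_ok
    ]
    return Comparison(
        current_score=sum(current_ok) / len(cases),
        candidate_score=sum(candidate_ok) / len(cases),
        regressions=regressions,
    )


if __name__ == "__main__":
    # Toy stand-ins for two model versions; real code would call a model API here.
    cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
    old_model = lambda p: {"2+2": "4", "capital of France": "Paris", "3*3": "6"}.get(p, "")
    new_model = lambda p: {"2+2": "4", "capital of France": "Lyon", "3*3": "9"}.get(p, "")

    result = compare(old_model, new_model, cases)
    print("approve upgrade:", result.approve)
    print("regressed prompts:", result.regressions)
```

Tracking per-case regressions alongside the aggregate score is a deliberate choice here: two versions can score the same overall while the newer one fails on cases that matter in production.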