New LLM Architecture Helps AI Identify and Quantify Human Values
Scientists have proposed a modular architecture that helps LLMs detect moral values and human norms in text without being limited by a single, fixed theory.
Anthropic's new research shows that adding unrelated tools and system prompts to training datasets can make models less prone to harmful behaviors.
A new study proposes evaluating AI using diverse synthetic cognitive profiles instead of static benchmarks, better reflecting human diversity.
A new study finds that while AI agents excel at specific tasks, they often fail to improve the user's position in social situations.
Anthropic has decided to hand over Petri, an open-source alignment tool, to Meridian Labs, alongside a major update that enhances AI testing capabilities.