Tag

#Alignment

5 English Kalera News articles tagged Alignment — source-backed.

AI Jun 9, 2026

New LLM Architecture Helps AI Identify and Quantify Human Values

Scientists have proposed a modular architecture that helps LLMs detect moral values and human norms in text without being limited by a single, fixed theory.

Sources arxiv.org

AI Jun 8, 2026

Anthropic: Diversifying Data Helps Reduce the Risk of AI Blackmail

Anthropic's new research shows that adding unrelated tools and system prompts to training datasets can make models less prone to harmful behaviors.

Sources x.com

AI Jun 6, 2026

Multi-Dimensional AI Evaluation via Simulated Persona Frameworks

A new study proposes evaluating AI using diverse synthetic cognitive profiles instead of static benchmarks, better reflecting human diversity.

Sources arxiv.org

AI Jun 6, 2026

Microsoft Study: AI Agents Still Fail to Optimize User Interests

A new study finds that while AI agents excel at specific tasks, they often fail to improve the user's position in social situations.

Sources x.com

AI Jun 3, 2026

Anthropic Transfers Petri Tool to Meridian Labs for Independent Development

Anthropic is handing over Petri, an open-source alignment tool, to Meridian Labs, alongside a major update that enhances its AI testing capabilities.

Sources x.com