Tag

#AI Safety

19 English Kalera News articles tagged AI Safety — source-backed.

AI Jun 11, 2026

Controversy Surrounds the 'Humans First' Group and the Extremist Anti-AI Wave

A debate has erupted over anti-AI groups after the co-founder of the 'Humans First' organization was accused of using extremist messaging similar to Ted Kaczynski's.

Sources x.com

AI Jun 11, 2026

Microsoft Research Announces New Priorities: AI Cost Optimization and Language Equity

Microsoft Research has shared its new research focus areas, which include cloud efficiency, cost reduction for agentic systems, 3D telemedicine, and promoting inclusive AI in Africa.

Sources x.com

AI Jun 8, 2026

Anthropic: Diversifying Data Helps Reduce the Risk of AI Blackmail

Anthropic's new research shows that adding unrelated tools and system prompts to training datasets can make models less prone to harmful behaviors.

Sources x.com

AI Jun 8, 2026

Anthropic introduces NLAs: Translating complex AI data into easy-to-understand text

Anthropic has announced Natural Language Autoencoders (NLAs), a tool that helps decode the inner workings of AI models into natural language explanations.

Sources x.com

AI Jun 5, 2026

Microsoft warns of operational risks from AI agent systems

New research from Microsoft highlights critical vulnerabilities that emerge when AI agents interact autonomously at scale and fail to deliver practical benefits for users.

Sources microsoft.com

AI Jun 3, 2026

Anthropic transfers Petri tool to Meridian Labs for independent development

Anthropic has decided to hand over Petri, an open-source alignment tool, to Meridian Labs, alongside a major update that enhances AI testing capabilities.

Sources x.com

AI · tools-ai Jun 3, 2026

Hugging Face Champions Open Source for AI Security

Hugging Face highlights the role of transparency and open source in the future of AI security, enabling the community to detect and patch vulnerabilities faster.

Sources huggingface.co

AI Jun 2, 2026

Consilium Protocol: Multi-Model AI Deliberation for Epistemic Synthesis

A new protocol uses 'cognitive personas' to force AI models into deliberation, revealing hidden biases stemming from training and alignment.

Sources arxiv.org

AI Jun 1, 2026

COMPASS: Process Alignment for Safe Search Agents

COMPASS uses MCTS for the safety alignment of search agents, detecting malicious intents disguised as seemingly harmless sub-queries.

Sources arxiv.org

AI Jun 1, 2026

🤖 US Investors Blast Anthropic's 'Delusion of Creating God'

Veteran investors Bill Gurley and Jason Calacanis have pulled no punches in criticizing Anthropic, arguing that the startup behind Claude is complacent and detached from business reality.

Sources x.com

AI Jun 1, 2026

AI Debate: Both Proponents and Safetyists Believe We Are Creating an "AI God" 🤖

A Harvard study reveals an unexpected common ground between two opposing sides in the AI debate: despite their conflicting actions, both believe humanity is building a supreme being.

Sources x.com

AI May 28, 2026

Controlling LLM Reliability with Bayesian Belief Tracking

A new study proposes Sequential Bayesian Belief Tracking (SBBT) to estimate the reliability of long reasoning traces before final outcomes are reached.

Sources arxiv.org

AI May 28, 2026

SocialBot: An AI Agent Capable of Social-Norm-Aware Planning

Researchers have developed SocialBot, an AI agent capable of planning and acting based on constantly changing social norms to interact safely with humans.

Sources arxiv.org

AI May 28, 2026

Microsoft Research: Viewing AI as a support tool rather than a replacement for humans

Microsoft Research emphasizes that building reliable AI systems must be grounded in the philosophy of viewing AI as an extension of human capabilities rather than a complete replacement.

Sources x.com

AI May 27, 2026

Hugging Face updates ASR leaderboard to prevent score gaming

Hugging Face has introduced the "Benchmaxxer Repellant" tool, which uses hidden data to prevent score gaming on its Open ASR Leaderboard.

Sources huggingface.co

AI May 27, 2026

Microsoft Unveils Vega: AI-Era Identity Verification with ZKPs

Microsoft's Vega uses zero-knowledge proofs to protect digital identities and minimize the disclosure of unnecessary personal information.

Sources microsoft.com

AI May 27, 2026

Anthropic: AI Agent Permissions Must Evolve with Capabilities 🤖

Anthropic proposes dynamically adjusting AI agent permissions based on capability and implementing "sandboxing" to limit the scope of potentially destructive actions.

Sources x.com

AI May 26, 2026

Microsoft Research Asia launches competition to assess AI's understanding of human values

Microsoft Research Asia has announced the Global AI Values Challenge, a global initiative inviting researchers to assess whether AI can reason about human values within complex, real-world contexts.

Sources x.com

AI May 23, 2026

AI is No 'Magic': 8 Key Takeaways on Risks and Government Regulation

Arvind Narayanan and Sayash Kapoor argue that AI is a 'normal' technology, rejecting the notion that sci-fi scenarios warrant extraordinary government intervention.

Sources x.com