
NVIDIA Announces Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA Cosmos 3 is a unified world model capable of understanding and generating language, images, video, audio, and actions for robotics.

Tier 1 · sources 90% confidence Reviewed
Sources: x.com

NVIDIA has officially announced Cosmos 3, a next-generation omnimodal world model designed specifically for Physical AI. It marks a significant step toward robots that can understand the physical world the way humans do.

Context

Physical AI requires models that not only process digital data but also understand the physical laws governing real environments. Cosmos 3 is built to serve as the "brain" for robots, helping them predict how scenarios will unfold and execute precise actions in 3D space.

Key Developments

The standout feature of Cosmos 3 is its unified architecture. A single model can both process and generate multiple data types: text, images, video, audio, and action control signals. Its world-modeling capability lets a robot "visualize" the result of an action before actually performing it, which can reduce errors and improve safety.
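To make the idea of "visualizing" an outcome before acting concrete, here is a minimal sketch of model-based planning on a toy grid, assuming a hand-written rule in place of a learned world model. The grid, `predict_next`, `choose_action`, and `run_episode` are hypothetical illustrations, not Cosmos 3's interface or NVIDIA's method. The planner rolls every candidate action sequence forward inside the model, rejects any sequence that would hit a hazard, and executes only the first step of the best plan before planning again.

```python
from itertools import product

# Hypothetical toy world: a robot on a 2D grid. Nothing here is Cosmos 3's API.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0), "stay": (0, 0)}


def predict_next(position, action):
    """Stand-in 'world model': predicts the next position for an action.
    In a real system a learned model would replace this hand-written rule."""
    dx, dy = ACTIONS[action]
    return (position[0] + dx, position[1] + dy)


def distance(a, b):
    """Manhattan distance on the grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def imagined_cost(position, plan, goal, hazard):
    """Roll a candidate plan forward inside the model; nothing is executed."""
    cost = 0
    for action in plan:
        position = predict_next(position, action)
        if position == hazard:
            return float("inf")  # imagined collision: discard this plan
        cost += distance(position, goal)
    return cost


def choose_action(position, goal, hazard, horizon=4):
    """Score every action sequence of length `horizon` in imagination and
    return the first action of the cheapest one (receding-horizon planning)."""
    plans = product(ACTIONS, repeat=horizon)
    best = min(plans, key=lambda plan: imagined_cost(position, plan, goal, hazard))
    return best[0]


def run_episode(start, goal, hazard, max_steps=10):
    """Plan, act, re-plan. In this toy, the 'real' world follows the model exactly."""
    position, path = start, [start]
    for _ in range(max_steps):
        if position == goal:
            break
        position = predict_next(position, choose_action(position, goal, hazard))
        path.append(position)
    return path
```

Calling `run_episode((0, 0), (2, 0), hazard=(1, 0))` returns `[(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]`: the robot detours around the hazard because every imagined plan that enters it is rejected before anything is executed. Exhaustive search over all action sequences only works for tiny action sets; larger problems usually call for sampling-based search, which this sketch does not attempt to show.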

Why It Matters

Cosmos 3 marks a shift from specialized AI models toward generalized world models for robotics. By integrating action generation into the same model as vision and language, NVIDIA points toward robots that can learn from video and interact with humans more naturally.