NVIDIA Cosmos 3 has taken the lead on the Artificial Analysis leaderboards for open-weights models. The model family now ranks first in two categories: Text-to-Image and Image-to-Video.
Designed as "Omnimodal World Models" for Physical AI, Cosmos 3 integrates language, images, video, audio, and action sequences within a single architecture. This unification is expected to drive advances in robotics and other AI systems that need to understand the physical world and interact with it.