Nvidia launches Cosmos 3, an open AI world model for robots, self-driving cars, and physical AI
Nvidia on Tuesday unveiled Cosmos 3, a new open-source AI model that combines physical reasoning, world generation, and action generation within a single system.
A robot can identify a coffee mug. A self-driving car can recognize a pedestrian. Yet knowing what is likely to happen next, predicting how objects will move, and deciding how to respond in real time remain some of the hardest problems in artificial intelligence.
Cosmos 3 is Nvidia’s latest attempt to tackle that challenge. The company is releasing the model alongside training scripts, datasets, deployment tools, and inference services to accelerate the development of physical AI systems.
“NVIDIA Cosmos 3 is a frontier foundation model for physical AI that combines physical reasoning, world generation, and action generation within a single open model,” Nvidia said in a blog post.
The launch marks Nvidia’s latest push beyond AI infrastructure and into the software layer that could shape the next generation of robots, autonomous vehicles, warehouse systems, and intelligent environments.
One model instead of many
Previous versions of Cosmos split physical reasoning, world generation, and scene control across separate models and workflows.
Cosmos 3 brings those capabilities together under a single architecture Nvidia calls a Mixture-of-Transformers, or MoT.
At the center of the system are two components.

The first is a reasoner tower, a vision-language model that analyzes images, video, and text to interpret motion, object interactions, and physical context. Nvidia describes it as the part of the system that thinks about what is happening before generation begins.
The second is a generator tower that creates future observations and action sequences. Using diffusion-based techniques, it generates videos and actions based on the physical context identified by the reasoning system.
The result is a model that can analyze a scene, predict what might happen next, and generate actions from that prediction without relying on separate pipelines.
That matters for developers building physical AI systems, where coordinating multiple models often introduces latency, engineering overhead, and added complexity.
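For readers who think in code, the reason-then-generate flow described above can be sketched in a few lines of Python. This is a toy illustration only, not Nvidia's API: the names `SceneContext`, `Rollout`, `reason`, `generate`, and `predict_and_act` are hypothetical, and plain string handling stands in for the actual vision-language and diffusion models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneContext:
    """Hypothetical output of the reasoning stage."""
    objects: List[str]
    summary: str


@dataclass
class Rollout:
    """Hypothetical output of the generation stage."""
    future_frames: List[str]
    actions: List[str]


def reason(observations: List[str], instruction: str) -> SceneContext:
    # Stand-in for the reasoner tower: deduplicate what was observed
    # and attach the instruction as context.
    objects = sorted(set(observations))
    summary = f"{instruction} | sees: {', '.join(objects)}"
    return SceneContext(objects=objects, summary=summary)


def generate(context: SceneContext, horizon: int) -> Rollout:
    # Stand-in for the generator tower: emit one placeholder frame and
    # one placeholder action per future step, conditioned on the context.
    frames = [f"frame[{t}] given ({context.summary})" for t in range(horizon)]
    actions = [f"action[{t}]" for t in range(horizon)]
    return Rollout(future_frames=frames, actions=actions)


def predict_and_act(observations: List[str], instruction: str, horizon: int = 3) -> Rollout:
    # A single call chains both stages, with no external pipeline between them.
    return generate(reason(observations, instruction), horizon)


rollout = predict_and_act(["mug", "table", "mug"], "pick up the mug")
print(rollout.actions)
```

The point of the sketch is the shape of the call: one entry point hands the reasoning output directly to generation, rather than passing it between separately deployed models.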
Built for robots and autonomous systems
Nvidia is releasing Cosmos 3 in two versions.
Cosmos 3 Nano contains 16 billion parameters and targets workstation-class hardware, including the NVIDIA RTX PRO 6000 GPU. Nvidia says the model is suited for robotics inference and other real-time physical AI applications.
Cosmos 3 Super, a larger 64-billion-parameter model, focuses on maximum performance and is intended for deployment in datacenter environments powered by Hopper and Blackwell GPUs. Nvidia positions it for synthetic data generation and advanced physical reasoning tasks.
The model supports a wide range of inputs and outputs, including text, images, videos, and action sequences. That flexibility allows developers to use Cosmos 3 for tasks ranging from robot learning and autonomous driving to synthetic video generation and warehouse monitoring.

Overview of Cosmos 3
Open datasets join the release
The release extends beyond model weights.
Nvidia is publishing six synthetic datasets covering robotics, physical interactions, spatial reasoning, digital humans, autonomous driving scenarios, and warehouse operations.

Examples from the Spatial Reasoning dataset
The datasets can be used to post-train Cosmos 3 or serve as training resources for other physical AI systems.
Nvidia is betting that open access to both models and data will accelerate development across industries that increasingly depend on machine perception and decision-making in real-world environments.

Measuring whether AI actually understands physics
One challenge facing AI researchers is determining whether a model genuinely understands physical behavior or is simply producing convincing outputs.
To address that problem, Nvidia created a benchmark called the Cosmos Human Evaluation (HUE).
The framework evaluates generated videos using binary fact-checking questions that examine semantic alignment, physical laws, geometric reasoning, and visual quality. Rather than relying on broad subjective ratings, HUE breaks videos into individual facts that can be verified by human reviewers.
According to Nvidia, the benchmark covers seven physical AI domains, including robotics, autonomous vehicles, and physics-based scenarios.
The company has publicly released the evaluation framework on Hugging Face.
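To make the idea of binary fact-checking concrete, here is a minimal Python sketch of how yes/no answers could be rolled up into per-category pass rates. The aggregation is an assumption for illustration; the article does not describe how HUE actually combines its answers, and the function name `pass_rates` and the sample checks are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rates(checks: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of passed yes/no checks for each category."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for category, ok in checks:
        total[category] += 1
        if ok:
            passed[category] += 1
    return {category: passed[category] / total[category] for category in total}


# Hypothetical reviewer answers for one generated video.
sample_checks = [
    ("semantic alignment", True),
    ("physical laws", True),
    ("physical laws", False),
    ("geometric reasoning", True),
    ("visual quality", True),
]

rates = pass_rates(sample_checks)
print(rates)
```

Scoring each fact separately shows where a video fails (for example, physical laws versus visual quality) instead of folding everything into one overall rating.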
Benchmark results
Nvidia says Cosmos 3 currently ranks among the strongest open-source models across several physical AI benchmarks.
The company highlighted leading results on PAI-Bench, Physics-IQ, RoboLab, and R-Bench, which measure video-generation quality, robotics performance, and physical-reasoning capabilities.
On Artificial Analysis leaderboards, Nvidia says Cosmos 3 currently ranks as the top-performing open-source model for both text-to-image and image-to-video generation.
Independent validation from researchers and developers will likely determine how those claims hold up over time, particularly as competing physical AI models emerge from major labs and startups.
Training and deployment
Nvidia is releasing post-training recipes that allow developers to adapt Cosmos 3 to specific industries, datasets, and robotic systems.
The workflows support supervised fine-tuning for video generation tasks and action-focused training for robotics applications, including forward and inverse dynamics and policy generation.
For deployment, Nvidia is making Cosmos 3 available through NVIDIA NIM microservices. The microservices package optimized inference runtimes and support quantization formats such as FP8 and NVFP4, which Nvidia says can deliver up to a twofold speedup over BF16 models.
The Cosmos 3 Reasoner NIM is available now, with the Generator NIM expected later.
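A rough back-of-the-envelope calculation shows why lower-precision formats matter for models of this size. The Python sketch below estimates the memory needed for weights alone, using the parameter counts quoted above. It assumes 2 bytes per parameter for BF16, 1 for FP8, and 0.5 for a 4-bit format such as NVFP4, and it ignores scaling-factor overhead, activations, and caches, so real footprints will be higher. The function name `weight_memory_gb` is illustrative.

```python
# Approximate bytes per parameter for each format (weights only).
# The 4-bit figure ignores the extra scaling factors such formats store.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nvfp4": 0.5}


def weight_memory_gb(num_params: int, fmt: str) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


MODELS = {"Cosmos 3 Nano": 16_000_000_000, "Cosmos 3 Super": 64_000_000_000}

for name, params in MODELS.items():
    for fmt in BYTES_PER_PARAM:
        print(f"{name} @ {fmt}: ~{weight_memory_gb(params, fmt):.0f} GB")
```

Under these assumptions, the 16-billion-parameter model needs roughly 32 GB for weights in BF16 and about 8 GB at 4 bits, while the 64-billion-parameter model drops from roughly 128 GB to about 32 GB.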
The bigger picture
The race to build physical AI is becoming one of the most important contests in technology.
Large language models transformed how machines work with text. Physical AI aims to do something far more ambitious: teach machines how the real world behaves.
That challenge sits at the center of robotics, autonomous transportation, industrial automation, and smart environments.
With Cosmos 3, Nvidia is making a play to become more than the company supplying the chips behind AI. It wants to provide the models, datasets, tools, and infrastructure that teach machines how to perceive, predict, and act in physical spaces.
If that vision takes hold, the next AI breakthrough may not happen on a screen. It may happen in a warehouse, on a factory floor, or behind the wheel of an autonomous vehicle.