Nvidia bets big on physical AI at GTC Taipei with a new world model, driving brain, and open humanoid robot

By AI Maestro June 1, 2026 3 min read

Nvidia used GTC Taipei to launch a series of models for robots, autonomous vehicles, and video systems. The centerpieces are the new world model Cosmos 3, a significantly scaled-up driving model called Alpamayo 2 Super, and an open reference platform for humanoid robots.

Cosmos 3 is the next version of Nvidia’s open “omnimodel,” which processes text, images, video, ambient audio, and action data in a single system. Developers building robots, autonomous vehicles, and video surveillance systems can use it to generate synthetic training data, interpret scenes, and predict future world states without having to painstakingly recreate those situations in the real world.

Nvidia names three use cases. As a vision-language model, Cosmos 3 analyzes video, for example to detect traffic anomalies in smart cities, as partner Linker Vision is already doing.

As a world model, it generates photorealistic video sequences of rare situations like near-misses or unusual object arrangements in a warehouse.

And as the basis for so-called world-action models, it produces numerical motion data like joint angles or gripper positions that robots use to learn tasks such as picking and placing, as industrial partner Agile Robots demonstrates.

The architecture uses a mixture-of-transformers approach: one reasoning transformer analyzes a scene, then a second generation transformer produces videos, descriptions, or motion trajectories from that analysis. Training data included billions of examples spanning text, images, video, audio, and action data. Nvidia offers three variants: Cosmos 3 Super delivers the best current quality, Nano is built for fast inference, and a forthcoming Edge model targets real-time operation on embedded systems. The models are available under the OpenMDW-1.1 license on Hugging Face and GitHub.

The release comes alongside the “Cosmos Coalition,” a partner group that includes Black Forest Labs, Runway, LTX, Generalist, Agile Robots, and Skild AI. In practice, it’s an alliance that uses Nvidia’s DGX Cloud training infrastructure and contributes models and data in return.

Alpamayo 2 Super is meant to be a teacher model for robotaxis

The Alpamayo family is Nvidia’s open model series for Level 4 autonomous driving, meaning robotaxis that operate without a human driver within a defined area. The models take in camera images, derive a driving decision, and output a concrete trajectory. Previous versions included Alpamayo 1 Nano and 1.5 Nano, each with ten billion parameters.

Alpamayo 2 Super replaces that generation at the top end with 32 billion parameters. The jump is supposed to improve spatial understanding and the handling of rare situations. The model now also outputs so-called meta-actions like “lane change,” “stop,” or “yield,” which it passes to a downstream planner alongside the trajectory. Perception now covers the vehicle’s full surroundings rather than just the view from the front cameras. Every decision comes with a “chain of causation,” a textual reasoning chain that Nvidia says is designed for safety documentation and regulatory review. This brings a familiar question from the AI alignment debate into the driving safety discussion: how reliably do these reasoning traces actually reflect what’s happening inside the network?
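To make the output described above more concrete, here is a minimal Python sketch of a decision that bundles a meta-action, a trajectory, and a textual reasoning chain, plus a simple plausibility check a downstream planner might run. Everything in it is an assumption for illustration: the names `MetaAction`, `DrivingDecision`, and `is_consistent`, the waypoint format, and the thresholds are hypothetical and are not taken from Nvidia's code or documentation.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class MetaAction(Enum):
    # The three examples named in the article; the real set is not specified here.
    LANE_CHANGE = "lane change"
    STOP = "stop"
    YIELD = "yield"


@dataclass(frozen=True)
class DrivingDecision:
    """Hypothetical container for one model output."""
    meta_action: MetaAction
    # Assumed format: (x, y) waypoints in metres at a fixed time step.
    trajectory: List[Tuple[float, float]]
    # Free-text reasoning chain attached to the decision.
    reasoning: str


def is_consistent(decision: DrivingDecision, max_step_m: float = 2.0) -> bool:
    """Illustrative planner-side check that the trajectory matches the meta-action."""
    points = decision.trajectory
    if len(points) < 2:
        return False
    # Distance covered between each pair of consecutive waypoints.
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    # Reject trajectories that jump further than the assumed per-step limit.
    if max(steps) > max_step_m:
        return False
    # A "stop" should end with the vehicle no longer moving.
    if decision.meta_action is MetaAction.STOP:
        return steps[-1] < 1e-6
    return True


stopping = DrivingDecision(
    meta_action=MetaAction.STOP,
    trajectory=[(0.0, 0.0), (1.5, 0.0), (2.5, 0.0), (3.0, 0.0), (3.0, 0.0)],
    reasoning="Pedestrian is entering the crosswalk ahead.",
)
still_moving = DrivingDecision(
    meta_action=MetaAction.STOP,
    trajectory=[(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)],
    reasoning="Pedestrian is entering the crosswalk ahead.",
)
```

A check like this only compares the label with the geometry. It says nothing about whether the reasoning text reflects what the network actually computed, which is the open question raised above.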

Nvidia says the large model is intended as a teacher model. Manufacturers are supposed to use it to distill smaller models that then run on the vehicle-grade Drive AGX Thor chip. Nvidia is also releasing AlpaGym, an open-source framework for closed-loop reinforcement learning in simulation, and OmniDreams, a generative model for rare traffic scenarios. Nvidia doesn’t provide any reliable external comparison numbers, for example against the stacks from Waymo or Tesla. Code and weights are expected to appear on GitHub and Hugging Face this summer.
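The teacher-student idea can be illustrated with the textbook form of knowledge distillation, in which a small model is trained to match the temperature-softened output distribution of a large one. The sketch below is a generic, pure-Python version of that loss applied to scores over the three meta-actions named earlier. It is not Nvidia's training recipe, which the article does not describe, and the logit values, the temperature, and the function names are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL divergence from the student's to the teacher's softened distribution.

    Scaling by temperature squared follows the common convention for keeping
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical scores over ["lane change", "stop", "yield"] for one scene.
teacher = [0.2, 3.1, 1.0]
good_student = [0.1, 2.9, 1.1]   # roughly agrees with the teacher
poor_student = [2.5, 0.3, 0.2]   # prefers a different action

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
```

A student that tracks the teacher's preferences gets a lower loss than one that disagrees, and training pushes the small model toward the former. A production pipeline would apply the same principle to trajectories and full model outputs rather than to three hand-written numbers.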

An open humanoid robot built on a Unitree chassis

With the Isaac GR00T Reference Humanoid Robot, Nvidia is also releasing a reference platform for academic research in humanoid robotics. The roughly six-foot-tall robot is based on the Unitree H2 Plus chassis, paired with tactile five-finger hands from Sharpa, and powered by the Jetson AGX Thor T5000 with 2,070 FP4 teraflops. The system has 75 degrees of freedom in total. On the software side, it runs the Isaac GR00T stack, which covers teleoperation, simulation in Isaac Sim, foundation models, and ROS middleware.

Nvidia isn’t selling the robot itself. Instead, it points to Unitree, which plans to offer the hardware by late 2026. Research partners include Ai2, ETH Zurich, the Stanford Robotics Center, and the UC San Diego ARC Lab. In practice, Nvidia is trying to standardize a hardware-software bundle that deepens the robotics research community’s reliance on Jetson chips and Isaac tooling.
