Unified World Models: Coupling Video and Action Diffusion
for Pretraining on Large Robotic Datasets

- combines video and action diffusion in one transformer, and can be trained on both robot trajectories and action-free videos.

- represents a policy, a forward dynamics model, an inverse dynamics model, and a video prediction model in a unified framework.
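One way to picture how a single network could play all four roles is sketched below. It assumes the action and video branches each receive their own diffusion timestep, which the post itself does not state. Under that assumption, a modality is sampled by stepping its timestep down, conditioned on by fixing its timestep at 0 (clean input), or ignored by fixing its timestep at the maximum (pure noise). The names `Role`, `MODES`, `timesteps`, and the default `num_steps=1000` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Role(Enum):
    DENOISE = "denoise"          # modality is sampled by reverse diffusion
    CONDITION = "condition"      # modality is supplied clean (timestep 0)
    MARGINALIZE = "marginalize"  # modality is replaced by pure noise (max timestep)


# Hypothetical mapping from inference mode to the role of each modality.
MODES = {
    "policy":           {"action": Role.DENOISE,     "video": Role.MARGINALIZE},
    "video_prediction": {"action": Role.MARGINALIZE, "video": Role.DENOISE},
    "forward_dynamics": {"action": Role.CONDITION,   "video": Role.DENOISE},
    "inverse_dynamics": {"action": Role.DENOISE,     "video": Role.CONDITION},
}


def timesteps(mode, step, num_steps=1000):
    """Return (action_t, video_t) to feed the model at reverse-diffusion step `step`.

    `num_steps` is an assumed diffusion horizon, not a value from the source.
    """
    value = {
        Role.DENOISE: step,           # follows the sampling schedule
        Role.CONDITION: 0,            # clean input, held fixed
        Role.MARGINALIZE: num_steps,  # fully noised input, held fixed
    }
    roles = MODES[mode]
    return value[roles["action"]], value[roles["video"]]
```

For example, `timesteps("policy", 500)` gives `(500, 1000)`: actions are mid-denoising while the video input stays fully noised. Training on action-free videos would fit the same picture by treating the action input as marginalized.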

#RSS2025 #diffusion
X overview, Project Website

Chuning Zhu (@chuning_zhu)

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)