Unified World Models: Coupling Video and Action Diffusion
for Pretraining on Large Robotic Datasets

- combines video and action diffusion in a single transformer, which can be trained on both robot trajectories and action-free videos.

- the same model can act as a policy, a forward dynamics model, an inverse dynamics model, or a video prediction model.
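
A minimal sketch of how one network could cover all four roles, assuming (this is not spelled out in these notes) that actions and video each get their own diffusion timestep. Fixing a modality's timestep at the maximum treats it as pure noise and so ignores it, fixing it at 0 treats it as a clean input to condition on, and leaving it free means that modality is the one being denoised. `T_MAX`, `ModeConfig`, `MODES`, and `timesteps_for` are hypothetical names for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

T_MAX = 1000  # hypothetical number of diffusion steps


@dataclass(frozen=True)
class ModeConfig:
    # Fixed timestep for each modality:
    #   T_MAX -> pure noise (modality is marginalized out)
    #   0     -> clean input (modality is conditioned on)
    #   None  -> modality is denoised, so it follows the sampling step
    action_t: Optional[int]
    video_t: Optional[int]


# Assumed mapping from inference mode to per-modality timesteps.
MODES = {
    "policy": ModeConfig(action_t=None, video_t=T_MAX),
    "video_prediction": ModeConfig(action_t=T_MAX, video_t=None),
    "forward_dynamics": ModeConfig(action_t=0, video_t=None),
    "inverse_dynamics": ModeConfig(action_t=None, video_t=0),
}


def timesteps_for(mode: str, step: int) -> Tuple[int, int]:
    """Return (action_t, video_t) to feed the model at sampling step `step`."""
    cfg = MODES[mode]
    action_t = step if cfg.action_t is None else cfg.action_t
    video_t = step if cfg.video_t is None else cfg.video_t
    return action_t, video_t
```

Under the same assumption, an action-free video would be handled during training like the `video_prediction` row: the action slot is held at `T_MAX`, so only the video branch receives a learning signal.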

#RSS2025 #diffusion
X overview, Project Website
 
 