Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
- Combines video diffusion and action diffusion in a single transformer, and can be trained on both robot trajectories and action-free videos.
- Represents a policy, a forward dynamics model, an inverse dynamics model, and a video prediction model in one unified framework.
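
A rough sketch of how one model might cover the four roles listed above. It assumes each modality (action, video) gets its own diffusion timestep: a timestep of 0 means the modality is given clean as conditioning, the maximum timestep means it is pure noise and so effectively ignored, and anything in between means it is being denoised. The names `T`, `MODES`, and `timesteps` are hypothetical and only illustrate the idea; this is not the authors' code.

```python
# Illustrative only: T and all names below are assumptions, not the paper's API.
T = 1000  # assumed number of diffusion steps

# For each inference mode, the role that each modality plays.
#   "sample"      -> modality is generated, so it follows the denoising step t
#   "given"       -> modality is observed clean, so its timestep is 0
#   "marginalize" -> modality is unknown and ignored, so its timestep is T
MODES = {
    "policy":           {"action": "sample",      "video": "marginalize"},
    "video_prediction": {"action": "marginalize", "video": "sample"},
    "forward_dynamics": {"action": "given",       "video": "sample"},
    "inverse_dynamics": {"action": "sample",      "video": "given"},
}


def timesteps(mode: str, t: int) -> tuple[int, int]:
    """Return (t_action, t_video) to feed the shared model at denoising step t."""
    def pick(role: str) -> int:
        if role == "sample":
            return t
        if role == "given":
            return 0
        if role == "marginalize":
            return T
        raise ValueError(f"unknown role: {role}")

    spec = MODES[mode]
    return pick(spec["action"]), pick(spec["video"])


if __name__ == "__main__":
    for mode in MODES:
        print(mode, timesteps(mode, 300))
```

Under the same assumption, an action-free video could be used for training by treating its action as "marginalize", that is, replacing the missing action with noise and fixing its timestep at `T`.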
#RSS2025 #diffusion
X overview, Project Website