Progressive Autoregressive Video Diffusion Models

Chunked Frames prevents divergence. Overlapped Conditioning prevents chunk-to-chunk discontinuity.

PA-M full
(with Chunked Frames and Overlapped Conditioning)

PA-M with Chunked Frames

PA-M without either technique

PA-O full
(with Chunked Frames and Overlapped Conditioning)

PA-O with Chunked Frames

Variable Length training and inference enable the model to generate videos of arbitrary length, as shown at the 1st and 59th seconds.

PA-M full
(with Variable Length training and inference)

PA-M without Variable Length training
(but with Variable Length inference)