Stream DiffVSR has been unveiled as a diffusion-based video super-resolution model designed for low-latency online use. Unlike most methods to the task depending on multi-step denoising, this relies on past frames to fit real-time streaming scenarios.
The system combines a four-step distilled denoiser for faster inference and an auto-regressive temporal guidance module. The latter injects motion-aligned cues during denoising.
The system also includes a temporal-aware decoder with a temporal processor module to keep details and motion consistent across frames. Stream DiffVSR can process 720p frames in about 0.328 seconds on an RTX 4090 GPU.
In other words, it outperforms past diffusion-based methods on perceptual quality. For reference, the initial delay for other methods takes 4,600 seconds.
Researchers also claim that Stream DiffVSR achieves the lowest latency yet for diffusion-based video super-resolution. As such, it’s a viable alternative for real-time deployment.
Comments
No comments yet. Be the first to share your thoughts!