ByteDance’s FSVideo generates 720p clips in seconds on dual H100 GPUs

Duane Villanueva • Mar 10, 2026 • 1 min read

ByteDance has unveiled FSVideo (Fast Speed Video), a text-to-video model focused on combining high visual quality with rapid inference. Demos show realistic 1280×720 clips across various scenes, including cinematic product shots and stylized vertical content, with sharp details and smooth motion.

According to the paper, FSVideo can generate a 5‑second video on two Nvidia H100 GPUs in about 18 seconds. Consumer GPUs will be slower, but that figure is still competitive given the quality of the samples, which appears on par with today’s top systems.
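
To put that figure in perspective, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above and assumes the 18 seconds is end-to-end wall-clock time with both GPUs busy throughout, which the available material does not confirm. The function and constant names are illustrative.

```python
# Figures quoted in the article: a 5-second clip takes about 18 seconds
# on two H100 GPUs. Treating this as end-to-end wall-clock time with
# both GPUs fully occupied is an assumption.
CLIP_SECONDS = 5.0
WALL_CLOCK_SECONDS = 18.0
NUM_GPUS = 2


def real_time_factor(wall_clock_s: float, clip_s: float) -> float:
    """Wall-clock seconds spent per second of generated video."""
    return wall_clock_s / clip_s


def gpu_seconds(wall_clock_s: float, num_gpus: int) -> float:
    """Total GPU time consumed, summed over all GPUs."""
    return wall_clock_s * num_gpus


rtf = real_time_factor(WALL_CLOCK_SECONDS, CLIP_SECONDS)
total_gpu_s = gpu_seconds(WALL_CLOCK_SECONDS, NUM_GPUS)

print(f"Wall-clock seconds per second of video: {rtf:.1f}")
print(f"GPU-seconds per clip: {total_gpu_s:.0f}")
print(f"GPU-seconds per second of video: {total_gpu_s / CLIP_SECONDS:.1f}")
```

Under those assumptions, generation runs at roughly 3.6 times slower than real time, or about 36 GPU-seconds per clip.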

The model supports both horizontal and vertical formats, making it a natural fit for TikTok-style content and mobile-first platforms. That is unsurprising given that ByteDance owns TikTok.

However, at this stage only a technical paper and demo clips are available; there’s no public code or weights, and no clear timeline for open-sourcing. For now, FSVideo is more of a signal of what ByteDance is building for internal creative tools than a model most creators can run themselves.

Duane Villanueva

Communication graduate, closet cynic, and kid at heart. Duane is a rare person to find, quite literally. He often keeps to himself but has proven his mettle in tech media with his quick wit. Well, the portfolio of scriptwriting, web content, and public relations helps too, we suppose. As a homebody, he often spends his time on the streaming platform Twitch or ‘farming’ gaming clips with friends. He is also an avid fan of round glasses and anything related to blueberries.
