AudioX is a new “unified” generative audio model that can produce sound effects and music from text, images, and video, as well as perform advanced tasks like audio inpainting and extension. From a simple prompt, it can synthesize scenes such as “thunder and rain during a sad piano solo” or “a machine gun fires twice followed by silence, then waves,” accurately following timing and event order.
The model also supports text‑to‑music for background tracks, though current musical quality is described as serviceable rather than state‑of‑the‑art. More importantly, AudioX can take video as input and generate synchronized soundtracks that respond to camera cuts and scene changes, demonstrating an understanding of temporal structure in visual content.
On the restoration side, AudioX can fill in missing segments of speech or music (inpainting), or extend an existing clip with a stylistically consistent continuation. Benchmarks show it outperforming competing systems both in breadth of capabilities and in per-modality performance. The weights, totaling under 6 GB, are available in a GitHub repo with instructions for local deployment, making the model accessible to developers running consumer GPUs.