VBVR framework announced, turns video models into puzzle solvers

Researchers have introduced VBVR, also known as the Very Big Video Reasoning suite. This is a new framework that lets text-to-video models actually reason about what they see instead of just generating pretty footage.

Built on top of the open-source Open-Sora “one” model, VBVR can follow visual instructions, which lets it solve puzzles and track objects directly inside a video scene.

In demos, VBVR can identify and circle a specific character, solve simple logic and shape puzzles, and track an agent as it moves around a grid collecting dots. The team also built a dedicated video reasoning benchmark, where VBVR hit around 68.5 percent accuracy.

For reference, many existing models stayed below 50 percent on the same benchmark. To push the ecosystem forward, the team released both the VBVR framework and a one-million-example video reasoning dataset, though it weighs in at roughly 310 GB.

Communication graduate, closet cynic, and kid at heart. Duane is a rare person to find, quite literally. He often keeps to himself but has proven his mettle in tech media with his quick wit. Well, the portfolio of scriptwriting, web content, and public relations helps too, we suppose. As a homebody, he often spends his time on the streaming platform Twitch or ‘farming’ gaming clips with friends. He is also an avid fan of round glasses and anything related to blueberries.
