AnchorWeave is a new open-source system that generates interactive 3D world videos from a single starting frame. Users explore the generated world with keyboard controls, as if they were inside a game.
Each run currently outputs around 81 frames, which is only a few seconds. However, creators can chain generations to produce longer, explorable clips.
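The chaining idea can be sketched in a few lines: seed each new run with the previous run's final frame. This is a minimal illustration only; `generate_clip` and `chain_generations` are hypothetical stand-ins, not AnchorWeave's actual API.

```python
# Hypothetical sketch of chaining generations into a longer clip.
# generate_clip / chain_generations are illustrative names, NOT
# AnchorWeave's real interface.

FRAMES_PER_RUN = 81  # roughly what one generation currently produces


def generate_clip(seed_frame, num_frames=FRAMES_PER_RUN):
    """Stand-in for a single generation run: returns num_frames frames
    derived from the seed (here, just labeled placeholders)."""
    return [f"{seed_frame}->{i}" for i in range(1, num_frames + 1)]


def chain_generations(start_frame, runs):
    """Extend a video by seeding each run with the previous run's
    final frame, as creators do to build longer explorable clips."""
    frames = [start_frame]
    for _ in range(runs):
        frames.extend(generate_clip(frames[-1]))
    return frames


video = chain_generations("frame0", runs=3)
print(len(video))  # 1 seed + 3 runs * 81 frames = 244
```

Each run is only a few seconds long on its own, but the loop above shows why the clips compose: the last frame of one run is a valid starting frame for the next.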
Compared with earlier interactive world generators, AnchorWeave focuses on scene memory and coherence. This preserves object layout even when the camera looks away and returns.
It works in both first-person and third-person views and can follow a character moving through a 3D environment, all while maintaining consistent geometry and detail.
Under the hood, the model relies on ‘local geometric memories’ that break scenes into smaller parts. There’s also a multi‑anchor weaving controller that stitches those parts into a smooth, globally coherent video.
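The two ideas above can be illustrated with a toy sketch: each local memory remembers geometry for one region anchored at a position, and a weaving step blends nearby memories into one coherent value for the current camera. Everything here (class names, inverse-distance weighting, the scalar "height" stand-in for geometry) is an assumption for illustration, not AnchorWeave's real implementation.

```python
# Toy sketch of "local geometric memories" plus a multi-anchor weave.
# All names and the blending rule are hypothetical illustrations.
import math


class LocalGeometricMemory:
    """Remembers geometry for one region of the scene, keyed by an
    anchor position, so layout survives when the camera looks away."""

    def __init__(self, anchor_xy, height):
        self.anchor_xy = anchor_xy
        self.height = height  # toy stand-in for remembered local geometry


def weave(memories, camera_xy):
    """Blend nearby local memories into one globally coherent value,
    weighted here by inverse distance from the camera to each anchor."""
    total_w = 0.0
    blended = 0.0
    for m in memories:
        d = math.dist(camera_xy, m.anchor_xy)
        w = 1.0 / (d + 1e-6)  # closer anchors dominate
        blended += w * m.height
        total_w += w
    return blended / total_w


memories = [
    LocalGeometricMemory((0, 0), 1.0),
    LocalGeometricMemory((10, 0), 3.0),
]

# Halfway between anchors the result is a blend; returning to an anchor
# reproduces its stored geometry, which is the "scene memory" behavior.
print(round(weave(memories, (5, 0)), 2))  # 2.0
print(round(weave(memories, (0, 0)), 2))  # 1.0
```

The point of the sketch is the revisit behavior: because each region's geometry is stored against a fixed anchor rather than regenerated, the camera can leave and return without the layout drifting.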
Early qualitative comparisons show results that closely track ground truth footage and outperform prior methods in realism and consistency. The team has released code via GitHub, enabling local deployment for users with compatible GPUs.
At the time of writing, the first version is built on top of the older CogVideoX generator. However, the researchers plan a future Qwen-based implementation that should further improve quality.
