Tencent has introduced InteractAvatar, an AI avatar system that goes beyond lip-syncing to let digital humans pick up, move, and interact with objects in a scene from a simple text prompt. In demos, characters can put on headphones, check a smartphone, lift a bag, or gently touch a plush toy while speaking, with natural hand and body motion.
The system understands complex multi-step instructions and timing: creators can chain actions with explicit timestamps, such as touching an apple from 0–4 seconds, moving it between 12–16 seconds, then picking it up at 16–20 seconds. It also handles a wide range of gestures, including OK signs, thumbs up, arm crossing, heart shapes, and clapping, and it tracks hand poses accurately.
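InteractAvatar's actual prompt format isn't documented in this piece, but the timestamped chaining described above can be pictured as a simple action schedule. The sketch below is purely illustrative: the `TimedAction` structure and the rendered prompt string are assumptions, not the system's real API.

```python
# Hypothetical sketch of a timestamped action schedule, mirroring the
# article's example (touch 0-4s, move 12-16s, pick up 16-20s).
# The data structure and prompt format are illustrative assumptions,
# not InteractAvatar's documented interface.
from dataclasses import dataclass


@dataclass
class TimedAction:
    start: float  # seconds into the clip
    end: float
    description: str


def build_prompt(actions):
    """Sort the schedule, reject overlapping actions, and render one prompt."""
    ordered = sorted(actions, key=lambda a: a.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            raise ValueError(
                f"overlapping actions: {prev.description!r} and {cur.description!r}"
            )
    return "; ".join(
        f"{a.description} from {a.start:g}s to {a.end:g}s" for a in ordered
    )


schedule = [
    TimedAction(0, 4, "touch the apple"),
    TimedAction(12, 16, "move the apple"),
    TimedAction(16, 20, "pick up the apple"),
]
print(build_prompt(schedule))
# → touch the apple from 0s to 4s; move the apple from 12s to 16s; pick up the apple from 16s to 20s
```

The validation step reflects the implicit constraint in the demo: chained actions occupy distinct, ordered time windows, with back-to-back actions (16s ending, 16s starting) allowed.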
Compared to other animation tools like OmniAvatar, InteractAvatar is currently the only one shown manipulating scene objects instead of just animating a talking head or idle body. Under the hood it builds on the 1.2.2 base video model, with a public project page and GitHub repository that includes local setup instructions.
For creators and brands, this opens the door to more interactive spokesperson videos and product demos, without having to keyframe every hand movement.