Genie 3 Lets Users Prompt AI-Generated Playable Environments

Google DeepMind has introduced Genie 3, a new AI world model capable of generating 3D environments that users and AI agents can interact with in real time. Compared with its predecessor, it supports longer continuous interaction and has better memory for objects within its simulated worlds.

World models are AI systems that simulate environments for purposes including education, entertainment, and training robots or AI agents. They generate interactive spaces from user prompts; unlike worlds built from handcrafted 3D assets, these environments are created entirely by AI. Google has invested significantly in this area and demonstrated Genie 2, which could generate interactive worlds from images, in December 2024. The work is carried out by a dedicated world models team co-led by a former lead of OpenAI’s Sora video generation tool.

Previous models had clear limitations. Genie 2 worlds, for instance, were playable for at most one minute, and earlier interactive video technologies produced environments that distorted as the user looked around or looked back at something already seen.

Genie 3 addresses some of these drawbacks. According to a Google blog post, users can generate worlds from prompts that support “a few” minutes of continuous interaction, up from the 10–20 seconds of interaction the post cites for Genie 2. Genie 3 can also keep spaces in visual memory for approximately one minute, so details like paint on a wall or writing on a chalkboard stay in place when the user looks away and back. The generated worlds render at 720p resolution and run at 24 frames per second.

DeepMind is incorporating “promptable world events” into Genie 3. Users will be able to alter weather conditions or introduce new characters within a generated world through prompts.

Genie 3 is currently offered as “a limited research preview,” accessible to “a small cohort of academics and creators.” According to Google, the controlled release is meant to help its developers assess risks and work out how to mitigate them. The model also has known limitations: users have only a limited set of ways to interact with a world, and legible text is “often only generated when provided in the input world description.” Google has stated it is “exploring” making Genie 3 available to “additional testers” in the future.