Microsoft Releases an AI-Generated Playable Quake II Game Demo
Microsoft released an interactive, real-time AI-generated gameplay experience of Quake II in Copilot Labs last week. To build the artificial intelligence (AI) gameplay, the Redmond-based tech giant used its recently released Muse AI models and a new approach dubbed World and Human Action MaskGIT Model (WHAMM). The demo is currently available to everyone as a research preview; it generates the game's world and supports the usual mechanics. Microsoft also listed several limitations of the AI-generated experience.
Microsoft's Quake II Gameplay Was Built on Muse AI
In a blog post, Microsoft researchers detailed the AI-generated gameplay and how they built it. AI-powered 2D and 3D game generation has been an active area of research, as it tests the technology's ability to generate real-time world environments and adapt them to the different actions a human player takes. It is also seen as a good way to test whether AI models can be trained to take on real-world tasks, such as controlling robots as physical AI.
Notably, Quake II is a 1997 first-person shooter published by Activision, which Microsoft now owns. It is a level-based 3D game with a diverse range of mechanics, including jumping, crouching, shooting, environment destruction, and camera movement. The demo is available via Copilot Labs, and users can currently play a single level for about two minutes using a controller or the keyboard.
Coming to the development process, the researchers said they combined the Muse AI models with the World and Human Action Model (WHAM) to create the new WHAMM approach.

WHAMM overview (Photo Credit: Microsoft)
WHAMM is the successor to WHAM-1.6B and can generate more than 10 frames per second, enabling real-time video generation. The gameplay's output resolution is 640×360 pixels. Microsoft says one of the key improvements in WHAMM's speed came from using a MaskGIT (Masked Generative Image Transformer) setup in place of WHAM-1.6B's, lifting the frame rate from about one frame per second to more than 10.
The MaskGIT setup allowed the researchers to generate all of the tokens for an image in a small number of forward passes, rather than one token per pass. With this, the AI model can predict every masked token of a frame in parallel in real time, allowing for a smoother experience.
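To make the parallel-decoding idea concrete, here is a minimal, hypothetical sketch of MaskGIT-style iterative decoding in PyTorch. The model, token counts, and the linear unmasking schedule are illustrative assumptions (the published MaskGIT paper uses a cosine schedule, and Microsoft has not released WHAMM's implementation details); the point is that a whole frame is filled in over a handful of forward passes instead of one pass per token.

```python
import torch

MASK_ID = 0          # assumed id of the special [MASK] token
NUM_TOKENS = 144     # assumed number of tokens per frame; not WHAMM's real figure
STEPS = 8            # a few forward passes instead of one per token

def maskgit_decode(model, num_tokens=NUM_TOKENS, steps=STEPS):
    """Fill in every image token over `steps` parallel forward passes."""
    tokens = torch.full((1, num_tokens), MASK_ID, dtype=torch.long)
    is_masked = torch.ones(1, num_tokens, dtype=torch.bool)
    for step in range(1, steps + 1):
        logits = model(tokens)  # (1, num_tokens, vocab): predicts all positions at once
        confidence, predictions = logits.softmax(-1).max(-1)
        # Positions committed in earlier steps keep their tokens and
        # count as maximally confident.
        predictions = torch.where(is_masked, predictions, tokens)
        confidence = confidence.masked_fill(~is_masked, float("inf"))
        # Linear schedule (a simplification): commit the most confident
        # step/steps fraction of tokens and re-mask everything else.
        keep = confidence.topk(num_tokens * step // steps, dim=-1).indices
        commit = torch.zeros_like(is_masked).scatter(1, keep, True)
        tokens = torch.where(commit, predictions, torch.full_like(tokens, MASK_ID))
        is_masked = ~commit
    return tokens  # a full frame's tokens after only `steps` model calls
```

An autoregressive decoder would instead need one forward pass per token for the same frame, which is in line with the roughly 1fps versus 10+fps gap the researchers describe.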
While the core gameplay is quite similar to the original game, Microsoft also listed several limitations of the current demo. Since the game environment is generated by AI, it is merely an approximation of the original rather than an identical replication. Enemy interactions sometimes produce fuzzy image generations, and combat can behave incorrectly.
WHAMM currently has a context window of 0.9 seconds (nine frames at 10fps). As a result, the model forgets objects that go out of view for longer than that. Microsoft says this can lead to scenarios where a user turns around and finds an entirely new area, or looks at the sky and back down only to find themselves moved to a different part of the map. Further, the game also has significant latency because it is being made available to everyone.
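As a rough illustration of why that forgetting happens, here is a minimal sketch of a sliding nine-frame context buffer. The `predict_next_frame` call is a hypothetical API invented for this example, not Microsoft's actual interface.

```python
from collections import deque

CONTEXT_FRAMES = 9  # 0.9 seconds of context at 10fps, per Microsoft's blog post

# The model only conditions on the most recent frame/action pairs, so
# anything that slides out of this buffer no longer exists for it.
context = deque(maxlen=CONTEXT_FRAMES)

def step(model, frame_tokens, action):
    context.append((frame_tokens, action))          # the oldest pair silently drops off
    return model.predict_next_frame(list(context))  # hypothetical API
```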