Windows 12 May Feature Built-in AI At Its Foundation

Microsoft’s Pavan Davuluri discussed the future of Windows in a new video interview, stating the next version will be “more ambient, pervasive, and multi-modal” as AI redefines user interaction with computers.

Pavan Davuluri, Microsoft CVP and Windows boss, recently outlined a vision for the future of the Windows operating system in a newly released video interview. Davuluri described how artificial intelligence (AI) will fundamentally alter the desktop interface, leading to an ambient, multi-modal computing experience, and in doing so offered insight into Microsoft’s strategic direction for its flagship platform.

During the interview, Davuluri addressed the transformative impact of AI on human-computer interaction. He stated, “I think we will see computing become more ambient, more pervasive, continue to span form factors, and certainly become more multi-modal in the arc of time.” Davuluri further elaborated on the evolving nature of input methods, noting, “I think experience diversity is the next space where we will continue to see voice becoming more important. Fundamentally, the concept that your computer can actually look at your screen and is context aware is going to become an important modality for us going forward.” This indicates a shift towards systems that understand environmental and user context through advanced AI capabilities.

Microsoft has previously indicated a strategic emphasis on voice as a primary input method for future Windows iterations. A “Windows 2030 Vision” video, released a week prior to Davuluri’s interview, featured Microsoft’s CVP of Enterprise & Security discussing similar concepts regarding the future of the operating system. This earlier communication aligns with Davuluri’s recent statements, reinforcing the company’s commitment to integrating voice as a core interaction modality.

The forthcoming version of Windows is expected to elevate voice to a first-class input method, complementing traditional mouse and keyboard interfaces. Users will reportedly be able to interact with Windows using natural language, with the operating system designed to comprehend user intent based on the content displayed on the screen. This integration aims to create a more intuitive and seamless user experience, allowing for ambient communication with the OS.

Davuluri also suggested that the visual appearance and interaction paradigm of Windows are subject to significant change due to the integration of agentic AI. He observed, “I think what human interfaces look like today and what they look like in five years from now is one big area of thrust for us that Windows continues to evolve. The operating system is increasingly agentic and multi-modal.” This statement highlights a substantial investment and ongoing development effort in evolving the operating system’s core design and user interaction models.

The cloud infrastructure is identified as a critical enabler for these advanced Windows experiences. Davuluri explained, “Compute will become pervasive, as in Windows experiences are going to use a combination of capabilities that are local and that are in the cloud. I think it’s our responsibility to make sure they’re seamless to our customers.” This indicates a hybrid computing model where local device capabilities are seamlessly integrated with cloud-based processing to deliver enhanced functionality.

Microsoft’s strategic direction suggests a fundamental shift in how AI assistants are integrated into operating systems. Current AI assistants, such as Copilot on Windows, Gemini on Android, or Siri on macOS, typically function as applications or overlays operating on top of existing OS platforms. Microsoft appears to be preparing to introduce an operating system where AI is intrinsically woven into its foundational architecture, rather than existing as a separate layer. This transformation is anticipated within the next five years, potentially coinciding with the release of Windows 12. Multiple high-level Microsoft executives have alluded to this significant evolution, framing it as a major shift in computing driven by AI advancements.

While voice becoming a primary input method for PCs may require an adjustment for some users, the integration of agentic AI and the OS’s ability to comprehend user intent and natural language are expected to make this interaction feel more intuitive. This evolution extends beyond Microsoft: Apple is reportedly developing a similar voice-centric feature for iOS 26, rumored to let iPhone users navigate applications solely through verbal commands, articulating their intent directly to the device.

On the Windows platform, voice input is likely to augment, rather than replace, established input methods. The system will incorporate three primary modes of interaction: typing, touch/mouse, and voice. While voice input may not be mandatory for task completion, its availability is expected to streamline user workflows. However, the reliance on extensive personal user data to optimize these AI-driven experiences raises privacy considerations. Davuluri’s acknowledgment of a necessary balance between local and cloud compute for these experiences suggests that these privacy concerns will be a significant factor in their implementation and public reception.

