



Inside Microsoft’s quest to make Windows 11’s AI irresistible


People remember many things about Windows 95, which turned 30 a couple of months ago. There were its signature new features, such as the Start Button, taskbar, and long file names. The launch event—hosted by Jay Leno—at Microsoft’s campus. The TV commercials with the Rolling Stones’ “Start Me Up.” The crowds of PC users so eager to get their hands on the upgrade that they descended on computer stores at midnight.

Here’s a fact about Windows 95 that isn’t exactly iconic: It was the first voice-enabled version of Microsoft’s operating system. A collection of technologies known as the Microsoft Speech API (SAPI) provided support for speech recognition and synthesis, letting developers create apps that could speak and be spoken to. But SAPI didn’t go on to revolutionize how people used Microsoft products. Neither did any of the numerous other voice-centric technologies the company has developed over the decades, such as its 1990s Auto PC car platform and the ill-fated Siri counterpart Cortana.

“It’s kind of amazing to think about it, really,” muses Microsoft executive VP and consumer CMO Yusuf Mehdi. “It’s probably been 30, 40 years since there was a new input mechanism for your PC. We had the keyboard, and then we introduced the mouse. There has not been another input mechanism.”

Like many of the people presently charting a future for Windows, Mehdi has seen much of that history firsthand as a Microsoft employee: 34 years of it, in his case. He’s glossing over touchscreens and styluses, both of which are part of Microsoft’s own Surface line and have their devotees, but his overarching point stands. For all the ways Windows has evolved, the basic means of interacting with it have remained enduringly resistant to change.


Once again, Microsoft is trying to overcome that. The company is announcing a Windows 11 update that lets you seek help from its Copilot AI by talking to it, with the response also coming in spoken form. Known as Copilot Voice, the feature leverages Copilot Vision, a technology—first previewed a year ago—that can scan the contents of your screen to suss out what you’re working on, whether you’re perusing a social media feed in your browser, crafting a business proposal in Word, or studying for an exam.

If voice input and output provide the interface for this new Windows experience, Copilot Vision is the glue that holds it together. “It doesn’t require Copilot to have programmatic understanding of every app in the world,” says Pavan Davuluri, Microsoft’s president of Windows + Devices, who will soon mark his 25th anniversary at the company. “It just sees what you allow it to see and infers the world. It helps you with the task that you’re probably engaged in at that point in time.”

Generative AI—including technologies Microsoft gets from its partner OpenAI—makes that possible. As corporate VP of Windows experiences (and 24-year Microsoft veteran—see a pattern here?) Navjot Virk puts it, “The point is not just that you can talk to your PC, the point is that the PC now understands you.”

But making AI make sense in Windows is only partially about the technology performing as promised. In a world full of AI features that can feel like needy, uninvited distractions, Microsoft wanted this one to be welcome. Users must explicitly opt into Copilot Voice and Copilot Vision and use the wake word “Hey Copilot” to summon them. And even then, they’re designed to be unobtrusive complements to the familiar keyboard-and-mouse experience.


“People know what they want to do,” says Virk. “We should make sure we get out of their way, but give them the tools that they will use.”

That’s a sharply different vision from the one Microsoft rolled out at a May 2024 event with the lofty tagline “A new AI era begins.” The era in question involved a new class of laptop, called Copilot+ PCs, that packed powerful Qualcomm Snapdragon chips. Yet they were short on AI-related features compelling enough to justify buying a new computer.

This time, Microsoft is concentrating on making AI available and appealing to all Windows 11 users, regardless of the machine they’ve got. The question the company asked itself, Mehdi says, is “What does a real AI PC look like in this next phase?”

In some ways, its answers are utterly straightforward. Even so, putting the real in “real AI PC” will keep it busy for years.

The ultimate AI proving ground

For all its mundane workaday ubiquity, Windows is a demanding proving ground for AI. According to Microsoft, the operating system is currently running on 1.6 billion devices, a figure that includes both Windows 11 and the theoretically moribund Windows 10. Sure, some of its users are early adopters eager to be wowed by the latest technology, even in imperfect form. But many more just want Windows to be a reliable, surprise-free tool to accomplish daily tasks. Their bar for finding AI palatable isn’t lower than that of the enthusiasts—it’s higher.

Those 1.6 billion devices also reflect an endless array of manufacturers, models, and configurations—a formidable challenge when it comes to deploying a voice interface that consistently works well. Not that long ago, PC-based voice-controlled assistants tended to interrupt themselves and otherwise fail to engage with the world in ways that were fluid and natural, notes Microsoft technical fellow Stevie Bathiche (26 years at the company). “That’s because [they] didn’t have a high-quality audio pipeline,” he says. “Now that’s solved.”


Microsoft’s solution borrows from work it originally did for Cortana and Teams and involves technologies such as beam forming, which help a PC block out irrelevant ambient noise. That helps even with basic Copilot Voice features that don’t sound like huge deals in themselves: the “Hey Copilot” wake word and ability to say “Goodbye” to conclude an AI session. But the most challenging part was what came in between: getting the AI to correctly handle everyday tasks as users might phrase them, regardless of their degree of AI savvy.

With consumer AI in its typical current form, “If you know how to craft that perfect prompt and go into super detail, you can get a lot of bang out of it,” says Virk. “But how do we make this superpower accessible to every single user of Windows?”

In several demos, the company showed me Copilot responding to briefly expressed spoken requests. In one, it explained how to disentangle multiple Spotify listeners’ data so the service’s year-end Wrapped summary wouldn’t be a meaningless mishmash. It also made style suggestions based on a Pinterest feed, defined physics concepts mentioned in class notes, and did the math to adjust the ingredients in a handwritten recipe to produce a larger batch. The closest it got to showing off was when it aided a songwriting example by humming a funk riff in G minor.

All of this emphasizes the simple, practical, and broadly applicable. One reason why: The stinging reaction to Windows Recall, a feature Microsoft announced at its May 2024 event. By capturing an ongoing stream of screenshots, Recall gave the operating system a memory. The idea was that users would find value in AI being able to scour their past activity in intimate detail. But the technology was invasive, turned on by default, and unencrypted. After critics called it a privacy nightmare, Microsoft took Recall back to the drawing board and didn’t release it for almost a year.

Naturally, the company now says it regards the whole kerfuffle as a teachable moment. “We have taken those learnings and really applied them and internalized them to everything new that we have,” says Virk. “First and foremost, the discussion that happens on the team is, ‘How will somebody understand the value of this? Will they be comfortable? Will they feel like they have control? Do they always know what is happening?’ Transparency is an important core tenet for our experiences.”

Only some of those experiences are rolling out to all Windows 11 users immediately. Additional ones will be available in test form to users who subscribe to the Windows Insider early-access program. Those include features called Connectors that hook Copilot into apps such as Outlook and OneDrive, giving it far more access to your data than Copilot Vision can divine by analyzing the screen. (Yes, Microsoft says Connectors will be available to third-party developers, too.) Connectors are crucial to Copilot starting to get more agentic—able to perform complex tasks on the user’s behalf with some measure of autonomy.

Other purveyors of AI are developing similar technologies. For example, OpenAI already has a ChatGPT agent (known as Agent) and integrations (also called Connectors). By building this sort of AI directly into Windows, which already serves as a hub for so many people’s work, Microsoft has the opportunity to make it particularly powerful. But as AI works more independently and gains access to additional data, the potential for security and privacy issues rises. Chastened by its Recall misfire, Microsoft emphasizes that its agent-related features are opt-in and engineered to receive only the access they need.

Even before these features reach general availability, Windows is using multiple AI models in an agentic manner below the surface. As the operating system responds to a user’s request, “The big model creates the plan and the reasoning behind it,” explains Bathiche. “It says, ‘You do this, do this, do this.’ The small model is tuned to essentially say, ‘Yeah, let me take that instruction and translate it to what that actually means on the screen.’” That division of labor hints at a future when Windows, and computing in general, get atomized into bits of software negotiating with each other—a scenario that’s been predicted for decades and is only now going beyond the theoretical.

Windows Insider members will also be the first to gain access to Ask Copilot, a new feature that puts Copilot directly on the taskbar, allowing them to initiate a typed AI session without firing up the existing Copilot app. Like “Hey Copilot,” that may not sound like a huge whoop. But it’s key to Microsoft’s long-term goal of letting Windows users call on AI in whatever way they prefer at any given moment.

“You can get going with Copilot straight out of the gate,” says Davuluri. “And it can be chat, it can be voice, it can be vision. It can be any combination of them.” 

The road to Jarvis

Ultimately, it’s impossible to ponder Windows’ future except in the context of its first 40 years. The graphical computing environment—not yet a full-blown operating system—shipped in 1985 and struggled at first. Only with 1990’s Windows 3.0 did it become a hit. Then new trends, such as multimedia and the web, only strengthened its position.

In recent years, Windows—for all the enormity of its user base—has maintained a low profile. Indeed, Microsoft CEO Satya Nadella is justly admired for reimagining the company for an age that doesn’t revolve around Windows or any other desktop operating system. Had it clung to its past rather than broadened its horizons, it likely wouldn’t be the world’s second most valuable company today.

Could voice and AI put Windows back in the spotlight? Mehdi doesn’t mention Apple’s recent travails in AI, but he’s clear that he sees an opportunity for Microsoft to bound forward more quickly than its eternal competitor. “We’re going to have an open window,” he says. “And Apple is not going to be in this window for quite some time.”

Thinking ahead over the next decade, Mehdi told me, Microsoft would love to turn Copilot into the real-world equivalent of Tony Stark’s ultracapable AI butler Jarvis. Still, he and the Microsoft executives I talked to mostly kept the hype in check. None of them suggested that voice might totally supersede today’s graphical interface in the way Windows once replaced MS-DOS’s text-based command line. Microsoft Jarvis, should it come to exist, will likely still support keyboard and mouse input—just like Windows 1.0.

“We think this is the next interface because it’s additive,” stresses corporate VP of design and research for Windows + Devices Marcus Ash, who has been at Microsoft since 1999 (not counting a brief detour at Stripe) and was part of the team that created Cortana. “It gives you more things that you can do. But you can also go back to the way that you use things if that’s comfortable for you.”

Which is not to say Microsoft won’t make every effort to make the case for Windows’ latest attempt to bake in voice technology. That undertaking will include a TV campaign showing the new features in action. “We’ve not advertised Windows in that kind of fashion in a while,” says Mehdi. “So we do have confidence in what we’ve got here.”

Once upon a time, Microsoft signaled that a Windows update mattered by rebranding it: Windows 95, Windows 98, Windows Me, Windows 2000, Windows XP, Windows Vista, Windows 7. Not this time. Four-year-old Windows 11 is still Windows 11—additional evidence the company is trying to err on the side of underselling what it’s created.

“Historically, we’ve changed names,” says Mehdi. “Pavan and I were like, ‘Shoot, should we have [called it] Windows 12 or Windows 20 or something?’ We didn’t even think about it. We were spending our whole time working on the product. But it has that magnitude. And it’s obviously all with the backdrop of what’s happening in the world of AI.” If this new voice-enabled operating system wins hearts, it will be because its benefits speak for themselves.
