Microsoft is expanding its AI capabilities with the launch of two new models that have been trained entirely in-house. MAI-Voice-1, the company’s first natural speech generation model, and MAI-1-preview, its first end-to-end foundation text model, are both part of Microsoft’s growing AI portfolio.
MAI-Voice-1 is currently used in features such as Copilot Daily and Podcasts, while MAI-1-preview is available for public testing on LMArena and will soon be previewed in select Copilot scenarios.
Mustafa Suleyman, the leader of Microsoft’s AI division, explained in an interview with Semafor that the new models were designed with a focus on efficiency and cost-effectiveness. MAI-Voice-1 runs on a single GPU, while MAI-1-preview was trained on approximately 15,000 Nvidia H100 GPUs. By comparison, other models, such as xAI’s Grok, required more than 100,000 GPUs for training. Suleyman emphasized that training models efficiently depends on carefully selecting the right data so that every computational step adds value.
While Microsoft’s Copilot still primarily relies on OpenAI’s GPT models, the introduction of its own models shows the tech giant’s ambitions to become an independent player in the AI space. Although it will take time to match the capabilities of the leading AI companies, Suleyman shared that Microsoft has a five-year roadmap and is investing heavily to build its own AI solutions.
Despite concerns about a potential AI bubble, Microsoft is optimistic about the future. “We have big ambitions for where we go next. Not only will we pursue further advances, but we believe orchestrating a range of specialized models will unlock immense value across various user needs and use cases,” the company said in its announcement.