Microsoft has built one of the top five publicly disclosed supercomputers in the world, making new infrastructure available in Azure to train extremely large artificial intelligence models.
Built in collaboration with and exclusively for OpenAI, the supercomputer hosted in Azure was designed specifically to train that company’s AI models. It represents a key milestone in a partnership to jointly create new supercomputing technologies in Azure.
It’s also a first step toward making the next generation of very large AI models and the infrastructure needed to train them available as a platform for other organisations and developers to build upon.
Microsoft Chief Technology Officer Kevin Scott said the potential benefits extend far beyond narrow advances in one type of AI model, adding that the future may bring new applications that are hard to even imagine today.
A new class of multitasking AI models
A new class of models developed by the AI research community has proven that some tasks once handled by separate, smaller models can be performed better by a single massive model. Such a model can absorb the nuances of language, grammar, knowledge, concepts and context so deeply that it can excel at multiple tasks.
As part of a companywide AI at Scale initiative, Microsoft has developed its own family of large AI models, the Microsoft Turing models, which it has used to improve many different language understanding tasks across Bing, Office, Dynamics and other productivity products.
The goal, Microsoft says, is to make its large AI models, training optimization tools and supercomputing resources available through Azure AI services and GitHub so that other developers can leverage the power of AI at scale.
The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server.
According to OpenAI CEO Sam Altman, Microsoft was able to build OpenAI’s dream system. He added that OpenAI’s goal is not just to pursue research breakthroughs but also to engineer and develop powerful AI technologies that other people can use.
Accordingly, Microsoft will soon begin open sourcing its Microsoft Turing models, as well as recipes for training them in Azure Machine Learning, giving developers access to the same family of powerful language models that the company has used to improve language understanding across its products.
It also unveiled a new version of DeepSpeed, an open-source deep learning library for PyTorch that reduces the amount of computing power needed for large, distributed model training. In addition, Microsoft said it has added support for distributed training to the ONNX Runtime, an open-source library designed to enable models to be portable across hardware and operating systems.
Learning the nuances of language
Designing AI models that might one day understand the world more like people do starts with language. Today’s deep learning language models are far more sophisticated than earlier versions.
Two years ago, the largest models had 1 billion parameters. The Microsoft Turing model for natural language generation now stands as the world’s largest publicly available language AI model with 17 billion parameters.
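For a rough sense of scale, the sketch below estimates how much memory the weights alone of a 17-billion-parameter model would occupy. The bytes-per-parameter figures (2 for half precision, 4 for single precision) are assumptions for illustration. The announcement does not say how the Turing model is stored, and training needs considerably more memory than the weights alone.

```python
# Back-of-the-envelope estimate of weight storage for a large model.
# Parameter counts come from the article; bytes per parameter are assumed.

TURING_NLG_PARAMETERS = 17_000_000_000   # 17 billion (from the article)
EARLIER_PARAMETERS = 1_000_000_000       # 1 billion (from the article)


def weights_size_gb(num_parameters, bytes_per_parameter):
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: 2 bytes per parameter (half precision), 4 bytes (single precision).
turing_fp16_gb = weights_size_gb(TURING_NLG_PARAMETERS, 2)
turing_fp32_gb = weights_size_gb(TURING_NLG_PARAMETERS, 4)
earlier_fp16_gb = weights_size_gb(EARLIER_PARAMETERS, 2)

print(f"17B parameters, half precision:   {turing_fp16_gb:.0f} GB")
print(f"17B parameters, single precision: {turing_fp32_gb:.0f} GB")
print(f"1B parameters, half precision:    {earlier_fp16_gb:.0f} GB")
```

Under these assumptions, the weights alone run to tens of gigabytes, which helps explain why training such a model is spread across many GPUs.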
In what’s known as “self-supervised” learning, these AI models learn about language by examining billions of pages of publicly available documents on the internet. Words or sentences are removed from the text, and the model has to predict the missing pieces from the words around them. As the model does this billions of times, it gets very good at perceiving how words relate to each other. This results in a rich understanding of grammar, concepts, contextual relationships and other building blocks of language.
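The idea behind self-supervised learning can be sketched in a few lines: the training labels come from the text itself, by hiding a word and asking a model to predict it. The following is a toy illustration only. The function name `make_masked_examples` and the `[MASK]` placeholder are hypothetical and are not taken from Microsoft’s training code, which the announcement does not describe.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked_input, target_word) pairs.

    Each word is hidden in turn, so the "label" for every example is simply
    the word that was removed. No human annotation is required.
    """
    words = sentence.split()
    examples = []
    for position, target in enumerate(words):
        masked_words = words[:position] + [mask_token] + words[position + 1:]
        examples.append((" ".join(masked_words), target))
    return examples


examples = make_masked_examples("the cat sat on the mat")
for masked_input, target in examples:
    print(f"{masked_input!r} -> {target!r}")
```

A real system would feed pairs like these to a neural network and adjust its parameters whenever the prediction is wrong. This sketch only shows how the examples are derived from raw text.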
AI at Scale
One advantage of the next generation of large AI models is that they only need to be trained once with massive amounts of data and supercomputing resources. A company can take a “pre-trained” model and simply fine-tune it for different tasks with much smaller datasets and resources.
The Microsoft Turing model for natural language understanding, for instance, has been used across the company to improve a wide range of productivity offerings over the last few years. It has significantly advanced caption generation and question answering in Bing.
In Office, the same model has fueled advances in three features: smart find, which enables easier searches in Word; Key Insights, which extracts important sentences so readers can quickly locate key points in Word; and Outlook’s Suggested replies, which automatically generates possible responses to an email.
Microsoft is also exploring large-scale AI models that can learn in a generalised way across text, images and video. To train its own models, Microsoft had to develop its own suite of techniques and optimization tools, many of which are now available in the DeepSpeed PyTorch library and ONNX Runtime.
The efficiencies that Microsoft researchers and engineers have achieved in this kind of distributed training will make using large-scale AI models much more resource-efficient and cost-effective for everyone, Microsoft says.
“By developing this leading-edge infrastructure for training large AI models, we’re making all of Azure better,” Scott said. “We’re building better computers, better distributed systems, better networks, better datacenters. All of this makes the performance and cost and flexibility of the entire Azure cloud better.”