Thursday, May 18, 2017
Each of these new TPU devices delivers up to 180 teraflops of floating-point performance. As powerful as these TPUs are on their own, though, we designed them to work even better together. Each TPU includes a custom high-speed network that allows us to build machine learning supercomputers we call “TPU pods.” A TPU pod contains 64 second-generation TPUs and provides up to 11.5 petaflops to accelerate the training of a single large machine learning model. That’s a lot of computation!
Using these TPU pods, we've already seen dramatic improvements in training times. One of our new large-scale translation models used to take a full day to train on 32 of the best commercially-available GPUs—now it trains to the same accuracy in an afternoon using just one eighth of a TPU pod.
We’re bringing our new TPUs to Google Compute Engine as Cloud TPUs, where you can connect them to virtual machines of all shapes and sizes and mix and match them with other types of hardware, including Skylake CPUs and NVIDIA GPUs. You can program these TPUs with TensorFlow, the most popular open-source machine learning framework on GitHub, and we’re introducing high-level APIs, which will make it easier to train machine learning models on CPUs, GPUs or Cloud TPUs with only minimal code changes.
With Cloud TPUs, you have the opportunity to integrate state-of-the-art ML accelerators directly into your production infrastructure and benefit from on-demand, accelerated computing power without any up-front capital expenses. Since fast ML accelerators place extraordinary demands on surrounding storage systems and networks, we’re making optimizations throughout our Cloud infrastructure to help ensure that you can train powerful ML models quickly using real production data.
Our goal is to help you build the best possible machine learning systems from top to bottom. While Cloud TPUs will benefit many ML applications, we remain committed to offering a wide range of hardware on Google Cloud so you can choose the accelerators that best fit your particular use case at any given time. For example, Shazam recently announced that they successfully migrated major portions of their music recognition workloads to NVIDIA GPUs on Google Cloud and saved money while gaining flexibility.