MLPerf: the Mount Huashan sword contest of machine learning
MLPerf is a benchmark suite jointly released in May 2018 by companies including Google, Baidu, Intel, and AMD, together with academic institutions such as Harvard University and Stanford University. It measures the execution speed of machine learning software and hardware, and has been highly recommended by Andrew Ng and Google's machine learning lead Jeff Dean.
In response to the release of MLPerf, Andrew Ng stated:
AI is transforming every industry, but to fully realize its potential we still need faster hardware and software... We certainly want a more powerful resource platform, and standardized benchmarks will help AI developers create such products and help adopters choose the AI options that best suit their needs.
Jeff Dean also said on Twitter that Google is pleased to join universities and companies as one of the organizations committed to making MLPerf a common standard for measuring machine learning performance.
The main objectives of the MLPerf project include:
Accelerate the development of machine learning through fair and useful metrics. Enable fair comparison of competing systems while encouraging innovation that advances state-of-the-art machine learning. Keep the cost of benchmarking reasonable so that everyone can participate. Serve both the commercial and research communities. Provide repeatable and reliable test results.
In terms of specific tests, MLPerf covers four major areas (vision, language, commerce, and general) comprising seven benchmarks. The metric for each MLPerf training benchmark is the wall-clock time to train a model to a target quality on a specified dataset. As is well known, training times for machine learning tasks vary considerably from run to run. Therefore, each benchmark's final MLPerf result is the average over a specified number of runs, with the highest and lowest times dropped.
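The aggregation rule just described (drop the fastest and slowest runs, then average the rest) can be sketched in a few lines of Python. This is an illustrative reimplementation of the rule as stated above, not MLPerf's actual scoring code; the function name `mlperf_score` is our own.

```python
def mlperf_score(run_times):
    """Aggregate per-run training times (e.g., in minutes) into a single
    benchmark result: drop the lowest and highest runs, then average
    the remaining ones, as the MLPerf training rules describe."""
    if len(run_times) < 3:
        raise ValueError("need at least 3 runs to drop both extremes")
    trimmed = sorted(run_times)[1:-1]  # remove fastest and slowest run
    return sum(trimmed) / len(trimmed)

# Example: five runs where one run (50) is an outlier.
print(mlperf_score([10, 12, 11, 50, 9]))  # averages 10, 11, 12 -> 11.0
```

Dropping the extremes makes the reported result robust to a single anomalously fast or slow run, which matters given the run-to-run variance the article mentions.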
MLPerf results are categorized by division and by the product or platform submitted. There are currently two divisions: the Closed Division and the Open Division. The Closed Division specifies the model to be used and restricts hyperparameter values such as batch size and learning rate, which makes comparisons between hardware and software systems fair.
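To make the Closed Division idea concrete, here is a hypothetical sketch of what checking a submission against fixed-model, restricted-hyperparameter rules might look like. The rule values (model name, allowed batch sizes, learning-rate range) are illustrative placeholders, not MLPerf's actual limits.

```python
# Illustrative Closed Division rule set; the concrete values are assumptions.
CLOSED_DIVISION_RULES = {
    "model": "ResNet-50",
    "allowed_batch_sizes": {256, 512, 1024},
    "learning_rate_range": (0.1, 10.0),
}

def check_submission(submission, rules=CLOSED_DIVISION_RULES):
    """Return a list of rule violations for a submission dict
    with keys 'model', 'batch_size', and 'learning_rate'."""
    violations = []
    if submission["model"] != rules["model"]:
        violations.append("model must be %s" % rules["model"])
    if submission["batch_size"] not in rules["allowed_batch_sizes"]:
        violations.append("batch size %d not allowed" % submission["batch_size"])
    lo, hi = rules["learning_rate_range"]
    if not lo <= submission["learning_rate"] <= hi:
        violations.append("learning rate out of allowed range")
    return violations
```

The point of such constraints is that every Closed Division entry trains the same model under comparable settings, so differences in training time reflect the hardware and software stack rather than tuning tricks.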
NVIDIA was the biggest winner of the first round
On December 12, 2018, the researchers and engineers behind MLPerf announced the results of the first round of competition, which measured the training times of various machine learning tasks on mainstream machine learning hardware platforms, including Google's TPUs, Intel CPUs, and NVIDIA GPUs. The benchmarks were as follows:
This round produced the Closed Division v0.5 results, which were as follows:
From the results, NVIDIA achieved the best performance in six of the MLPerf benchmarks, including image classification, object instance segmentation, object detection, non-recurrent translation, recurrent translation, and recommendation systems, making it the biggest winner of the round.
With the Cloud TPU v3 Pod, Google won three of five contests
On July 10, 2019, the results of the second round of MLPerf were announced. The test criteria were as follows:
The Closed Division v0.6 results from this round of competition were as follows:
As the MLPerf Closed Division v0.6 results show, on the benchmarks based on the Transformer and SSD models, Google's Cloud TPU outperformed NVIDIA's on-premises GPU systems by over 84%. In addition, on the ResNet-50 benchmark, the Cloud TPU also slightly edged out NVIDIA's on-premises GPUs.
What powered Google's wins in this round was the Cloud TPU v3 Pod.
The Cloud TPU v3 Pod is Google's third-generation scalable cloud supercomputer, built around Google's own TPU chips. Google announced its beta at the I/O developer conference in May 2019 and opened a public preview.
Each Cloud TPU v3 Pod can contain up to 1,024 individual TPU chips connected by a two-dimensional toroidal mesh network, which the TPU software stack uses to program multiple racks as a single machine through various high-level APIs. Users can also rent a smaller portion of a Cloud TPU Pod, called a "slice."
According to Google, the latest generation of Cloud TPU v3 Pods uses liquid cooling for optimal performance, and each Pod delivers more than 100 petaFLOPS of compute. Google also claims that, in raw math operations per second, a Cloud TPU v3 Pod is comparable to a top-five supercomputer in the world, albeit at lower numerical precision.
Alongside the second round of MLPerf results, Google also took the opportunity to showcase the latest progress of the Cloud TPU v3 Pod on its official website. For example, Recursion Pharmaceuticals is a company that uses computer vision to process cell images and analyze cellular features in order to assess how diseased cells respond to drugs. Previously, the company needed 24 hours to train a model on local GPUs, but with a Cloud TPU Pod the same training takes only 15 minutes.
As a quintessential technology company, Google's commitment to advancing the Cloud TPU naturally comes with the hope that more developers will adopt it; after all, cloud computing is one of Google's most important businesses.