Hot Chips 31
At Hot Chips 31 this week in Silicon Valley, two enormous chip releases caught the eye: Cerebras' WSE, the largest deep learning chip ever built, and the world's largest FPGA from Xilinx. Intel's two AI chips, the Nervana NNP-T and NNP-I, also drew plenty of attention. AMD, by contrast, attracted little notice at Hot Chips, perhaps because it has taken a “wait and see” approach amid the current AI boom.
How do Intel, AMD, and NVIDIA estimate the AI market?
NVIDIA expects its data center and artificial intelligence total addressable market (TAM) to reach $50 billion by 2023. This includes HPC (high-performance computing), DLT (deep learning training), and DLI (deep learning inference).
Intel estimates that its potential market for DLT and DLI will reach $46 billion by 2020.
AMD has not released any forecast for the potential deep learning market, as it is more focused on winning market share from Intel and NVIDIA. That may explain why AMD has no chips dedicated to artificial intelligence.
However, AMD CEO Lisa Su has said the company is working hard to become a more important player in the field of artificial intelligence.
Lisa Su: Limitations of the CPU
Any discussion of computational performance begins with Moore's Law, but Moore's Law is slowing. Moore's Law holds that as transistors shrink and density increases, computing performance doubles roughly every two years.
According to AnandTech's Hot Chips 31 coverage, Lisa Su explained in her keynote that AMD has improved CPU performance in a variety of ways: process technology, die size, TDP (thermal design power), power management, microarchitecture, and compilers.
Advanced process technology contributes the most, boosting CPU performance by about 40%. Increasing die size can also deliver a double-digit performance gain, but it is not cost-effective.
Through microarchitecture improvements, AMD increased the IPC (instructions per cycle) of its EPYC Rome server CPUs by 23% in single-threaded and 15% in multi-threaded workloads, well above the industry average of 5%-8%. Taken together, these methods double performance roughly every two and a half years.
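A rough sketch shows how such individual gains multiply into an overall doubling. The factor values below are illustrative assumptions (only the ~40% process gain and ~15% multi-threaded IPC gain come from the keynote; the rest are made-up placeholders, not AMD's published breakdown):

```python
# Illustrative only: how several independent improvement factors
# compound multiplicatively into an overall CPU performance gain.
factors = {
    "process technology": 1.40,           # ~40% from the advanced node (per keynote)
    "microarchitecture (IPC)": 1.15,      # ~15% multi-threaded IPC gain (per keynote)
    "larger die / more cores": 1.10,      # assumed contribution for illustration
    "power management & compilers": 1.10, # assumed contribution for illustration
}

total = 1.0
for name, factor in factors.items():
    total *= factor
    print(f"after {name}: {total:.2f}x")

print(f"overall gain: {total:.2f}x")  # lands near 2x, matching "double in ~2.5 years"
```

The point of the sketch is that no single factor doubles performance on its own; it is the product of several modest gains that gets there.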
Lisa Su: Artificial intelligence requires accelerated computing
Lisa Su said that while Moore's Law is slowing on the one hand, the performance of the world's fastest supercomputers doubles every 1.2 years on the other. This means the approaches of the past decade will no longer be enough.
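The gap between those two rates compounds quickly. A back-of-the-envelope calculation (assuming clean exponential doubling in both cases) shows why:

```python
# Back-of-the-envelope comparison of two doubling cadences over a decade,
# assuming clean exponential growth in both cases.
years = 10

moores_law = 2 ** (years / 2.0)     # doubling every 2 years (Moore's Law pace)
supercomputer = 2 ** (years / 1.2)  # doubling every 1.2 years (supercomputer pace)

print(f"Moore's Law pace over {years} years:   {moores_law:.0f}x")
print(f"supercomputer pace over {years} years: {supercomputer:.0f}x")
print(f"gap between the two: {supercomputer / moores_law:.1f}x")
```

Over ten years the supercomputer cadence delivers roughly an order of magnitude more growth than the Moore's Law cadence, which is the shortfall accelerators and interconnects are meant to close.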
The industry's current need is to optimize every part of the system for artificial intelligence workloads. She explained that ASICs and FPGAs offer the highest performance per watt and CPUs the lowest, with general-purpose GPUs falling between the two.
Lisa Su pointed out that each artificial intelligence workload has different computing requirements. Interconnect technology is the solution, because it ties these different parts into a single system.
The industry has traditionally improved CPU and GPU performance in isolation. Lisa Su stressed that it should instead improve performance by focusing on interconnects, I/O, memory bandwidth, software efficiency, and hardware-software co-optimization.
AMD's AI strategy
Lisa Su said AMD has adopted a CPU/GPU/interconnect strategy to tap the opportunities in artificial intelligence and HPC, and that AMD will bring all of these technologies to bear in the Frontier supercomputer. The company plans to fully optimize its EPYC CPUs and Radeon Instinct GPUs for supercomputing, further boost system performance through its Infinity Fabric technology, and unlock that performance through its ROCm (Radeon Open Compute) software stack.
Unlike Intel and NVIDIA, AMD has no dedicated artificial intelligence chip or accelerator. Even so, Su insisted that “we will definitely see AMD as a very important player in artificial intelligence.” AMD is weighing whether to build a dedicated AI chip, and the decision will depend on how artificial intelligence evolves.
Su added that many companies are developing different artificial intelligence accelerators, such as ASICs, FPGAs, and tensor accelerators. Over time these designs will narrow down to the most sustainable ones, at which point AMD will decide whether to build an accelerator that can be widely used.
Meanwhile, AMD will work with third-party accelerator makers, connecting their chips to its CPUs and GPUs through its Infinity Fabric interconnect. This mirrors its ray tracing strategy: NVIDIA introduced real-time ray tracing last year, but AMD was in no hurry to follow. Su said AMD will introduce ray tracing once the ecosystem matures and the technology is widely adopted.
For a relatively small player competing against larger, well-resourced rivals, this strategy makes economic sense: taking share in established markets reduces the risk of a product failing from low adoption and guarantees a minimum return.
How AMD's AI strategy differs from Intel's and NVIDIA's
AMD has taken a wait-and-see attitude toward developing AI chips; for now, it is applying its existing technology to AI workloads.
Intel has built a comprehensive portfolio spanning Xeon CPUs, Optane memory, Altera FPGAs, and interconnect technology, with a discrete GPU, Xe, also in development. At Hot Chips 31, Intel introduced its Nervana chips dedicated to deep learning training and inference. Intel manufactures its chips itself; this gives it tighter control over its technology, but it demands a great deal of time and resources.
NVIDIA's AI strategy is to provide general-purpose GPUs plus CUDA software support for any AI application, along with its NVLink interconnect technology. NVIDIA works with partners to explore new markets for artificial intelligence. This strategy requires heavy research and carries a high risk of failure, but those high risks also bring high returns.