Imagination, the company that once dominated the mobile GPU IP market, still holds 36 percent of that market and 43 percent of the automotive market. Its recent release of a series of new products is not only a demonstration of strength, but also enough to draw renewed attention from rivals to this old competitor.
On November 13, Imagination released the IMG Series4, its latest, third-generation neural network accelerator (NNA), which took two years to develop. Its new multi-core architecture can deliver 600 TOPS (trillion operations per second) or even higher performance, targeting advanced driver-assistance systems (ADAS) and autonomous driving.
What impact will Imagination's low-power "ultimate AI accelerator" have on NVIDIA's position in the autonomous driving chip market?
The ultimate AI accelerator built in two years
Imagination launched its first-generation neural network accelerator (NNA), the PowerVR 2NX, in 2017, the year AI took off, with single-core performance ranging from 1 TOPS to 4.1 TOPS. In 2018 it released the PowerVR 3NX, with single-core performance from 0.6 TOPS to 10 TOPS and multi-core performance from 20 TOPS to 160 TOPS.
Alongside the performance gains, the target markets of Imagination's NNAs also expanded: from the mobile-device and automotive markets that the 2NX mainly addressed to intelligent camera surveillance, consumer electronics (especially digital TV) and low-power IoT smart devices.
Two years later, Imagination has launched its third-generation NNA, the Series4. Single-core performance is further improved: each core delivers 12.5 TOPS at less than one watt of power consumption. Compared with the previous two generations, the new product emphasizes its new multi-core architecture, which supports flexible allocation and synchronization of workloads across multiple cores to achieve higher performance.
According to Gilberto Rodriguez, director of product management at Imagination Technologies, in AI inference the Series4 NNA is more than 20 times faster than an embedded GPU and 1,000 times faster than an embedded CPU.
As for why Imagination built such a high-performance AI accelerator, Rodriguez pointed to the compute demands of ADAS and autonomous driving applications.
How does the Series4 balance 600 TOPS of performance with low power consumption?
It should be noted that the figures of 100 TOPS per eight-core cluster, a power efficiency of more than 30 TOPS/watt and a performance density of more than 12 TOPS/mm^2 assume implementation on a 5 nm process node.
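The quoted figures can be sanity-checked with simple arithmetic. A minimal sketch, using only the numbers stated in the article; the cluster count needed for 600 TOPS is an inference from those numbers, not a configuration Imagination has confirmed:

```python
# Back-of-the-envelope check of the Series4 figures quoted above.
TOPS_PER_CORE = 12.5      # per-core performance stated in the article
CORES_PER_CLUSTER = 8     # largest cluster configuration mentioned

cluster_tops = TOPS_PER_CORE * CORES_PER_CLUSTER
print(cluster_tops)       # 100 TOPS per eight-core cluster, as quoted

# Reaching the headline 600 TOPS would take six such clusters (48 cores).
clusters_for_600 = 600 / cluster_tops
print(clusters_for_600)

# At the quoted >30 TOPS/W, a 100 TOPS cluster would draw under ~3.3 W.
max_watts = cluster_tops / 30
print(round(max_watts, 2))
```

The arithmetic shows why the multi-core architecture matters: the headline 600 TOPS is not a single-core figure but the product of per-core performance and flexible cluster scaling.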
Rodriguez also mentioned that to use multiple clusters for even higher compute, Imagination provides a multi-cluster collaboration mechanism, although some design work is required at the application layer.
The scalability of the flexible multi-core architecture is what gives the Series4 its high performance, but for high-performance chips, and AI chips in particular, power control is just as important. AI chips must process large amounts of data, and moving that data consumes far more power than processing it. A high-performance AI chip must therefore minimize data movement, reduce latency and save bandwidth.
To reduce latency, Imagination uses multi-core clusters of two, four, six or eight cores. All the cores in a cluster can cooperate to process one task in parallel, shortening processing delay and response time. Clusters and their cores are not limited to working on one batch task together: each core can also run a different network independently.
Increasing the core count improves performance and reduces latency
Different cores can operate independently
The Series4's bigger highlight is its bandwidth-saving Tensor Tiling technology
Specifically, the layers of a neural network run through the accelerator's hardware pipeline as fused kernels, and the feature maps passed between fused groups must normally be exchanged through external memory. Tensor Tiling makes full use of tightly coupled on-chip SRAM to fuse more layers; with more layers fused, fewer feature maps need to be exchanged through external memory, which improves efficiency and saves bandwidth.
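The bandwidth effect of fusing more layers can be illustrated with a toy model. This is a sketch under assumed numbers, not Imagination's actual data: the feature-map sizes are invented, and the cost model simply charges one DRAM write plus one read for every feature map that crosses a fusion-group boundary.

```python
# Toy model of the bandwidth saving from layer fusion described above.
# Hypothetical intermediate feature-map sizes (MB) between 8 layers.
feature_maps_mb = [4.0, 4.0, 2.0, 2.0, 1.0, 1.0, 0.5]

def external_traffic(fused_group_size, maps=feature_maps_mb):
    """External-memory traffic in MB: only feature maps that cross a
    fusion-group boundary go to DRAM, costing one write plus one
    read-back (2x their size). Maps inside a group stay in SRAM."""
    traffic = 0.0
    for i, size in enumerate(maps):
        crosses_boundary = (i + 1) % fused_group_size == 0
        if crosses_boundary:
            traffic += 2 * size
    return traffic

print(external_traffic(1))  # no fusion: every map round-trips to DRAM
print(external_traffic(4))  # fuse 4 layers: only 1 boundary hits DRAM
```

Under these assumed sizes, fusing four layers at a time cuts external traffic from 29 MB to 4 MB per inference; the real saving depends on how many layers fit in the tightly coupled SRAM.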
Batching and splitting in Tensor Tiling are also worth describing. Batching assigns many small network tasks to individual NNA cores working independently, which raises parallel throughput. Splitting divides one task along multiple dimensions so that all the NNA cores execute a single inference together, which reduces the network's inference latency. Under ideal conditions, the throughput of cooperative parallel processing matches that of independent concurrent processing, making splitting well suited to networks with large layers.
Of course, the splitting is performed by Imagination's compiler and does not need to be done manually by developers. In addition, the NNA's performance-analysis tools help schedule and allocate AI tasks more effectively.
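The trade-off between the two modes can be sketched with a toy timing model. The core count matches the article's largest cluster, but the latency figure and the perfect-scaling assumption are illustrative only; this is not Imagination's scheduler.

```python
# Hypothetical sketch contrasting batching (each core runs its own
# inference) with splitting (all cores cooperate on one inference).
CORES = 8
SINGLE_CORE_LATENCY_MS = 8.0  # assumed time for one inference, one core

def batching(num_tasks):
    """Each core runs whole inferences independently: per-task
    latency is unchanged, throughput scales with the core count."""
    rounds = -(-num_tasks // CORES)  # ceiling division
    return SINGLE_CORE_LATENCY_MS, rounds * SINGLE_CORE_LATENCY_MS

def splitting(num_tasks, efficiency=1.0):
    """All cores cooperate on each inference: under the article's
    ideal conditions (efficiency=1.0), per-task latency drops by the
    core count while total time for a batch stays the same."""
    per_task = SINGLE_CORE_LATENCY_MS / (CORES * efficiency)
    return per_task, num_tasks * per_task

print(batching(8))   # (8.0, 8.0): 8 tasks, one round, full latency each
print(splitting(8))  # (1.0, 8.0): same total time, 1/8 the latency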
Can Tensor Tiling save bandwidth while also reducing data movement? According to Rodriguez, it can.
As for the toolchain that sits above the hardware, Imagination's workflow of offline and online tools lets developers reach deployment faster.
Will NVIDIA meet a new competitor in autonomous driving?
NVIDIA launched its in-vehicle computing platform in 2015 and has iterated on it ever since; today it holds an advantageous position in the autonomous driving chip market. However, while NVIDIA's desktop-GPU heritage delivers high performance, the accompanying power consumption may not be friendly to battery-powered electric vehicles. That is an opportunity for Imagination, whose strengths lie in mobile devices with strict power budgets.
Unlike NVIDIA, Imagination is an IP provider and does not sell chips directly. It can therefore work with leading automotive disruptors, Tier 1 suppliers, OEMs and SoC makers to bring competitive products to market. To help partners enter this market and launch automotive-grade products faster, the Series4 also includes IP-level safety features, and its design process conforms to ISO 26262, an industry safety standard that addresses the risks of automotive electronics.
The new Series4 NNA can perform neural network inference safely without sacrificing performance; hardware safety mechanisms protect the compiled network, network execution and the data-processing pipeline.
Andrew Grant revealed that licensing has already begun and the product will be fully available on the market by December 2020; more than one customer has already taken a license.
This means the autonomous driving chip market will see more competitive products. Lei Feng believes that Imagination's stronger combined GPU and NNA portfolio will help more companies that want to enter this market launch more competitive products. Last month, Imagination released its latest-generation IMG B-Series high-performance GPU IP, a multi-core architecture spanning four families of cores with 33 configurations.
A more versatile GPU plus a more dedicated AI accelerator clearly offers more options for high-performance computing. Interestingly, NVIDIA now also pairs a powerful GPU with AI-accelerating Tensor Cores.
ABI Research predicts that demand for ADAS will increase two-fold by 2027. The auto industry, however, has already turned its attention further ahead, to autonomous cars and robotaxis. Combining high performance, low latency and high energy efficiency will be key to evolving from L2 and L3 ADAS to L4 and L5 full autonomy.
Given this huge market opportunity, how will two companies with such similar chip-product strengths compete?
- THE END -
Original link: Lei Feng net