As a computing technique that mimics how the human brain processes information, neuromorphic computing is considered one of the important directions for achieving artificial general intelligence.
Its main difference from traditional computing is that it breaks free of the von Neumann architecture's separation of storage and computation. Pursuing the highly integrated efficiency of neurons in the human brain, it concentrates data storage and processing in the same computing unit, so that data can be processed with higher energy efficiency, better performance, and greater speed. It has therefore drawn considerable attention from the artificial intelligence field.
A hardware component called the “memristor” is key to realizing neuromorphic computing. Simply put, a memristor combines the functions of memory and a resistor.
Recently, Professor Lu Wei of the University of Michigan and his team took an important step by successfully developing the world's first general-purpose AI chip based on a memristor array. The innovation of this new AI chip is that all storage and computing functions are integrated on the same chip, which truly achieves compute-storage integration. The chip can also be programmed to run a variety of artificial intelligence algorithms, further increasing computing speed and reducing energy consumption. The research was published on July 15 in Nature Electronics. The chip design partners include the teams of Professors Zhengya Zhang and Mike P. Flynn of the University of Michigan.
Lu Wei, Zhengya Zhang, and Mike P. Flynn. Source: university website
In an exclusive interview with DeepTech, Professor Lu Wei said that device-side edge computing is currently a suitable entry point for the new AI chip. In addition, the team has set up a company to commercialize the next generation of the product.
“The chip we published is still for research and proof of concept, and has not been deeply optimized. We have begun work on a next-generation chip that is more optimized and more capable,” Professor Lu Wei said.
Besides Lu Wei's team, those investing in neuromorphic computing chips include industry giants such as IBM (TrueNorth), Intel (Loihi), and Qualcomm (Zeroth), as well as startups such as Zhicun Technology and Xijing Technology. Tsinghua University's “Tianjic” brain-inspired computing chip, announced last month, is another breakthrough in this direction. As more such breakthroughs are realized, the next generation of computing technology is drawing closer.
Memristor array chip. Source: Robert Coelius, Michigan Engineering
The next-generation computing trend: compute-storage integration
In the existing von Neumann computing system, data is stored and processed by different units. As data volumes and algorithm complexity grow, the time spent moving data between memory and the arithmetic units has increasingly become the bottleneck for computing performance.
This is especially true for artificial intelligence algorithms: once data has to be stored on the hard disk instead of in system memory, computing speed drops by a factor of a hundred and power consumption rises a hundredfold. Even if all the data fits in memory, the central processor still has to read it from memory, and shuttling data back and forth between memory and the processor consumes a great deal of time and power.
In response, many chip companies, startups, and scientists have invested considerable time and money in studying how to move computation into the memory itself. This approach is called compute-storage integration, also known as in-memory computing. It can both increase computing speed and reduce power consumption.
Source: University of Groningen
According to Lu Wei, a compute-storage integrated architecture is very attractive for applications that process large amounts of data, such as AI. Ideally, such an architecture stores the entire AI model on the chip and runs it directly in the storage units. This eliminates external memory such as DRAM entirely and greatly improves the chip's energy efficiency and throughput. The existing and in-development solutions are roughly the following:
1. Use existing storage technology such as SRAM. This solution is mature and has been adopted by many startups and research institutions. However, because of SRAM's density and power limitations, it can only be used for small “toy models”. Large models still have to store their parameters in external DRAM, and the energy efficiency of the whole system then drops rapidly.
2. Use new non-volatile memory (NVM) such as RRAM or STT-MRAM. Many companies are also pursuing this direction. Embedded NVM of this type promises on-chip storage of the entire model, but its limitation is that running the model still has to go through the CPU, so it cannot fully solve data transfer problems such as bus congestion.
3. A truly compute-storage integrated architecture avoids the CPU as far as possible and computes directly in the embedded NVM. The difficulty lies in handling the analog signal errors introduced by computing in the memory cells, and in achieving analog/digital signal conversion that is both sufficiently precise and efficient.
“The third option is theoretically the most efficient, but its progress currently lags behind options 1 and 2,” Professor Lu Wei said.
In this study, the team successfully validated a small-scale, truly compute-storage integrated architecture (option 3), implemented various functions including inference and online learning, and carefully analyzed the analog signal errors and the effects of the analog/digital conversion circuits.
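To make the analog error and conversion issue concrete, here is a minimal Python sketch of one analog dot product. It is not the team's circuit model: the uniform converter in `quantize`, the Gaussian device variation `device_sigma`, and all numeric values are assumptions chosen for illustration.

```python
import random

def quantize(value, full_scale, bits):
    """Uniform converter model: snap value to one of 2**bits levels in [0, full_scale]."""
    step = full_scale / (2 ** bits - 1)
    clipped = min(max(value, 0.0), full_scale)
    return round(clipped / step) * step

def analog_dot(inputs, weights, dac_bits, adc_bits, device_sigma, rng):
    # DAC: each input (assumed to lie in [0, 1]) becomes one of a few voltage levels.
    volts = [quantize(x, 1.0, dac_bits) for x in inputs]
    # Device variation: each stored weight is perturbed by Gaussian noise (assumed model).
    noisy = [w * (1.0 + rng.gauss(0.0, device_sigma)) for w in weights]
    analog_sum = sum(v * w for v, w in zip(volts, noisy))
    # ADC: the accumulated analog value is digitized; full scale is the largest ideal sum.
    return quantize(analog_sum, float(sum(weights)), adc_bits)

rng = random.Random(0)
inputs = [0.2, 0.9, 0.5, 0.7]    # hypothetical input vector
weights = [0.3, 0.8, 0.1, 0.6]   # hypothetical non-negative weights, like conductances
exact = sum(x * w for x, w in zip(inputs, weights))

for bits in (2, 4, 8):
    approx = analog_dot(inputs, weights, bits, bits, 0.02, rng)
    print(bits, "bit converters: absolute error =", abs(approx - exact))
```

In this model, coarse converters limit accuracy however good the devices are, and device variation sets a floor that finer converters cannot remove.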
In addition, existing in-memory computing chips are often designed for one specific artificial intelligence problem, or need an additional processor to drive them, which greatly limits their adoption. The chip developed by Lu Wei's team can run a variety of algorithms in a compute-storage integrated way without such additional assistance.
The first memristor-based AI chip
Memristors and other non-volatile memory devices are a good choice for compute-storage integration.
In artificial intelligence and deep learning algorithms, the core operation is mainly a large number of vector-matrix multiplications (VMM). Because a memristor-array chip computes with analog circuits rather than digital circuits, it is very efficient at VMM and has shown potential for artificial intelligence computing in a number of studies.
The concept of the memristor was first proposed in 1971 by UC Berkeley professor Leon O. Chua (Cai Shaotang), and the first solid-state memristor was developed by Hewlett-Packard in 2008. A memristor is a passive two-terminal electronic component, similar to the familiar resistor. The difference is that its resistance can be changed by the current that flows through it, which means the component can remember the current and charge that have passed through it. The circuit structure of a memristor array is a matrix-like crossbar of rows and columns. In a VMM operation, the chip stores the matrix in the resistance values of the crossbar and encodes the vector to be multiplied in the input voltages, so that the result of the vector-matrix multiplication can be read from the array's outputs.
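The crossbar multiplication can be sketched in a few lines of Python. The sketch assumes the common textbook formulation rather than the details of this chip: each device's conductance (the inverse of its resistance) holds one matrix entry, input voltages drive the rows, and each column sums its device currents, which readout circuitry would then convert. The function name `crossbar_vmm` and all values are hypothetical.

```python
def crossbar_vmm(voltages, conductances):
    """Ideal crossbar read: column current I_j = sum over rows i of V_i * G_ij."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage is needed per row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's law gives each device current; Kirchhoff's current law
            # adds the currents that share a column wire.
            currents[j] += voltages[i] * conductances[i][j]
    return currents

# Hypothetical 3x2 array: conductances in siemens, inputs in volts.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
V = [0.1, 0.2, 0.3]

I = crossbar_vmm(V, G)
print(I)  # column currents in amperes
```

In hardware every multiply and add in the two loops happens at once in the array itself, which is why no matrix data has to travel to a separate processor.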
In addition to the resistor, capacitor, and inductor, there should be another component that relates charge and flux: the memristor (source: Wikipedia).
The newly developed chip integrates 5,832 memristors with an OpenRISC processor, along with 486 dedicated digital-to-analog converters, 162 dedicated analog-to-digital converters, and two mixed-signal interfaces that connect the memristor analog circuits to the central processor.
Running at full power, the chip consumes only 300 mW and can achieve 188 billion operations per second per watt. Although this falls short of NVIDIA's latest artificial intelligence chip (up to 9.09 trillion operations per second per watt), the chip has significant advantages in power consumption and data access.
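As a quick check on what these figures imply, the short calculation below multiplies the quoted efficiency by the quoted power to get the peak throughput, and compares the two efficiency figures. It uses only the numbers given in the article; the variable names are illustrative.

```python
power_w = 0.300                  # 300 mW at full power
efficiency_ops_per_w = 188e9     # 188 billion operations per second per watt
nvidia_ops_per_w = 9.09e12       # figure quoted for NVIDIA's chip

# Throughput = efficiency x power.
throughput = efficiency_ops_per_w * power_w
ratio = nvidia_ops_per_w / efficiency_ops_per_w

print(f"{throughput / 1e9:.1f} billion operations per second")  # 56.4
print(f"efficiency ratio: {ratio:.1f}x")                         # 48.4
```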
To verify its versatility, the team used the memristor array chip to implement three artificial intelligence algorithms. The first is the well-known “perceptron”, one of the most common machine learning algorithms for classifying information. The team successfully ran a single-layer perceptron on the chip and used it to recognize noisy images of Greek letters.
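For readers unfamiliar with the perceptron, the following is a minimal software sketch of a single-layer, multi-class perceptron. It is not the team's implementation: the three 3x3 bitmaps stand in for the Greek-letter images, and the helper names (`to_vector`, `predict`, `train`) are hypothetical.

```python
def to_vector(bitmap):
    """Map '#' to +1 and '.' to -1 so that every pixel carries a signal."""
    return [1 if ch == "#" else -1 for ch in bitmap]

def predict(weights, x):
    # One output neuron per class; the neuron with the largest weighted sum wins.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def train(samples, n_classes, max_epochs=100):
    n_inputs = len(samples[0][0])
    weights = [[0.0] * n_inputs for _ in range(n_classes)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in samples:
            guess = predict(weights, x)
            if guess != label:
                # Perceptron rule: pull the correct neuron toward the input
                # and push the wrongly winning neuron away from it.
                for i, x_i in enumerate(x):
                    weights[label][i] += x_i
                    weights[guess][i] -= x_i
                mistakes += 1
        if mistakes == 0:
            break
    return weights

names = ["vertical", "horizontal", "diagonal"]
samples = [
    (to_vector(".#." ".#." ".#."), 0),
    (to_vector("..." "###" "..."), 1),
    (to_vector("#.." ".#." "..#"), 2),
]
weights = train(samples, n_classes=len(names))

# A vertical bar with one corrupted pixel (top-left flipped on).
noisy = to_vector("##." ".#." ".#.")
print(names[predict(weights, noisy)])  # vertical
```

The weighted sums inside `predict` are exactly the vector-matrix multiplication that the memristor array carries out in the analog domain.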
A more sophisticated algorithm implemented on the chip is “sparse coding”. It optimizes the neural network by having neurons compete against one another, eliminating ineffective neurons and finding the optimal pattern of connections, and thus the optimal network for the target. It can be used for feature extraction, data compression, data classification, and similar tasks.
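The idea of representing an input with only a few active neurons can be illustrated with greedy matching pursuit. This is a deliberately simple stand-in, not the algorithm run on the chip. The dictionary, the signal, and the function name `matching_pursuit` are hypothetical, and the atoms are assumed to have unit length.

```python
def matching_pursuit(signal, dictionary, n_active):
    """Greedy sparse code: repeatedly let the best-matching atom explain the residual."""
    residual = list(signal)
    code = {}
    for _ in range(n_active):
        # Every "neuron" (atom) measures how well it matches what is still unexplained.
        responses = [sum(d_i * r_i for d_i, r_i in zip(atom, residual))
                     for atom in dictionary]
        # The strongest responder wins the competition; the others stay silent.
        winner = max(range(len(dictionary)), key=lambda k: abs(responses[k]))
        code[winner] = code.get(winner, 0.0) + responses[winner]
        # Remove the winner's contribution (valid because atoms have unit length).
        residual = [r_i - responses[winner] * d_i
                    for r_i, d_i in zip(residual, dictionary[winner])]
    return code, residual

# Five unit-length atoms for a four-dimensional input (an overcomplete dictionary).
dictionary = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
signal = [2.0, 1.0, 1.0, 1.0]

code, residual = matching_pursuit(signal, dictionary, n_active=2)
print(code)      # {3: 2.5, 0: 0.75}
print(residual)  # [0.0, -0.25, -0.25, -0.25]
```

Two of the five neurons capture most of the signal, which is the kind of compact representation that makes sparse coding useful for feature extraction and compression.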
Finally, the chip implements a two-layer neural network with unsupervised learning to screen breast tumor images. The first layer uses principal component analysis to extract the main features from the image, and the second layer uses a perceptron to decide whether the tumor in the image is malignant. The algorithm reaches 94.6% accuracy on this chip, very close to the 96.8% achieved on conventional chips. The small gap is mainly due to the charge uncertainty of the memristor elements at the classification boundary.
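The two-stage structure described above can be sketched in software as follows. This is a toy illustration under stated assumptions, not the team's network or data: the four two-feature samples are invented, the principal component is found by plain power iteration, and the second stage is a one-input perceptron with a bias.

```python
import math

def first_principal_component(rows, n_iter=50):
    """Power iteration on the covariance matrix of the mean-centered rows."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
           for a in range(dim)]
    v = [1.0] + [0.0] * (dim - 1)  # arbitrary start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(w_a * w_a for w_a in w))
        v = [w_a / norm for w_a in w]
    return mean, v

def project(row, mean, v):
    return sum((row[j] - mean[j]) * v[j] for j in range(len(v)))

def train_perceptron(xs, labels, max_epochs=100):
    w, b = 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, labels):  # y is 0 or 1
            guess = 1 if w * x + b > 0 else 0
            if guess != y:
                w += (y - guess) * x
                b += (y - guess)
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

# Invented measurements: two features per sample, label 1 means malignant.
rows = [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]]
labels = [0, 0, 1, 1]

# Layer 1 (unsupervised): reduce each sample to its leading principal component.
mean, pc = first_principal_component(rows)
features = [project(r, mean, pc) for r in rows]

# Layer 2 (supervised): a perceptron separates the reduced features.
w, b = train_perceptron(features, labels)

def classify(row):
    return 1 if w * project(row, mean, pc) + b > 0 else 0

print(classify([4.5, 4.0]), classify([1.5, 2.0]))  # 1 0
```

Both the projection and the perceptron sum are dot products, so each layer maps naturally onto a crossbar read.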
Memristor array chip. Source: Robert Coelius, Michigan Engineering
Of course, this memristor chip still has plenty of room for optimization and improvement. According to IEEE Spectrum, the chip was built with roughly two-decade-old 180-nanometer transistors; with the 40-nanometer transistor technology of 2008, power consumption could be reduced to 42 milliwatts and computational efficiency raised to 1.37 trillion operations per second per watt. By contrast, NVIDIA's latest artificial intelligence chip uses a more advanced manufacturing process from 2014.
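A short calculation shows what the projected process shrink would and would not change. It uses only the figures quoted in this article (300 mW and 188 billion operations per second per watt measured, 42 mW and 1.37 trillion projected); the variable names are illustrative.

```python
measured_power_w = 0.300      # 180 nm prototype at full power
measured_eff = 188e9          # operations per second per watt
projected_power_w = 0.042     # projected for a 40 nm process
projected_eff = 1.37e12       # operations per second per watt

power_reduction = measured_power_w / projected_power_w
efficiency_gain = projected_eff / measured_eff

measured_throughput = measured_power_w * measured_eff
projected_throughput = projected_power_w * projected_eff

print(f"power reduced {power_reduction:.1f}x, efficiency up {efficiency_gain:.1f}x")
print(f"throughput: {measured_throughput / 1e9:.1f} -> "
      f"{projected_throughput / 1e9:.1f} billion operations per second")
```

Power falls and efficiency rises by nearly the same factor of about seven, so the implied throughput stays close to 56 billion operations per second. On these figures the projected gain is in energy use rather than raw speed.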
Lu Wei also said that the team has begun designing a more optimized and more capable next-generation chip that uses faster, more advanced transistors and more memristor arrays, so that more complex neural network algorithms can be run across multiple arrays. The team has now set up a startup called “MemryX” to further commercialize the chip.
“MemryX's goal is to provide a mature, commercial, cost-effective architecture solution. We have already made very substantial progress,” he said.
It is worth mentioning that Crossbar, another startup founded by Professor Lu Wei, also focuses on memristor research and the development of artificial intelligence chips. In the memory industry, Crossbar has brought its Crossbar ReRAM solution to market and has become one of the leaders in new storage technologies. SMIC announced in 2016 that it had partnered with Crossbar to integrate its ReRAM technology into a variety of devices; in 2018, Crossbar also signed a cooperation agreement with aerospace chip manufacturer Microsemi and introduced chip prototypes for face recognition and license plate recognition.