Artificial intelligence (AI) technology is in a golden age of development. AI research and applications place high demands on computing power, so AI chip technology has become a research hotspot in recent years. Many companies and research institutions have released related products, and AI chips have advanced from technical verification to deployment in industrial scenarios. This article introduces and analyzes the main deployment scenarios of AI chips and the key technologies they require.
Introduction
Deep learning has brought breakthroughs in computer vision, natural language processing, personalized recommendation, speech recognition, and other fields. In computer vision, the accuracy of image classification and object detection has reached or even exceeded human levels. Deep neural networks require a large number of matrix multiply-accumulate operations, which place high demands on hardware computing power. CPUs and traditional computing architectures cannot provide the required parallel computing capability, so specially customized artificial intelligence (AI) chips have emerged. Since 2016, companies such as Nvidia, Google, Amazon, Tesla, Huawei, Bitmain, and Cambricon have been active in this field at scale. Some of these companies' AI chips have been iterated for more than three generations, have been promoted in the market for some time, and have completed pilot projects.
AI chip function and technical architecture
Deep learning consists of two tasks, training and inference, so the main functions of AI chips are training and inference. The chip requirements for the two tasks differ, so training and inference chips are generally designed separately.
1) Training: the training task learns from large amounts of data on a platform to produce a neural network model with specific functions. The purpose of a training chip is to let algorithm researchers quickly verify algorithm schemes and observe test results, which demands high computing power, high memory capacity, high transmission rate, and versatility.
2) Inference: the inference task computes results for input data using the trained model. The computing power, power consumption, industrial grade, price, and cost of an inference chip must be weighed together according to the actual application scenario. The main technical specifications of Huawei's training and inference AI chips are shown in Table 1, and a minimal code sketch of the two tasks follows it.
Table 1 Huawei Ascend AI chip specifications

| AI chip | Ascend 910 | Ascend 310 |
| --- | --- | --- |
| Function | Training | Inference |
| Process/nm | 7 | 12 |
| Computing power | INT8: 640 TOPS; FP16: 320 TFLOPS | INT8: 22 TOPS; FP16: 11 TFLOPS |
| Power consumption/W | 310 | 8 |
| Memory | HBM2E | 2 × LPDDR4X |
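To make the two tasks concrete, here is a minimal PyTorch sketch; the tiny model and random data are placeholders, not a real workload such as ResNet-50.

```python
import torch
import torch.nn as nn

# Placeholder model and data; real workloads use networks such as ResNet-50.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
inputs, labels = torch.randn(32, 128), torch.randint(0, 10, (32,))

# Training: forward pass, loss, and backward pass update the weights.
# This is the compute- and memory-hungry task that training chips target.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss = nn.functional.cross_entropy(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference: a single forward pass with gradients disabled.
# No optimizer state or gradient buffers, so memory and compute needs are lower.
model.eval()
with torch.no_grad():
    predictions = model(inputs).argmax(dim=1)
```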
AI Chip Technology Architecture
At present, it is generally accepted that AI chips are chips specially designed to accelerate AI algorithms, so traditional CPU architectures do not fall within the category. From the perspective of technical architecture, AI chips can be divided into GPUs, FPGAs, ASICs, brain-inspired chips, and so on, as shown in Table 2.
Table 2 Technical architectures

| Technical architecture | Advantages | Shortcomings | Representative products |
| --- | --- | --- | --- |
| Graphics processing unit (GPU) | High programming flexibility and, compared with CPUs, higher parallel computing capability; the most mature software ecosystem | Price and power consumption are high relative to FPGAs and ASICs | Training: Nvidia A100, V100; Inference: Nvidia T4, Xavier NX |
| Field-programmable gate array (FPGA) | Semi-custom; the hardware layer of the chip can be programmed and configured; lower power consumption than GPUs | Hardware description languages are difficult to master, and there is still room to compress power consumption and cost further | Xilinx |
| Application-specific integrated circuit (ASIC) | Tailored for specialized tasks: low cost, low power, high performance | Poor versatility; programmable architecture design is difficult; research and development investment is large | Training: Huawei Ascend 910; Inference: Huawei Ascend 310, Bitmain BM1684, Cambricon MLU270, etc. |
| Brain-inspired chip | Breaks through the von Neumann bottleneck by simulating the structure of biological neural networks to pursue optimal performance and power consumption | Immature; still at the laboratory stage | IBM TrueNorth, Stanford Neurogrid |
AI Chip Application and Landing Scenarios
According to where they are deployed, AI chips can be divided into three categories: cloud side, edge side, and device side.
Cloud-side AI chips and hardware
The cloud side mainly refers to server clusters deployed in data centers. AI hardware is plugged into servers as PCIe accelerator cards to provide computing power for AI workloads, and one server can usually hold multiple cards. Accelerator cards for cloud training, such as the NVIDIA A100, consume more than 250 W, while inference cards generally stay within the 75 W peak that a PCIe slot can supply, requiring no additional power connector. If single-chip power consumption is well controlled, multiple chips can be placed on one card. For example, Huawei's Atlas 300 inference card (shown in Figure 1) carries 4 Ascend 310 chips (8 W per chip), and Bitmain's SC5+ inference card carries 3 BM1684 chips (16 W per chip); a short power-budget check follows Figure 1.
Figure 1 Huawei Atlas 300 inference card
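A minimal sketch of the multi-chip power arithmetic above; the helper function is illustrative, but the budgets and chip figures are the ones quoted in this article.

```python
def fits_power_budget(chip_power_w: float, chips_per_card: int, budget_w: float = 75.0) -> bool:
    """Check whether a multi-chip card stays inside a power envelope.

    The 75 W default is the PCIe slot peak quoted in this article for
    inference cards that take no additional power connector.
    """
    return chips_per_card * chip_power_w <= budget_w

# Examples from the article:
# Huawei Atlas 300: 4 Ascend 310 chips at 8 W each -> 32 W, within 75 W.
assert fits_power_budget(chip_power_w=8.0, chips_per_card=4)
# Bitmain SC5+: 3 BM1684 chips at 16 W each -> 48 W, within 75 W.
assert fits_power_budget(chip_power_w=16.0, chips_per_card=3)
```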
The most widely used scenario for cloud-side chips is the data centers of major Internet companies, where they support video content review, personalized recommendation, speech recognition, and other services. In addition, government smart-city projects build AI computing centers to supply the computing power for structuring massive volumes of video data and similar services.
Edge-side AI chips and hardware
Compared with cloud computing, edge computing has obvious advantages in latency, reliability, cost, and ease of deployment. AI hardware for edge computing mainly takes the form of boxes and modules. The AI chip works as a system on chip (SoC): besides AI computing, business applications can also run on the chip's CPU. Because multiple video streams must be processed, edge-side AI chips are equipped with hardware acceleration units for video codec and image processing. Take Bitmain's edge computing box SE5, shown in Figure 2, as an example: it can hardware-decode more than 30 channels of 1080p high-definition video, and its 17.6 TOPS of computing power is enough to support 16 channels of face recognition.
Figure 2 Bitmain SE5 edge computing box
Edge computing modules are oriented to scenarios that require hardware customization and development, such as the AI computing modules in drones and robots. The module design needs rich interfaces, small size, low power consumption, a wide operating temperature range, a high degree of integration, and similar characteristics.
As AI applications empower all walks of life, the deployment scenarios for edge computing keep growing. Summarizing some common functional requirements, many edge AI hardware products also integrate the corresponding algorithms and applications so that they can be provided to users as complete solutions.
1) Construction site and factory production safety: safety helmet wearing monitoring, work clothes wearing monitoring, smoke and open-flame monitoring (as shown in Figure 3), and other functions.
Figure 3 Fire detection
2) Industrial quality inspection: production defect and blemish detection, OCR detection, dimension measurement, etc.
3) Smart gas stations: use technologies such as face recognition and vehicle and license plate recognition to realize staff absence detection, parking space occupancy detection, entrance congestion detection, mobile phone use and smoking detection, etc.
4) Transparent kitchens ("bright kitchen, bright stove"): chef hat wearing monitoring, chef uniform wearing monitoring, mask recognition, etc.
5) Power inspection: disconnector (knife switch) status identification, intelligent instrument analysis, operation compliance monitoring, etc.
On-device AI chips and hardware
On-device AI chips are used in terminal equipment such as mobile phones, cameras, and automobiles. They have very strict power constraints, usually no more than 5 W, and compared with the cloud and edge, their computing power requirements are lower, generally within 10 TOPS. Device chips for security applications also need to support high-definition video encoding and image signal processing (ISP). Markets such as wearable devices and smart speakers likewise show growing demand for device-side AI chips.
Key AI chip technologies
Design chip specifications for application scenarios
As the preceding analysis shows, different application scenarios have very different requirements for a chip's computing power, power consumption, cost, video and image processing capability, and operating temperature. When initially drafting chip specifications, the target application scenarios and their specific requirements must be made clear; this makes the selection and matching of the process technology, MAC array, memory, IP cores, and interfaces more accurate and reasonable.
Toolchain usability and compiler optimization
One of the main reasons NVIDIA GPUs hold the leading market share is their mature software ecosystem, which supports all mainstream deep learning frameworks (PyTorch, TensorFlow, PaddlePaddle, MXNet). It is therefore very important for AI chip companies to build a mature, easy-to-use toolchain: the better it is in deep learning framework support, operator completeness, custom operator support, performance profiling, error messages, and debugging, the more it helps the product win market recognition and adoption.
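As an illustration of what framework support means in practice, the sketch below shows a common entry point into many vendor toolchains: exporting a trained PyTorch model to ONNX, which a vendor compiler can then ingest. The export call is generic PyTorch; the output filename is arbitrary, and any actual vendor compiler step differs per chip.

```python
import torch
import torchvision.models as models

# Export a stock ResNet-50 to ONNX, a common interchange format where
# vendor toolchains (compilers, quantizers, profilers) take over.
# weights=None requires torchvision >= 0.13.
model = models.resnet50(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # NCHW, ImageNet-sized input

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",          # arbitrary output path
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```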
When evaluating an AI chip, one cannot simply look at the peak computing power (TOPS) in the chip specification. The recommended method is to run classic network models (ResNet-50 and MobileNetV2 are the most commonly used) on the chip and measure: latency (ms), throughput (images/s), computing power utilization (images/s per TOPS), and performance per watt (images/s per W), to evaluate the chip's real performance. Computing power utilization in particular reflects the compiler's optimization ability: the higher the utilization, the better the compiler. Compilers mainly rely on techniques such as graph optimization and chip-architecture-specific optimization to make models execute efficiently on the chip.
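A minimal sketch of how the four indicators can be derived from a timed run. The `run_batch` callable is a placeholder for a real on-chip inference call, and `peak_tops` and `tdp_watts` stand in for datasheet values (measured power is better when available).

```python
import time

def evaluate(run_batch, batch_size, peak_tops, tdp_watts, warmup=10, iters=100):
    """Derive latency, throughput, utilization, and perf/watt from a timed run."""
    for _ in range(warmup):            # warm up caches and clocks before timing
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start

    latency_ms = elapsed / iters * 1000        # average per-batch latency
    throughput = iters * batch_size / elapsed  # images/s
    utilization = throughput / peak_tops       # images/s per TOPS
    perf_per_watt = throughput / tdp_watts     # images/s per watt
    return latency_ms, throughput, utilization, perf_per_watt
```

Comparing `utilization` across chips with similar peak TOPS isolates how well each compiler maps the same model onto its hardware.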
Low-precision quantization technology
Low-precision quantization can effectively reduce model size and accelerate inference, and it has been widely studied and applied in academia and industry. Two computation precisions, float16 and int8, are mainly used at present; int8 requires quantization technology. To keep the accuracy loss of the quantized model relative to the floating-point model as small as possible, developers must put substantial work into quantization methods and their implementation. Accuracy has been improved to some extent through quantization-aware training, asymmetric quantization, per-channel quantization, mixed-precision computing, and other techniques. When selecting a chip, users should also verify the accuracy of the quantized model on that chip.
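A minimal NumPy sketch of asymmetric per-tensor int8 quantization, the basic scheme that the techniques above refine; real toolchains add calibration data, per-channel scales, and quantization-aware training on top of this.

```python
import numpy as np

def quantize_asymmetric(x: np.ndarray):
    """Map a float tensor to uint8 with a per-tensor scale and zero point."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))  # shifts x_min onto code 0
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Quantization error is the gap between the original and the round trip;
# for this scheme it is bounded by roughly scale / 2 per element.
x = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_asymmetric(x)
error = np.abs(x - dequantize(q, scale, zp)).max()
```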
Conclusion
Through continuous development in recent years, AI chip technology has entered the stage of industrial deployment. Designing chips for specific application scenarios has become a trend, and users should likewise choose chips that suit their own product scenarios. For computing power, do not rely only on the "peak computing power" advertised by the manufacturer; test it on real workloads. This article has analyzed the deployment scenarios and key technologies of AI chips, hoping to help readers gain a deeper understanding of how AI chips are designed and developed. The field is changing rapidly: Google has even begun to let AI design its own chips, and some companies are studying RISC-V-based solutions to meet the needs of the AI era.