Xin Zhiyuan report
Source: arXiv
Editor: Xiao Qin and Peng Fei
[Xin Zhiyuan introduction] Google recently unveiled BlazeFace, a sub-millisecond face detection algorithm. It is a lightweight face detector tailored to mobile GPU inference that can run at 200 FPS with excellent accuracy!
In recent years, architectural improvements in deep neural networks have made real-time object detection possible. In the lab, algorithms can be developed at any cost to push accuracy toward its limit. In practical applications, however, response speed, energy consumption and accuracy all matter. This requires algorithms that are low in complexity and well suited to hardware acceleration.
In mobile applications, real-time object detection is often the first step of a video processing pipeline, followed by task-specific stages such as segmentation, tracking or geometric inference.
Therefore, object detection inference should run as fast as possible, preferably well above the standard real-time benchmark.
Google has just uploaded a paper to arXiv, "BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs", introducing the BlazeFace algorithm, a lightweight face detector tailored for mobile GPU inference with excellent performance!
How remarkable is it? Google tested it on flagship devices and found that BlazeFace can run at 200 to 1000 FPS.
This super-real-time performance means it can be used in any augmented reality pipeline that needs an accurate facial region as input to a task-specific model, such as 2D/3D facial keypoint or geometry estimation, facial feature or expression classification, and facial region segmentation.
Google has already put the algorithm to practical use.
First, two major algorithmic innovations, both aimed at speed and accuracy
BlazeFace includes a lightweight feature extraction network inspired by, but distinct from, MobileNetV1/V2. It also adopts an anchor scheme modified from the SSD object detection algorithm to make it more GPU-friendly. Finally, an improved tie resolution strategy replaces non-maximum suppression.
BlazeFace can detect one or more faces in an image captured by a smartphone's front-facing camera. It returns a bounding box and six keypoints for each face (left eye, right eye, nose tip, mouth center, and the left and right ear tragions, from the observer's point of view).
Algorithmic innovations include:
1. Innovations related to inference speed:
2. Innovations related to prediction quality:
BlazeBlock (left) and double BlazeBlock (right)
BlazeFace's model architecture, shown in the figure above, was designed around the following four considerations:
The size of the receptive field:
Although most modern convolutional neural network architectures, including MobileNet, tend to use 3×3 convolution kernels throughout the model graph, this study found that increasing the kernel size of the depthwise part costs relatively little. Therefore, we used 5×5 kernels in the model architecture
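The claim that a larger depthwise kernel adds little cost can be checked with a rough multiply-add count. The sketch below is only an illustration: the tensor shape (56×56 with 128 channels) and the helper names are assumptions, not figures or code from the article. It compares a depthwise 3×3 and a depthwise 5×5 convolution, each followed by the same 1×1 pointwise convolution.

```python
def depthwise_macs(size, channels, kernel):
    """Multiply-adds of a depthwise kernel x kernel convolution
    applied to a size x size x channels tensor (stride 1, same padding)."""
    return size * size * channels * kernel * kernel


def pointwise_macs(size, in_channels, out_channels):
    """Multiply-adds of a 1x1 convolution mapping in_channels to out_channels."""
    return size * size * in_channels * out_channels


# Illustrative shape only (an assumption for this example).
SIZE, CHANNELS = 56, 128

pw = pointwise_macs(SIZE, CHANNELS, CHANNELS)
dw3 = depthwise_macs(SIZE, CHANNELS, 3)
dw5 = depthwise_macs(SIZE, CHANNELS, 5)

total3 = dw3 + pw  # separable convolution with a 3x3 depthwise part
total5 = dw5 + pw  # separable convolution with a 5x5 depthwise part

print(f"pointwise multiply-adds:      {pw}")
print(f"3x3 depthwise multiply-adds:  {dw3}")
print(f"5x5 depthwise multiply-adds:  {dw5}")
print(f"total cost ratio (5x5 / 3x3): {total5 / total3:.3f}")
```

Under these assumptions the pointwise part dominates, so moving from 3×3 to 5×5 raises the total count by roughly 12% even though the depthwise part itself grows by a factor of 25/9.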
The low overhead of depthwise convolution also allows us to introduce another such layer between the two pointwise convolutions, so the required receptive field size is reached even faster. This forms a double BlazeBlock, as shown on the right side of the figure above.
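How kernel size and layer count affect the receptive field can be sketched with the standard recurrence for stacked convolutions. The function below is a generic illustration, not code from the paper; the layer lists are hypothetical and ignore the 1×1 pointwise layers, which do not enlarge the receptive field.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field, in input pixels along one axis, of a stack of
    convolution layers given their kernel sizes and strides."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump = spacing of adjacent outputs measured in input pixels
    for kernel, stride in zip(kernel_sizes, strides):
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


one_3x3 = receptive_field([3])      # a single 3x3 depthwise layer
one_5x5 = receptive_field([5])      # a single 5x5 depthwise layer
two_3x3 = receptive_field([3, 3])   # two stacked 3x3 layers
two_5x5 = receptive_field([5, 5])   # two stacked 5x5 layers, as in a double block

print(one_3x3, one_5x5, two_3x3, two_5x5)
```

One 5×5 layer covers the same span as two stacked 3×3 layers, and stacking two 5×5 layers extends it further, which is the effect the double block relies on.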
In the experiments, we focus on the feature extractor of the front-facing camera model. It only has to handle a narrower range of object scales, so its computational requirements are lower. The extractor takes an RGB input of 128×128 pixels
Improved Anchor mechanism:
SSD-like object detection models rely on predefined fixed-size base bounding boxes called priors, or anchors in Faster R-CNN terminology
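The idea of priors can be sketched as a grid of fixed-size boxes, with the network predicting offsets relative to each one. The sketch below is a simplified illustration only: the grid size, the scale values, the square 1:1 aspect ratio and the decoding formula are all assumptions chosen for the example, not the configuration used by BlazeFace.

```python
import math


def make_anchors(grid_size, scales):
    """Build square anchors (cx, cy, w, h) in normalised [0, 1] coordinates:
    one anchor per scale at the centre of every grid cell."""
    anchors = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for scale in scales:
                anchors.append((cx, cy, scale, scale))
    return anchors


def decode(anchor, offsets):
    """Turn predicted offsets (dx, dy, dw, dh) into a box, relative to one anchor.
    This uses a simplified SSD-style decoding without variance terms."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


# Hypothetical configuration: an 8x8 grid with two anchor scales per cell.
anchors = make_anchors(8, [0.1, 0.2])
print(len(anchors))                                 # 8 * 8 * 2 anchors
print(decode(anchors[0], (0.0, 0.0, 0.0, 0.0)))     # zero offsets return the anchor itself
```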
We will 8
Pipeline example. Red: Output of BlazeFace. Green: Task-specific model output.
Because our feature extractor doesn't reduce the resolution to 8.
To minimize this problem, we replace the suppression algorithm with a blending strategy, which estimates the regression parameters of a bounding box as a weighted average of the overlapping predictions. It adds virtually no cost over the original NMS algorithm. For our face detection task, this adjustment improves accuracy by 10%.
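The blending idea can be sketched as a small variation on greedy NMS: instead of discarding the boxes that overlap the top-scoring one, their coordinates are averaged with the scores as weights. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the IoU threshold of 0.3, the use of confidence scores as weights, and keeping the top score for the blended box are all choices made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def blend_detections(boxes, scores, iou_threshold=0.3):
    """Greedy clustering as in NMS, but each output box is the score-weighted
    average of all boxes in its cluster. Scores are assumed to be positive."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while order:
        top = order[0]
        # The top box always belongs to its own cluster.
        cluster = [top] + [i for i in order[1:]
                           if iou(boxes[top], boxes[i]) >= iou_threshold]
        total = sum(scores[i] for i in cluster)
        blended = tuple(
            sum(scores[i] * boxes[i][c] for i in cluster) / total
            for c in range(4)
        )
        results.append((blended, scores[top]))
        order = [i for i in order if i not in cluster]
    return results


# Two overlapping predictions of one face plus one separate face.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.6, 0.8]
results = blend_detections(boxes, scores)
for box, score in results:
    print([round(v, 2) for v in box], score)
```

Because the clustering pass is the same one NMS already performs, the averaging only adds a few multiplications per cluster, which fits the remark that the cost over plain NMS is negligible.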
Second, designed for GPUs, with accuracy beyond MobileNetV2
Super-real-time performance unlocks any pipeline that requires a facial region as input
Designed for mobile GPUs
Fast inference on GPUs