CVPR 2018 will be held in Salt Lake City, USA. Following the 12 papers it placed at last year's ICCV (including 3 orals), Tencent Youtu Lab has had a number of papers accepted at this year's CVPR 2018; this record is also reflected in the "AI Impact Factor" database project under Leiphone's academic channel, AI Technology Review.
The Tencent Youtu team gives a detailed introduction to two of these papers below, along with brief introductions to the others.
Demystifying motion blur: toward practical deblurring for arbitrary scenes
Blurred photos, caused by slow shutter speeds or fast-moving subjects, have long troubled photographers. Researchers at Tencent Youtu Lab have developed a new algorithm for restoring blurred images.
Image deblurring has long been a difficult problem in image processing. The causes of blur can be complicated: camera shake, loss of focus, fast-moving subjects, and so on. The tools in existing photo-editing software are usually unsatisfactory; for example, the "Camera Shake Reduction" tool in Photoshop CC can only handle blur from simple camera translation. This type of blur is called "uniform blur" in computer vision, and most real blurred images are not uniformly blurred, so the usefulness of existing editing tools is very limited.
The new algorithm from Tencent Youtu Lab can handle blurred images from arbitrary scenes. It is based on a blur model called "dynamic blur", which models the motion of each pixel individually, so it can handle almost every kind of motion blur. In the image above, for example, each person's motion differs because of the translation and rotation caused by camera shake. After processing by the new Youtu algorithm, the picture is restored to near-complete clarity, and even the text in the background becomes legible.
According to researchers at Tencent Youtu Lab, the core technology behind the method is a deep neural network. After training on thousands of pairs of blurred and sharp images, the network automatically learns how to recover sharp structure from a blurred input.
Although using neural networks for deblurring is not a new idea, Tencent Youtu Lab's distinctive contribution is incorporating physical intuition to guide model training. In the paper, the network mimics a mature image-restoration strategy known as "coarse-to-fine": the blurred image is first downscaled to several sizes, and restoration proceeds from the smaller images, which are relatively sharp and easy to recover, toward the larger ones. The sharp image produced at each step guides the restoration at the next, larger scale, reducing the difficulty of training the network.
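The coarse-to-fine strategy described above can be sketched as a small image pyramid loop. This is a conceptual illustration only: the `restore` function here is a trivial blending placeholder standing in for the paper's per-scale deblurring network, and the sampling functions are simple stand-ins for learned resampling.

```python
import numpy as np

def downsample(img, factor):
    # Box downsampling as a stand-in for building the image pyramid.
    h, w = img.shape
    return img[:h - h % factor, :w - w % factor].reshape(
        h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    # Nearest-neighbour upsampling.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def restore(blurred, guidance):
    # Placeholder for the deblurring network at one scale: blend the
    # blurred input with the upsampled coarse estimate from below.
    return 0.5 * blurred + 0.5 * guidance

def coarse_to_fine_deblur(blurred, num_scales=3):
    # Build a pyramid of progressively smaller (hence sharper) inputs.
    pyramid = [blurred]
    for _ in range(num_scales - 1):
        pyramid.append(downsample(pyramid[-1], 2))
    # Start at the coarsest scale; each result guides the next scale up.
    estimate = pyramid[-1]
    for level in reversed(range(num_scales - 1)):
        estimate = restore(pyramid[level], upsample(estimate, 2))
    return estimate

blurred = np.random.rand(32, 32)
out = coarse_to_fine_deblur(blurred)
print(out.shape)  # (32, 32)
```

The point of the structure is that each scale's output supervises the next, so no single network stage has to undo the full blur on its own.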
AI portrait artist: fast, clean, and elegant editing of portrait attributes
Modifying facial attributes in portrait photos (beyond simple beautification) is very difficult; artists usually need to do extensive manual editing to make the modified image look natural and attractive. Can AI take over these complex operations?
Researchers from Tencent Youtu Lab, led by Professor Jiaya Jia, have proposed a new model for automatic portrait manipulation. With this model, the user simply provides a high-level description of the desired effect, and the model automatically edits the photo according to the command, for example, making a subject look younger or older.
The main challenge in this task is that paired input/output samples cannot be collected for training, so generative adversarial networks (GANs), popular in unsupervised learning, are usually used. The approach proposed by the Youtu team, however, does not depend on a GAN. It trains the neural network on automatically generated noisy targets; thanks to the denoising effect of deep convolutional networks, the network's output can be even better than the targets it learned from.
"Generative adversarial networks are a powerful tool, but they are difficult to optimize. We hoped to find a simpler way to solve this problem, one that reduces the burden not only on artists but also on the engineers who train the models," the Tencent researchers said.
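The "learning from noisy targets" idea can be illustrated with a toy regression, under the assumption that the model is too smooth to fit the noise. Here a linear fit stands in for the deep network, and synthetic noisy labels stand in for the generated targets; none of this is the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# A clean underlying mapping we would like the model to learn...
x = np.linspace(0, 1, 200)
clean = 2.0 * x + 1.0

# ...but only noisy training targets are available for it.
noisy_targets = clean + rng.normal(scale=0.3, size=x.shape)

# Fit a smooth (here: linear) model to the noisy targets. The model
# cannot represent the per-sample noise, so it averages it away - the
# regularizing effect the paper attributes to deep convolutional nets.
coeffs = np.polyfit(x, noisy_targets, deg=1)
pred = np.polyval(coeffs, x)

err_model = np.mean((pred - clean) ** 2)
err_targets = np.mean((noisy_targets - clean) ** 2)
print(err_model < err_targets)
```

The fitted model ends up closer to the clean signal than the noisy targets it was trained on, which is the sense in which the network's output can exceed the quality of its labels.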
Another attractive feature of the model is that it supports local model updates: when switching between editing tasks, only a small portion of the model needs to be replaced. This is very friendly to system developers, and at the application level it makes incremental updates possible.
Even if the faces in a photo are not well cropped or aligned, the model can implicitly attend to the correct facial region. In many cases the user can simply feed the original photo into the model and obtain high-quality results. Fed a video frame by frame, the model can even edit facial attributes across an entire video.
In addition to the two papers above, the following papers from Tencent Youtu Lab were also accepted at CVPR 2018.
1. Referring Image Segmentation via Recurrent Refinement Networks
Segmenting the region of an image specified by a natural-language description is a challenging problem. Previous neural-network methods segment by fusing image and language features, but they ignore multi-scale information, which degrades segmentation quality. We propose a model based on a recurrent refinement network: in each iteration, backbone convolutional features at a different scale are incorporated, allowing the network to capture information across scales. We visualize the model's intermediate results, and the model achieves state-of-the-art performance on all relevant public datasets.
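The iterative multi-scale refinement can be sketched as a loop that folds one feature scale into a running prediction per step. The `fuse` update below is a simple blending placeholder (the paper uses a learned recurrent unit), and the feature maps are random stand-ins for backbone activations.

```python
import numpy as np

def fuse(state, feature):
    # Placeholder for the learned recurrent update: resize the feature
    # map to the state's resolution and blend it into the prediction.
    factor = state.shape[0] // feature.shape[0]
    up = np.repeat(np.repeat(feature, factor, axis=0), factor, axis=1)
    return 0.5 * state + 0.5 * up

# Multi-scale features from a backbone CNN (coarse to fine), as stand-ins.
features = [np.random.rand(8, 8), np.random.rand(16, 16), np.random.rand(32, 32)]

# Recurrent refinement: start from the coarsest map and fold in one
# scale per iteration, so the prediction sees every scale.
state = np.repeat(np.repeat(features[0], 4, axis=0), 4, axis=1)
for feat in features:
    state = fuse(state, feat)
print(state.shape)  # (32, 32)
```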
2. Weakly Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer
Weakly and semi-supervised human body part parsing via pose-guided knowledge transfer
Human body part parsing, i.e., human semantic part segmentation, is the foundation of many computer vision tasks. Traditional semantic segmentation methods require manually annotated labels for end-to-end training of fully convolutional networks (FCNs). Although past methods achieve good results, their performance depends heavily on the quantity and quality of training data. In this paper we propose a new way of obtaining training data: easily acquired human keypoint annotations are used to generate body part parsing data. Our key idea is to exploit the morphological similarity between people, transferring the parsing result of one person to another person with a similar pose. Using the generated results as additional training data, our semi-supervised model outperforms the strongly supervised method by 6 mIOU on the PASCAL-Person-Part dataset and achieves state-of-the-art results for human part parsing. The method generalizes well: it can easily be extended to parsing tasks for other objects or animals, so long as their morphological similarity can be expressed by keypoints. Our model and source code will be released.
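The pose-matching step at the heart of the transfer idea can be sketched as a nearest-neighbour search over normalized keypoint sets. This is an illustrative simplification: the normalization and distance below are assumptions, not the paper's exact formulation, and the parsing transfer itself is omitted.

```python
import numpy as np

def normalize_pose(keypoints):
    # Centre on the mean joint and scale by overall spread, so poses are
    # compared up to translation and body size.
    centred = keypoints - keypoints.mean(axis=0)
    return centred / (np.linalg.norm(centred) + 1e-8)

def nearest_pose(query, gallery):
    # Index of the annotated person whose pose best matches the query;
    # that person's part parsing would then be transferred to the query.
    q = normalize_pose(query)
    dists = [np.linalg.norm(q - normalize_pose(g)) for g in gallery]
    return int(np.argmin(dists))

gallery = [np.array([[0, 0], [1, 0], [0, 2.0]]),
           np.array([[0, 0], [0, 1], [2.0, 0]])]
query = np.array([[1, 1], [3, 1], [1, 5.0]])  # gallery[0], shifted and scaled
print(nearest_pose(query, gallery))  # 0
```

Because the normalization removes translation and scale, a shifted, resized copy of a pose matches its original exactly.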
3. Learning Dual Convolutional Neural Networks for Low-Level Vision
Processing low-level vision problems with dual convolutional neural networks
This paper proposes a dual convolutional neural network for low-level vision problems such as image super-resolution, edge-preserving image filtering, image deraining, and image dehazing. These problems typically involve estimating both the structure and the details of the target result. Motivated by this, the proposed network contains two branches that estimate the structure and the details of the target end to end. From the estimated structure and detail information, the target result can then be obtained through the imaging model of the specific problem. The dual network is a general framework that can incorporate existing convolutional neural networks to handle related low-level vision problems. Extensive experiments show that it applies to most low-level vision problems and achieves good results.
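The structure/detail decomposition that motivates the two branches can be illustrated with a simple additive split. The box blur here is only a stand-in for the structure branch's output, and the additive recombination assumes the simplest possible imaging model; real tasks in the paper use problem-specific models.

```python
import numpy as np

def box_blur(img, k=3):
    # Local average standing in for the smooth "structure" component.
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

img = np.random.rand(16, 16)
structure = box_blur(img)    # what one branch would estimate
detail = img - structure     # what the other branch would estimate

# Under an additive imaging model, the target is simply their sum,
# so the decomposition is exactly invertible.
recombined = structure + detail
print(np.allclose(recombined, img))  # True
```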
4. GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation
GeoNet: joint estimation of depth and surface normals via a geometric neural network
This paper proposes a geometric neural network that simultaneously predicts the depth map and surface normals of a scene from an image. The model builds on two convolutional neural networks and iteratively refines the depth and surface-normal predictions by modeling the geometric relationship between them, making the final predictions highly consistent and accurate. We validate the proposed geometric neural network on the NYU dataset; the results show that the model accurately predicts depth and surface normals with consistent geometry.
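The geometric relationship being enforced is that surface normals are determined by depth gradients. A minimal sketch of the depth-to-normal direction (GeoNet also models the reverse refinement, which is omitted here):

```python
import numpy as np

def normals_from_depth(depth):
    # For a surface z = depth(x, y), the normal is proportional to
    # (-dz/dx, -dz/dy, 1); normalize to unit length.
    dzdy, dzdx = np.gradient(depth)  # per-axis derivatives (rows, cols)
    normals = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)

# Sanity check on a planar ramp: depth grows linearly along x, so every
# normal should point the same way.
x = np.arange(8, dtype=float)
depth = np.tile(x, (8, 1))  # dz/dx = 1, dz/dy = 0
n = normals_from_depth(depth)
print(np.allclose(n, n[0, 0]))  # all normals identical on a plane
```

This consistency (a plane has one normal) is exactly the kind of constraint that coupling the two predictions lets the network exploit.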
5. Path Aggregation Network for Instance Segmentation
Instance segmentation via a path aggregation network
In a neural network, the quality of information propagation matters greatly. This paper proposes a path aggregation network designed to improve information flow in proposal-based instance segmentation frameworks. Specifically, we build a bottom-up path to propagate the accurate localization signals stored in the lower network layers, shortening the path between the lower and topmost layers and enhancing the entire feature hierarchy. We also introduce adaptive feature pooling, which links region features to all feature levels so that useful information at every level can pass directly to the subsequent region subnetworks. Finally, we add a complementary branch that captures different views of each proposal, further improving the quality of mask prediction.
These improvements are easy to implement and add little computational overhead. They helped us win first place in the 2017 COCO instance segmentation challenge and second place in object detection, and our method also achieves state-of-the-art results on the MVD and Cityscapes datasets.
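The bottom-up augmentation path can be sketched as propagating the finest feature level upward through the pyramid. The stride-2 slicing below stands in for the learned downsampling convolutions, and the random arrays stand in for FPN outputs; this is a shape-level illustration, not the paper's architecture.

```python
import numpy as np

# Feature pyramid levels from fine (large) to coarse (small), as stand-ins.
levels = [np.random.rand(32, 32), np.random.rand(16, 16), np.random.rand(8, 8)]

def down2(x):
    # Stride-2 subsampling, standing in for a stride-2 convolution.
    return x[::2, ::2]

# Bottom-up path augmentation: carry the fine level's accurate
# localization signal upward, adding it into each coarser level.
augmented = [levels[0]]
for lvl in levels[1:]:
    augmented.append(lvl + down2(augmented[-1]))

print([a.shape for a in augmented])  # [(32, 32), (16, 16), (8, 8)]
```

The effect is that information from the bottom of the hierarchy reaches the top through a path only a few levels long, instead of traversing the whole backbone.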
6. FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors
FSRNet: end-to-end trainable face super-resolution network based on facial priors
This paper, led by Tencent Youtu Lab and Nanjing University of Science and Technology, was selected as a Spotlight. Face super-resolution is a special case of super-resolution: the prior information unique to faces can be exploited to better super-resolve face images. The paper proposes a new end-to-end trainable face super-resolution network that improves the quality of very low-resolution face images, without requiring face alignment, by better exploiting geometric priors such as facial landmark heatmaps and parsing maps. Specifically, the paper first constructs a coarse super-resolution network to recover a coarse high-resolution image. This image is then fed into a fine super-resolution encoder and a prior estimation network: the encoder extracts image features, while the prior network estimates facial landmarks and parsing maps. The outputs of the two branches are fused by a fine super-resolution decoder to reconstruct the final high-resolution image. To generate even more realistic faces, the paper further proposes a face super-resolution generative adversarial network, integrating adversarial ideas into the super-resolution network. In addition, we introduce two related tasks, face alignment and face parsing, as new evaluation criteria for face super-resolution; these overcome the inconsistency between numerical and visual quality found in traditional metrics such as PSNR/SSIM. Extensive experiments show that the proposed method significantly outperforms previous super-resolution methods in both numerical and visual quality on very low-resolution face images.
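The coarse-then-prior-then-decode pipeline described above can be sketched as a chain of function calls. Every component here is a trivial placeholder (nearest-neighbour upsampling, identity encoder, all-ones prior) chosen only to show the data flow and shapes, not FSRNet's actual networks.

```python
import numpy as np

def upsample(img, f):
    return np.repeat(np.repeat(img, f, axis=0), f, axis=1)

def coarse_sr(lr):
    # Coarse SR network stand-in: nearest-neighbour 4x upsampling.
    return upsample(lr, 4)

def encoder(img):
    # Fine SR encoder stand-in: extracts image features (identity here).
    return img

def prior_net(img):
    # Prior network stand-in: would estimate landmark heatmaps and
    # parsing maps; a trivial all-ones prior for illustration.
    return np.ones_like(img)

def decoder(features, prior):
    # Fine SR decoder stand-in: fuses features with the facial prior.
    return features * prior

lr_face = np.random.rand(16, 16)                       # very low-res input
coarse = coarse_sr(lr_face)                            # step 1: coarse HR
hr_face = decoder(encoder(coarse), prior_net(coarse))  # steps 2-3: fuse prior
print(hr_face.shape)  # (64, 64)
```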
7. Generative Adversarial Learning Towards Fast Weakly Supervised Detection
Fast weakly supervised object detection via generative adversarial learning
The paper proposes a generative adversarial learning algorithm for fast weakly supervised object detection. Recent years have seen much work on weakly supervised detection, which requires no manually labeled bounding boxes, but most existing methods are multi-stage pipelines that include a region proposal extraction stage, making online inference an order of magnitude slower than fast fully supervised detectors such as SSD and YOLO. The paper speeds this up with a novel generative adversarial learning algorithm: the generator is a single-stage object detector, an agent is introduced to mine high-quality bounding boxes, and a discriminator judges the source of each bounding box. The final algorithm trains the model with a combination of structural similarity loss and adversarial loss. Experiments show that the algorithm achieves a significant performance improvement.
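The combined training objective can be sketched as the sum of a consistency term and an adversarial term. The exact loss forms below (squared error for the structural term, log loss for the adversarial term, weight `lam`) are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def adversarial_loss(disc_score_fake):
    # The generator (single-stage detector) wants the discriminator to
    # score its boxes as "real" (score near 1).
    return -np.log(disc_score_fake + 1e-8)

def structural_similarity_loss(pred_box, mined_box):
    # Consistency between the detector's predicted box and the
    # high-quality box mined by the agent (squared-error stand-in).
    return float(np.mean((np.asarray(pred_box) - np.asarray(mined_box)) ** 2))

def total_loss(pred_box, mined_box, disc_score_fake, lam=0.1):
    # Combined objective: structural term plus weighted adversarial term.
    return structural_similarity_loss(pred_box, mined_box) + \
        lam * adversarial_loss(disc_score_fake)

loss = total_loss([0.1, 0.1, 0.5, 0.5], [0.12, 0.1, 0.5, 0.52],
                  disc_score_fake=0.4)
print(loss > 0)  # a positive loss to drive the detector's update
```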
8. GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints
Group-based image captioning with structured relevance and diversity constraints
This paper proposes a group-based image captioning method (GroupCap) that analyzes the semantic associations within a group of images, modeling both the semantic relevance and the differences between them. Specifically, the method first uses a deep convolutional neural network to extract semantic features from the images and a proposed visual parsing model to build a semantic association tree. It then applies a triplet loss and a classification loss over the tree to model the semantic relevance and differences between images, and uses the relevance as a constraint to guide a deep recurrent neural network in generating captions. The method is novel and effective; it addresses the imprecision and poor discriminability of current image captioning methods and achieves higher performance on multiple captioning metrics.
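The triplet loss mentioned above has a standard hinge form that is easy to show concretely: related images in the group are pulled together in feature space while unrelated ones are pushed apart. The 2-D vectors and margin value are illustrative, not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge-form triplet loss: penalize unless the positive pair is
    # closer than the negative pair by at least the margin.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # semantically related image in the group
negative = np.array([0.0, 1.0])   # unrelated image
print(triplet_loss(anchor, positive, negative))  # 0.0 (already well separated)
```

When the constraint is already satisfied the loss is zero, so only violating triplets contribute gradients during training.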