Recently, a paper by Wang Xiaolong and colleagues at Carnegie Mellon University (CMU), "A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection", has attracted wide attention. The work applies adversarial learning to object detection: an adversarial network generates occluded and deformed examples that are used to train the detection network, and the approach achieves good results. The paper has been accepted at CVPR 2017.
Paper link: http://www.cs.cmu.edu/~xiaolonw/papers/cvpr2017_adversarial_det.pdf
Paper: A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
How do we build an object detector that can cope with images in which objects are occluded, seen from unusual angles, or deformed? The current solution is data-driven: collect a huge dataset that covers object appearance under all conditions, and hope that through training the classifier learns to recognize them as the same object. But can a dataset really cover every situation? We argue that occlusions and deformations follow a long-tail distribution: some are so rare that they almost never occur, yet we still want the trained model to handle them. In this paper we propose a new solution: an adversarial network that generates occluded and deformed examples on its own. The adversary's goal is to produce examples that are hard for the object detector to recognize, so in our architecture the original detector and its adversary learn jointly. Experiments show that, compared with Fast R-CNN, our method improves mAP by 2.3% on VOC07 and by 2.6% on the VOC2012 object detection challenge. We also release the code for this work.
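The adversarial objective described above, generating examples the detector finds hard, can be illustrated with a minimal numpy sketch: a toy adversary exhaustively searches occlusion windows over a pooled feature patch and keeps the one that maximizes a stand-in detector loss. All names, shapes, and the linear scorer are illustrative assumptions, not the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pooled feature patch for one proposal, and a toy linear scorer
# standing in for the detector's classification head.
feat = rng.normal(size=(6, 6))
weights = rng.normal(size=(6, 6))

def detector_loss(masked_feat):
    # Higher loss = the detector finds this example harder.
    return -float((weights * masked_feat).sum())

def hardest_mask(feat, block=3):
    """The adversary's objective: among all block x block occlusion
    windows, keep the one that maximizes the detector's loss."""
    best_mask, best_loss = None, -np.inf
    h, w = feat.shape
    for i in range(h - block + 1):
        for j in range(w - block + 1):
            m = np.ones_like(feat)
            m[i:i + block, j:j + block] = 0.0  # occlude this window
            loss = detector_loss(feat * m)
            if loss > best_loss:
                best_mask, best_loss = m, loss
    return best_mask, best_loss

mask, loss = hardest_mask(feat)
```

In the paper the adversary is itself a trained network rather than an exhaustive search, but the objective is the same: pick the occlusion that hurts the detector most, then train the detector on that hard example.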
Figure 1: We propose using an adversarial network to generate occluded and deformed examples that are hard for the object detector to classify. As the detector's performance gradually improves, so does the quality of the examples generated by the adversarial network. Through this adversarial strategy, the detector's recognition accuracy is further improved.
Figure 2: The architecture of our ASDN network and how it is combined with Fast R-CNN. The ASDN takes as input the feature patches obtained after the ROI pooling layer and predicts an occlusion/dropout mask, which is then used to drop feature values before they are passed on to the Fast R-CNN classification tower.
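As a rough illustration of the masking step described in this caption, the toy function below scores each spatial location of a pooled feature patch and zeroes out the most discriminative fraction, the way the ASDN's predicted mask drops feature values. This is a hand-written heuristic stand-in under assumed shapes; the real ASDN is a small network trained adversarially.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a feature patch from the ROI pooling layer
# (channels x height x width); shapes are illustrative only.
roi_feat = rng.normal(size=(4, 6, 6))

def asdn_style_mask(feat, drop_frac=0.25):
    """Heuristic stand-in for the ASDN's predicted mask: occlude the
    most discriminative spatial cells (the real ASDN is a small
    network trained adversarially to pick them)."""
    sal = np.abs(feat).mean(axis=0)          # (H, W) importance map
    k = int(round(sal.size * drop_frac))     # number of cells to occlude
    drop_idx = np.argsort(sal.ravel())[-k:]  # indices of the top-k cells
    mask = np.ones(sal.size)
    mask[drop_idx] = 0.0
    return mask.reshape(sal.shape)

mask = asdn_style_mask(roi_feat)
occluded = roi_feat * mask  # mask broadcasts over the channel axis
```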
Figure 3: (a) Model pre-training: finding the hardest occlusions for training the ASDN network. (b) Occlusion masks generated by the ASDN; the black regions are occluded (their feature values dropped) before being passed through the FRCN pipeline.
Figure 4: The combined ASDN and ASTN architecture. An occlusion mask is created first, and the feature patch is then rotated to produce the examples used for training.
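The two-stage pipeline in this caption (occlude, then deform) can be sketched as follows. Here `np.rot90` is a crude, dependency-free stand-in for the ASTN, which in the paper predicts a continuous rotation via a spatial transformer; all shapes and the fixed mask are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
roi_feat = rng.normal(size=(4, 6, 6))  # illustrative pooled feature patch

def astn_style_rotate(feat, quarter_turns=1):
    """Crude stand-in for the ASTN: rotate the patch spatially. The real
    ASTN predicts a continuous rotation with a spatial transformer."""
    return np.rot90(feat, k=quarter_turns, axes=(1, 2))

def asdn_then_astn(feat, mask):
    # The pipeline from the caption: occlude first, then deform.
    return astn_style_rotate(feat * mask)

mask = np.ones((6, 6))
mask[2:5, 2:5] = 0.0  # toy 3x3 occlusion window
hard_example = asdn_then_astn(roi_feat, mask)
```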
Table 1: Average precision on the VOC detection test set; FRCN refers to the Fast R-CNN score obtained with our training method.