Towards Real World Human Parsing: Multiple-Human Parsing in the Wild
J Li, J Zhao, Y Wei, C Lang, Y Li
May 23 2017
https://arxiv.org/abs/1705.07206
This paper came out in May this year. I found it fairly interesting, so I read it:
1, Introduction
Most previous human parsing work considers only one person per image. That setting is relatively simple, but real-world images often contain more than one person, so the authors of this paper tackle multiple-human parsing.
The paper makes three contributions:
A. They introduce the multiple-human parsing problem, which extends the research scope of human parsing and better matches real-world scenarios in various applications.
B. They built a new benchmark, the Multiple Human Parsing (MHP) dataset.
C. They proposed a new MH-Parser model for multiple-human parsing that combines global and local information, and it outperforms the simple "detect-and-parse" approach used in the past.
2, Related work
A、Human parsing
I won't say much here; this part reviews various earlier human parsing methods.
B、Instance-aware object segmentation
The paper mentions one work here: Multi-Task Network Cascades for Differentiating Instances, which appears to be an instance segmentation method. I haven't read that paper yet. Its segmentation is not particularly fine-grained: it stays at the person level and does not further segment body parts. Still, I plan to read it later, since I work on instance segmentation myself.
3、MHP Dataset
The dataset contains 4,980 images, each with at least two persons. Each foreground person is annotated by human experts with 18 semantic labels:
7 body parts: "Hair", "Face", "Left Leg", "Right Leg", "Left Arm", "Right Arm", and "Torso Skin"
11 fashion categories: "Hat", "Sunglasses", "Upper Clothes", "Skirt", "Pants", "Dress", "Belt", "Left Shoe", "Right Shoe", "Bag", and "Scarf"
The MHP dataset has 14,969 annotations in total. Of the 4,980 images, 980 are used as the test set, 3,000 as the training set, and 1,000 as the validation set.
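To make the numbers above concrete, here is a small sketch that writes the label set and the split sizes down as Python constants and checks that they add up. The label order, the 0-based IDs, and all variable names are my own assumptions for illustration, not the dataset's official encoding; I also assume the 14,969 annotations are per-person annotations.

```python
# Label names follow the list above. The ordering and the 0-based IDs are
# assumptions made for this sketch, not the official MHP label encoding.
BODY_PARTS = [
    "hair", "face", "left leg", "right leg",
    "left arm", "right arm", "torso skin",
]
FASHION_CATEGORIES = [
    "hat", "sunglasses", "upper clothes", "skirt", "pants", "dress",
    "belt", "left shoe", "right shoe", "bag", "scarf",
]
LABELS = BODY_PARTS + FASHION_CATEGORIES
LABEL_TO_ID = {name: index for index, name in enumerate(LABELS)}

# Image counts per split, as stated in the text.
SPLIT_SIZES = {"train": 3000, "val": 1000, "test": 980}
TOTAL_IMAGES = 4980

# Assumed to count annotated persons across all images.
TOTAL_PERSON_ANNOTATIONS = 14969


def average_persons_per_image() -> float:
    """Average number of annotated persons per image under the assumption above."""
    return TOTAL_PERSON_ANNOTATIONS / TOTAL_IMAGES


if __name__ == "__main__":
    print(len(LABELS), "labels")
    print(sum(SPLIT_SIZES.values()), "images across the three splits")
    print(round(average_persons_per_image(), 2), "persons per image on average")
```

If the per-person assumption holds, the average works out to roughly three persons per image, which fits the "at least two persons" requirement.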
4、Multiple-Human Parsing Methods
①MH-Parser
It consists of five components:
A、Representation learner
This is the backbone network, a CNN feature extractor whose features are shared by the later modules. It is a fully convolutional network, which preserves spatial information.
B、Global parser
Captures the global information of the whole image and produces a parsing of the entire picture.
C、Candidate nominator
Consists of three sub-modules: a Region Proposal Network (RPN), a bounding box classifier, and a bounding box regressor. Similar to Faster R-CNN, it detects every person and outputs their bounding boxes.
D、Local parser
Predicts semantic labels for the pixels inside each person's bounding box.
E、Global-local aggregator
Takes input from both the local parser and the global parser networks and produces the semantic parsing prediction for each individual bounding box.
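The data flow between the five components can be sketched roughly as below. This is only my reading of the description, not the authors' implementation: every component is passed in as a plain function, feature maps are stood in for by 2-D lists, and the `crop` helper is a hypothetical replacement for whatever RoI-style pooling the real model uses.

```python
from typing import Callable, List, Tuple

# (top, left, bottom, right) with exclusive ends; an assumed convention.
Box = Tuple[int, int, int, int]
# A 2-D grid standing in for an image, a feature map, or a label map.
Grid = List[List[int]]


def crop(grid: Grid, box: Box) -> Grid:
    """Cut the rectangle `box` out of a 2-D grid (hypothetical helper)."""
    top, left, bottom, right = box
    return [row[left:right] for row in grid[top:bottom]]


def mh_parser_forward(
    image: Grid,
    representation_learner: Callable[[Grid], Grid],
    global_parser: Callable[[Grid], Grid],
    candidate_nominator: Callable[[Grid], List[Box]],
    local_parser: Callable[[Grid], Grid],
    aggregator: Callable[[Grid, Grid], Grid],
) -> List[Tuple[Box, Grid]]:
    """Run the five components once and return one parsing result per box."""
    # A: features are computed once and shared by every later module.
    features = representation_learner(image)
    # B: parsing of the whole image.
    global_map = global_parser(features)
    # C: one bounding box per detected person.
    boxes = candidate_nominator(features)

    results = []
    for box in boxes:
        # D: parsing restricted to this person's box.
        local_out = local_parser(crop(features, box))
        # E: fuse the local output with the matching region of the global output.
        global_out = crop(global_map, box)
        results.append((box, aggregator(local_out, global_out)))
    return results
```

The point of the sketch is that the representation learner is called once per image and its output feeds the global parser, the candidate nominator, and the local parser alike.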
②Detect-and-parse baseline
Note that in this baseline the representation learners of the two stages (detection, then parsing) are independent of each other and share no information.
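For contrast, a detect-and-parse pipeline can be sketched as two separate stages, each with its own backbone. Again this is only an illustration under my own assumptions: the function names are hypothetical, images are 2-D lists, and I assume the second stage works on image patches cropped by the detected boxes.

```python
from typing import Callable, List, Tuple

# (top, left, bottom, right) with exclusive ends; an assumed convention.
Box = Tuple[int, int, int, int]
Grid = List[List[int]]


def detect_and_parse(
    image: Grid,
    detector_backbone: Callable[[Grid], Grid],
    box_head: Callable[[Grid], List[Box]],
    parser_backbone: Callable[[Grid], Grid],
    parsing_head: Callable[[Grid], Grid],
) -> List[Tuple[Box, Grid]]:
    """Two-stage baseline: detect persons first, then parse each one separately."""
    # Stage 1: detection with its own representation learner.
    boxes = box_head(detector_backbone(image))

    # Stage 2: parsing with a second, independent representation learner.
    # Nothing computed in stage 1 is reused here except the boxes themselves.
    results = []
    for box in boxes:
        top, left, bottom, right = box
        patch = [row[left:right] for row in image[top:bottom]]
        results.append((box, parsing_head(parser_backbone(patch))))
    return results
```

Compared with MH-Parser, the parser here sees only the cropped patch, so it has no access to shared features or to a global view of the image.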