Main content
Human pose estimation is the process of recovering the key points of the human body from a given image or video. Generally, each person is first detected and localized, and then that person's key points are detected. A common choice is 18 key points, corresponding to the main joints and parts of the human body.
If part of a person is occluded, or the image contains many people who occlude one another, can the machine still recognize the pose? Introducing structural information about the human body is a good choice. Bone lengths are bounded, so if two connected key points are too far apart, they probably belong to different people or come from a detection error.
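The bone-length constraint described above can be sketched as a simple plausibility check. This is a minimal illustration, not a method from the source: the joint names, the pixel limits in `MAX_BONE_LENGTH`, and the function name `implausible_bones` are all hypothetical, and real limits would depend on image resolution and the scale of the detected person.

```python
import math

# Hypothetical upper bounds (in pixels) for a few limbs. These numbers are
# assumptions for illustration only.
MAX_BONE_LENGTH = {
    ("shoulder", "elbow"): 120.0,
    ("elbow", "wrist"): 110.0,
}


def implausible_bones(keypoints, max_lengths=MAX_BONE_LENGTH):
    """Return the bones whose 2D length exceeds the allowed maximum.

    keypoints maps a joint name to its (x, y) pixel position.
    Bones with a missing endpoint are skipped rather than flagged.
    """
    flagged = []
    for (a, b), limit in max_lengths.items():
        if a not in keypoints or b not in keypoints:
            continue
        if math.dist(keypoints[a], keypoints[b]) > limit:
            flagged.append((a, b))
    return flagged


# The wrist is far from the elbow, so it likely belongs to another person
# or is a detection error.
pose = {"shoulder": (100, 100), "elbow": (150, 180), "wrist": (420, 200)}
print(implausible_bones(pose))
```

A flagged bone does not say which endpoint is wrong; it only marks the pair for re-assignment or rejection.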
If the input is video, the temporal continuity of body pose can also be used to improve the estimation results.
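One simple way to exploit temporal continuity is to smooth each key point over time. The sketch below uses exponential smoothing, which is a technique chosen here for illustration and not one named in the source; the function name `smooth_keypoints` and the parameter `alpha` are assumptions.

```python
def smooth_keypoints(frames, alpha=0.6):
    """Exponentially smooth per-joint (x, y) positions over time.

    frames is a list of poses; each pose is a list of (x, y) tuples in the
    same joint order. alpha in (0, 1] is the weight of the current frame;
    alpha = 1 disables smoothing.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for pose in frames:
        if previous is None:
            # The first frame has no history, so it is kept as-is.
            current = [(float(x), float(y)) for x, y in pose]
        else:
            current = [
                (alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py)
                for (x, y), (px, py) in zip(pose, previous)
            ]
        smoothed.append(current)
        previous = current
    return smoothed


# A single joint that jumps for one frame is pulled back toward its history.
track = [[(0, 0)], [(10, 0)], [(0, 0)]]
print(smooth_keypoints(track, alpha=0.5))
```

A lower `alpha` suppresses more jitter but makes the estimate lag behind fast motion.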
Human pose estimation can be applied in security, to judge whether a person's behavior is excessive and raise an alarm in time; in new retail, to judge purchasing and other customer behaviors; in identity recognition and in locating and tracking individuals in a space; and in motion capture, for example in dancing games or as a human-machine interaction method for controlling home appliances.
Given an RGB image as input, the goal is to recover the 3D pose of the person in the image. The output is usually represented in one of two ways:
1. 3D key points, connected into a 3D skeleton that can be visualized; the task is to estimate the position of each key point in space;
2. A parametric geometric model of the human body, commonly the SMPL model, whose deformation is controlled by a set of parameters; the task is to estimate the pose parameters and the shape parameters.
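The two output representations above can be sketched as plain data structures. The class names `Skeleton3D` and `SMPLParams` are hypothetical. The sizes assume the commonly used SMPL configuration of 72 pose values (24 joints, 3 axis-angle components each) and 10 shape coefficients; other configurations exist.

```python
import math
from dataclasses import dataclass


@dataclass
class Skeleton3D:
    """Representation 1: 3D key points plus the bones that connect them."""
    joints: list  # list of (x, y, z) positions, one per key point
    bones: list   # list of (parent_index, child_index) pairs

    def bone_lengths(self):
        """Length of every bone, in the same order as self.bones."""
        return [math.dist(self.joints[p], self.joints[c]) for p, c in self.bones]


@dataclass
class SMPLParams:
    """Representation 2: parameters of a parametric body model."""
    pose: list   # assumed 72 values: 24 joints x 3 axis-angle components
    shape: list  # assumed 10 shape coefficients

    def __post_init__(self):
        if len(self.pose) != 72:
            raise ValueError("expected 72 pose values")
        if len(self.shape) != 10:
            raise ValueError("expected 10 shape values")


skeleton = Skeleton3D(joints=[(0, 0, 0), (0, 1, 0), (0, 1, 2)], bones=[(0, 1), (1, 2)])
print(skeleton.bone_lengths())
rest_pose = SMPLParams(pose=[0.0] * 72, shape=[0.0] * 10)
```

The skeleton stores positions directly, while the parametric form stores a compact vector from which a full body surface can be generated by the model.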
- Optimization: optimize the parameters of the 3D human body model so that its projection onto the image plane aligns with image features such as silhouettes and 2D key points. Limitations: a relatively good initialization is required, the optimization is relatively slow, and it easily falls into local optima.
- Regression using a neural network: based on deep learning, regress the pose parameters directly from the input image; this is relatively fast and uses end-to-end learning.
The two approaches above can also be combined: use the network to predict a relatively good initialization, then further optimize the pose so that it aligns with the image features.
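The optimization-based idea above, aligning a projected 3D model with 2D key points, can be illustrated with a deliberately simplified sketch. It is not the full method: it keeps the 3D joints fixed and fits only a weak-perspective camera (scale `s` and translation `tx`, `ty`) by gradient descent on the reprojection error, whereas a real system would also optimize pose and shape. The function name and the toy joints are assumptions.

```python
def fit_weak_perspective(joints_3d, keypoints_2d, lr=0.1, steps=500):
    """Fit s, tx, ty so that (s*X + tx, s*Y + ty) matches the 2D key points.

    Minimizes the mean squared reprojection error by plain gradient descent,
    starting from s = 1 and zero translation.
    """
    s, tx, ty = 1.0, 0.0, 0.0
    n = len(joints_3d)
    for _ in range(steps):
        grad_s = grad_tx = grad_ty = 0.0
        for (x, y, _z), (u, v) in zip(joints_3d, keypoints_2d):
            res_u = s * x + tx - u  # horizontal reprojection residual
            res_v = s * y + ty - v  # vertical reprojection residual
            grad_s += 2.0 * (res_u * x + res_v * y) / n
            grad_tx += 2.0 * res_u / n
            grad_ty += 2.0 * res_v / n
        s -= lr * grad_s
        tx -= lr * grad_tx
        ty -= lr * grad_ty
    return s, tx, ty


# Toy skeleton (assumed values) and synthetic 2D targets generated with a
# known camera, so the fit can be checked against s = 2, t = (0.5, -0.3).
joints = [(0.0, 0.8, 0.1), (0.0, 0.4, 0.0), (-0.3, 0.4, 0.0), (0.3, 0.4, 0.0),
          (0.0, -0.2, 0.0), (-0.2, -0.8, 0.1), (0.2, -0.8, 0.1)]
targets = [(2.0 * x + 0.5, 2.0 * y - 0.3) for x, y, _ in joints]
print(fit_weak_perspective(joints, targets))
```

This toy objective is convex, so any starting point works. Once pose parameters enter the objective it is no longer convex, which is why the initialization matters and why a network prediction is a useful starting point.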
It is difficult to annotate 3D poses in images. Ways to work around the lack of 3D labels:
1. Supervise with 2D information, with model fitting in the loop
2. Use unpaired data
3. Use multi-view data
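As an illustration of why multi-view data helps, a 3D key point can be recovered from two calibrated views by triangulation. The sketch below uses the midpoint method, a technique chosen here for simplicity and not named in the source. It assumes each 2D detection has already been back-projected into a ray, given by a camera center and a direction in a shared world frame; the function name is hypothetical.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two rays.

    Ray 1 is c1 + s * d1 and ray 2 is c2 + t * d2, where c1 and c2 are camera
    centers and d1 and d2 are viewing directions (not necessarily unit length).
    """
    def dot(p, q):
        return sum(a * b for a, b in zip(p, q))

    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two cameras one unit apart, both looking at the same joint.
joint = (0.2, 0.1, 3.0)
cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray1 = tuple(j - c for j, c in zip(joint, cam1))
ray2 = tuple(j - c for j, c in zip(joint, cam2))
print(triangulate_midpoint(cam1, ray1, cam2, ray2))
```

With noisy detections the two rays no longer intersect, and the midpoint gives a reasonable estimate that can serve as a 3D label.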
1. Take inter-frame information into account during feature extraction, for example with an LSTM
2. Extend the modeling from 3D pose to 3D motion
Approaches to the multi-person case:
1. Top-down framework: first detect the people in the image, then for each person estimate the position of the root joint and the 3D pose relative to that root. This amounts to single-person pose estimation plus an estimate of each person's position.
2. Bottom-up framework: first use a network to regress some intermediate representations, such as the 2D key points and the depth map of the root joints, and then assemble them into multi-person 3D skeletons.
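Both frameworks above rely on placing each person's root joint in camera space and then attaching a root-relative pose to it. A minimal sketch of that step, assuming a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and an already estimated root depth; the function names and the numbers are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth to camera space."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def absolute_pose(root_relative_joints, root_position):
    """Shift root-relative joints by the root's camera-space position."""
    rx, ry, rz = root_position
    return [(x + rx, y + ry, z + rz) for x, y, z in root_relative_joints]


# Assumed example: root detected at pixel (740, 360), 4 m from the camera.
root = backproject(740, 360, 4.0, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
relative = [(0.0, 0.0, 0.0), (0.0, -0.5, 0.1)]  # root, then one other joint
print(absolute_pose(relative, root))
```

Repeating this for every detected person puts all skeletons in one camera coordinate frame, which is what makes distances between people meaningful.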
2. Use scene information to infer the body pose and reduce ambiguity in 3D pose estimation.
3. Motion capture + motion simulation
Paper list:
https://github.com/zju3dv/Monocular_3D_human