
Training on Human3.6M dataset #2

Open · hliuav opened this issue Aug 10, 2018 · 6 comments

@hliuav commented Aug 10, 2018

Can I use this code to train on the Human3.6M dataset (16 landmarks) by simply replacing the dataset and partition.txt? The results I get do not look as good as those in the paper.

@YutingZhang (Owner)

For Human3.6M, we also used optical flow as self-supervision (see Appendix C). The results may not be as good as those in the paper if optical flow is not used.
The optical-flow-based loss is already implemented here:

# optical flow (excerpt from the model definition; tf, np, math, and the
# repo's tmf/tgu/ptu/keypoints_2d helpers are assumed to be in scope)
of_condition = None
if condition_tensor is not None:
    for v in condition_tensor:
        if v["type"] == "optical_flow":
            of_condition = v

# resolve loss weights from the options, falling back to the keypoint
# transform weight when a dedicated flow-transform weight is not given
optical_flow_transform_loss_weight = None
if "optical_flow_transform_loss_weight" in self.options:
    optical_flow_transform_loss_weight = self.options["optical_flow_transform_loss_weight"]
if optical_flow_transform_loss_weight is None:
    if of_condition is not None and "keypoint_transform_loss_weight" in self.options:
        optical_flow_transform_loss_weight = self.options["keypoint_transform_loss_weight"]
optical_flow_strength_loss_weight = None
if "optical_flow_strength_loss_weight" in self.options:
    optical_flow_strength_loss_weight = self.options["optical_flow_strength_loss_weight"]

if ptu.default_phase() == pt.Phase.train and \
        (rbool(optical_flow_transform_loss_weight) or rbool(optical_flow_strength_loss_weight)):
    assert of_condition is not None, "need optical flow condition"
    # keypoint coordinates before padding
    pre_keypoint_param = keypoint_param[:, :, :2]
    scaling_factor = np.array(self.target_input_size) / np.array(self.input_size)
    pre_keypoint_param = keypoints_2d.scale_keypoint_param(
        pre_keypoint_param, scaling_factor, src_aspect_ratio=full_a)
    # only use valid samples (offset 0 means no paired frame)
    ind_offset = tf.reshape(of_condition["offset"], [-1])
    flow_map = of_condition["flow"]  # [batch_size, h, w, 2]
    valid_mask = tf.not_equal(ind_offset, 0)
    # interpolation mask
    flow_h, flow_w = tmf.get_shape(flow_map)[1:3]
    if rbool(optical_flow_transform_loss_weight):
        pre_interp_weights = keypoints_2d.gaussian_coordinate_to_keypoint_map(tf.concat([
            pre_keypoint_param,
            tf.ones_like(pre_keypoint_param[:, :, -1:]) / math.sqrt(flow_h * flow_w)
        ], axis=2), km_h=flow_h, km_w=flow_w)  # [batch_size, h, w, keypoint_num]
        pre_interp_weights /= tf.reduce_sum(
            pre_interp_weights, axis=[1, 2], keep_dims=True) + tmf.epsilon
        # pointwise flow: dense flow interpolated at each keypoint location
        next_ind = np.arange(batch_size) + ind_offset
        next_keypoint_param = tf.gather(pre_keypoint_param, next_ind)
        pointwise_flow = tf.reduce_sum(
            tf.expand_dims(flow_map, axis=3) * tf.expand_dims(pre_interp_weights, axis=4),
            axis=[1, 2]
        )
        # flow transform constraint: flow-propagated keypoints should match
        # the keypoints detected in the paired frame
        next_keypoint_param_2 = pre_keypoint_param + pointwise_flow
        kp_of_trans_loss = tf.reduce_mean(tf.boolean_mask(
            tmf.sum_per_sample(tf.square(next_keypoint_param_2 - next_keypoint_param)),
            mask=valid_mask
        ))
        optical_flow_transform_loss = kp_of_trans_loss * optical_flow_transform_loss_weight
        tgu.add_to_aux_loss(optical_flow_transform_loss, "flow_trans")
    if rbool(optical_flow_strength_loss_weight):
        pre_interp_weights = keypoints_2d.gaussian_coordinate_to_keypoint_map(tf.concat([
            pre_keypoint_param,
            tf.ones_like(pre_keypoint_param[:, :, -1:]) * (1.0 / 16)  # self.base_gaussian_stddev
        ], axis=2), km_h=flow_h, km_w=flow_w)  # [batch_size, h, w, keypoint_num]
        pre_interp_weights /= tf.reduce_sum(
            pre_interp_weights, axis=[1, 2], keep_dims=True) + tmf.epsilon
        kp_of_strength_loss = tf.reduce_mean(tmf.sum_per_sample(
            tf.boolean_mask(pre_interp_weights, mask=valid_mask) *
            tf.sqrt(tf.reduce_sum(
                tf.square(tf.boolean_mask(flow_map, mask=valid_mask)),
                axis=3, keep_dims=True))
        ))
        # negate so that larger flow magnitude under a keypoint lowers the loss
        kp_of_strength_loss = -kp_of_strength_loss
        optical_flow_strength_loss = kp_of_strength_loss * optical_flow_strength_loss_weight
        tgu.add_to_aux_loss(optical_flow_strength_loss, "flow_strength")
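
To make the flow-transform constraint above concrete: the dense flow is interpolated at each keypoint location through the Gaussian weight map, and the flow-propagated keypoint should land on the keypoint detected in the paired frame. A minimal NumPy sketch of that computation (not from the repository; names and shapes are illustrative):

import numpy as np

def flow_transform_residual(kp_t, kp_next, flow, interp_weights):
    # kp_t:           [K, 2] keypoint coordinates in frame t (flow-map grid)
    # kp_next:        [K, 2] keypoint coordinates in the paired frame
    # flow:           [H, W, 2] dense optical flow from frame t to the paired frame
    # interp_weights: [H, W, K] per-keypoint Gaussian weights, normalized over H, W
    pointwise_flow = np.einsum('hwc,hwk->kc', flow, interp_weights)  # flow at each keypoint
    predicted_next = kp_t + pointwise_flow  # propagate keypoints along the flow
    return np.sum((predicted_next - kp_next) ** 2)  # squared transform residual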

However, we have not released the code for data loading and the (OpenCV-based) optical-flow computation for Human3.6M. We plan to do that soon.
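
In the meantime, a minimal sketch of how such OpenCV-based flow could be computed, assuming Farneback dense flow on grayscale frame pairs (the parameters and the BGR input convention are assumptions, not the released pipeline):

import cv2

def compute_flow(frame_t, frame_next):
    # dense optical flow between two BGR frames, returned as an [H, W, 2] array
    prev_gray = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(frame_next, cv2.COLOR_BGR2GRAY)
    # Farneback parameters: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)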

@hliuav (Author) commented Aug 10, 2018

Thank you for your quick reply. I also find that if the network is trained on images with background, the landmarks tend to form a circle and each landmark varies only a little within its local region. Most of the cases shown in the paper are also trained on images with similar poses (cars, animals, etc.); only the Human3.6M dataset has a wide variety of poses. Is that why the background of the Human3.6M images needs to be removed, i.e., to make sure the network won't learn landmarks from the background? (I have tried training on Human3.6M with background, and the network learns almost nothing.)
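
For anyone attempting the same preprocessing, a minimal sketch of background removal given a precomputed foreground mask (the mask source is an assumption; Human3.6M's background-subtraction data could serve here):

import numpy as np

def remove_background(image, fg_mask, fill_value=255):
    # image:   [H, W, 3] uint8 frame
    # fg_mask: [H, W] boolean foreground mask (assumed precomputed, e.g. from
    #          Human3.6M's background-subtraction data)
    mask = fg_mask.astype(bool)[..., None]        # [H, W, 1] for broadcasting
    background = np.full_like(image, fill_value)  # uniform replacement background
    return np.where(mask, image, background)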

@YutingZhang (Owner)

Sorry for the delayed response due to my recent job transition.
The method is not robust to background variations for human-body images (though it works for faces).
The human body is more complicated than other objects in terms of pose variation and viewpoints of interest, so the foreground object structure is also harder to capture. I think that is why a simpler background is needed.

@ender1001

Thank you for your great work. It has helped me a lot in my current project. I have encountered a similar background problem: I extracted only the foreground from a video, but the method still recognized part of the foreground as background and therefore missed some important landmarks. I am wondering whether I can turn off the background channel in both encoding and decoding. I found some related options in your code but failed to enable them. Do you have any suggestions? Thank you.

@jojolee123

Thank you for your nice work!
Can you provide download links for the Simplified Human3.6M dataset and the full Human3.6M dataset?
Waiting for your reply, thanks!

@YutingZhang (Owner) commented Jul 31, 2022 via email
