
MAE prediction visualization code #5

Open
tikboaHIT opened this issue Nov 17, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@tikboaHIT
Contributor

Thank you for your contribution. I wonder if you plan to release the mask prediction visualization code?

@pengzhiliang added the enhancement (New feature or request) label Nov 18, 2021
@pengzhiliang
Owner

Unfortunately, our current visualization code still has some bugs; I will try to fix them!

@tikboaHIT
Contributor Author

I have implemented the visualization code here. Can I submit a merge request?

@pengzhiliang
Owner

Of course. Thank you for your contributions.

And can you provide some visualization results here?

@tikboaHIT
Contributor Author

Of course. Can you provide pre-trained models and test images? The current model is mainly trained on a custom dataset.

@pengzhiliang
Owner

I have uploaded the weights to Google Drive; please see the latest readme.txt.

@avitrost

Hi, I was wondering how to perform inference and run the full encoder-decoder network on a complete, unmasked image. In other words, after training, how would I call the model so that it encodes a complete image with no masked patches and then reconstructs the original image with the decoder?

@pengzhiliang
Owner

pengzhiliang commented Nov 20, 2021

Hello @avitrost, there may currently be some bugs when the mask is all zeros.
But if you really want to observe how MAE performs as a pure auto-encoder, you need to make a couple of changes.
For the encoder:
you can directly set `x_vis = x` in this line.
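As a quick sanity check on why setting `x_vis = x` is safe here: the masked selection and the full token sequence coincide whenever the mask is all `False`. The `x[~mask]` indexing below mirrors the pattern used in the repository, but the batch/token/channel sizes are made up for illustration:

```python
import torch

B, N, C = 2, 196, 768                        # made-up batch/token/channel sizes
x = torch.randn(B, N, C)                     # encoder patch tokens
mask = torch.zeros(B, N, dtype=torch.bool)   # mask is all zeros: nothing masked

# What the original encoder line computes (visible tokens only):
x_vis_masked = x[~mask].reshape(B, -1, C)
# The suggested change for the pure auto-encoder case:
x_vis = x

# With an all-False mask the two are identical:
assert torch.equal(x_vis_masked, x_vis)
print(x_vis.shape)  # torch.Size([2, 196, 768])
```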
And for the inputs to decoder:

```python
x_vis = self.encoder_to_decoder(x_vis)  # [B, N_vis, C_d]
B, N, C = x_vis.shape
expand_pos_embed = self.pos_embed.expand(B, -1, -1).type_as(x).to(x.device).clone().detach()
pos_emd_vis = expand_pos_embed[~mask].reshape(B, -1, C)
pos_emd_mask = expand_pos_embed[mask].reshape(B, -1, C)
x_full = torch.cat([x_vis + pos_emd_vis, self.mask_token + pos_emd_mask], dim=1)

x = self.decoder(x_full, pos_emd_mask.shape[1])  # [B, N_mask, 3 * 16 * 16]
```

It needs to be changed to:

```python
x_vis = self.encoder_to_decoder(x_vis)  # [B, N, C_d]
B, N, C = x_vis.shape
expand_pos_embed = self.pos_embed.expand(B, -1, -1).type_as(x).to(x.device).clone().detach()
x = self.decoder(x_vis + expand_pos_embed, x_vis.shape[1])  # [B, N, 3 * 16 * 16]
```

There may still be some other bugs, so you might need to debug a little.
Hope this helps!
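Putting the two changes together, a minimal self-contained sketch of the full-image (mask-free) decoder path might look like the following. The tensor shapes mirror the snippet above, but `encoder_to_decoder`, `pos_embed`, and `decoder` here are simple stand-in modules for illustration, not the repository's actual classes:

```python
import torch
import torch.nn as nn

class FullImageDecoderPath(nn.Module):
    """Toy stand-in for the MAE decoder path when no patches are masked.

    Shapes follow the issue's snippet; the submodules are placeholders
    (e.g. the decoder is a single Linear, not a transformer)."""
    def __init__(self, num_patches=196, enc_dim=768, dec_dim=512,
                 patch_pixels=3 * 16 * 16):
        super().__init__()
        self.encoder_to_decoder = nn.Linear(enc_dim, dec_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dec_dim))
        self.decoder = nn.Linear(dec_dim, patch_pixels)  # placeholder decoder

    def forward(self, x_vis):
        # With mask ratio 0, x_vis holds all N patch tokens: [B, N, C_e]
        x_vis = self.encoder_to_decoder(x_vis)                        # [B, N, C_d]
        B, N, C = x_vis.shape
        expand_pos_embed = self.pos_embed.expand(B, -1, -1).detach()  # [B, N, C_d]
        # No mask tokens to concatenate: decode every position directly
        return self.decoder(x_vis + expand_pos_embed)                 # [B, N, 3*16*16]

tokens = torch.randn(2, 196, 768)
out = FullImageDecoderPath()(tokens)
print(out.shape)  # torch.Size([2, 196, 768]) since 3 * 16 * 16 == 768
```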

@Pter61

Pter61 commented Dec 14, 2021

> expand_pos_embed = self.pos_embed

Thank you for your contribution! I have a question about this change: why has it not been applied in the latest code?

@mouxinyue1

Hello, I would like to ask whether the weight file loaded for visualization should be the pre-trained weights or the fine-tuned weights. I get an error when I load the fine-tuned weights:
[error screenshot]
