
MAE prediction visualization code #5

Open
tikboaHIT opened this issue Nov 17, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@tikboaHIT
Contributor

Thank you for your contribution. I wonder if you plan to release the mask prediction visualization code?

@pengzhiliang added the enhancement (New feature or request) label Nov 18, 2021
@pengzhiliang
Owner

Unfortunately, our current visualization code still has some bugs; I will try to fix them!

@tikboaHIT
Contributor Author

I have implemented the visualization code here. Can I submit a merge request?

@pengzhiliang
Owner

Of course. Thank you for your contributions.

And can you provide some visualization results here?

@tikboaHIT
Contributor Author

Of course. Can you provide pre-trained models and test images? The current model is mainly trained on a custom dataset.

@pengzhiliang
Owner

I have uploaded the weights to Google Drive; please see the latest readme.txt.

@avitrost

Hi, I was wondering how to perform inference and run the full encoder-decoder network on a complete, unmasked image. In other words, after training, how would I call the model so that it encodes a complete image with no masked patches and then reconstructs the original image with the decoder?

@pengzhiliang
Owner

pengzhiliang commented Nov 20, 2021

Hello @avitrost, there may currently be some bugs when the mask is all zeros.
But if you really want to observe how MAE performs as a pure auto-encoder, you need to make a couple of changes.
For the encoder:
you can directly set `x_vis = x` in this line.
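As a quick sanity check on why setting `x_vis = x` is safe here: the masked selection and the full token sequence coincide whenever the mask is all `False`. The `x[~mask]` indexing below mirrors the pattern used in the repository, but the batch/token/channel sizes are made up for illustration:

```python
import torch

B, N, C = 2, 196, 768                        # made-up batch/token/channel sizes
x = torch.randn(B, N, C)                     # encoder patch tokens
mask = torch.zeros(B, N, dtype=torch.bool)   # mask is all zeros: nothing masked

# What the original encoder line computes (visible tokens only):
x_vis_masked = x[~mask].reshape(B, -1, C)
# The suggested change for the pure auto-encoder case:
x_vis = x

# With an all-False mask the two are identical:
assert torch.equal(x_vis_masked, x_vis)
print(x_vis.shape)  # torch.Size([2, 196, 768])
```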
And for the inputs to decoder:

```python
x_vis = self.encoder_to_decoder(x_vis)  # [B, N_vis, C_d]
B, N, C = x_vis.shape
expand_pos_embed = self.pos_embed.expand(B, -1, -1).type_as(x).to(x.device).clone().detach()
pos_emd_vis = expand_pos_embed[~mask].reshape(B, -1, C)
pos_emd_mask = expand_pos_embed[mask].reshape(B, -1, C)
x_full = torch.cat([x_vis + pos_emd_vis, self.mask_token + pos_emd_mask], dim=1)

x = self.decoder(x_full, pos_emd_mask.shape[1])  # [B, N_mask, 3 * 16 * 16]
```

It needs to be changed to:

```python
x_vis = self.encoder_to_decoder(x_vis)  # [B, N, C_d]
B, N, C = x_vis.shape
expand_pos_embed = self.pos_embed.expand(B, -1, -1).type_as(x).to(x.device).clone().detach()
x = self.decoder(x_vis + expand_pos_embed, x_vis.shape[1])  # [B, N, 3 * 16 * 16]
```

There may still be some other bugs, so you might need to debug a little.
Hope this helps!
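Putting the two changes together, a minimal self-contained sketch of the full-image (mask-free) decoder path might look like the following. The tensor shapes mirror the snippet above, but `encoder_to_decoder`, `pos_embed`, and `decoder` here are simple stand-in modules for illustration, not the repository's actual classes:

```python
import torch
import torch.nn as nn

class FullImageDecoderPath(nn.Module):
    """Toy stand-in for the MAE decoder path when no patches are masked.

    Shapes follow the issue's snippet; the submodules are placeholders
    (e.g. the decoder is a single Linear, not a transformer)."""
    def __init__(self, num_patches=196, enc_dim=768, dec_dim=512,
                 patch_pixels=3 * 16 * 16):
        super().__init__()
        self.encoder_to_decoder = nn.Linear(enc_dim, dec_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dec_dim))
        self.decoder = nn.Linear(dec_dim, patch_pixels)  # placeholder decoder

    def forward(self, x_vis):
        # With mask ratio 0, x_vis holds all N patch tokens: [B, N, C_e]
        x_vis = self.encoder_to_decoder(x_vis)                        # [B, N, C_d]
        B, N, C = x_vis.shape
        expand_pos_embed = self.pos_embed.expand(B, -1, -1).detach()  # [B, N, C_d]
        # No mask tokens to concatenate: decode every position directly
        return self.decoder(x_vis + expand_pos_embed)                 # [B, N, 3*16*16]

tokens = torch.randn(2, 196, 768)
out = FullImageDecoderPath()(tokens)
print(out.shape)  # torch.Size([2, 196, 768]) since 3 * 16 * 16 == 768
```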

@Pter61

Pter61 commented Dec 14, 2021

> expand_pos_embed = self.pos_embed

Thank you for your contribution! I have a question about this change: why has it not been applied in the latest code?

@mouxinyue1

Hello, I would like to ask whether the weight file loaded for visualization should be the pre-trained weights or the fine-tuned weights. I get an error when I load the fine-tuned weights:
[error screenshot]
