
codebook keeps getting trained during DALLE training #35

Open

CDitzel opened this issue Feb 10, 2021 · 7 comments

Comments

CDitzel commented Feb 10, 2021

self.image_emb = vae.codebook

right now, neither an appropriate no_grad call nor manually calling codebook.requires_grad_(False) prevents the pretrained VAE codebook from getting further adjusted during the subsequent DALLE training procedure.

I doubt that this is meant to be the case.

Training of the VAE encoder part is rightly disabled by the associated decorator, but this does not extend to the codebook. Maybe I am missing something here? I just wanted to draw attention to this point.
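
For illustration, a minimal sketch of the tying and the attempted freeze described above, assuming `vae.codebook` is a plain nn.Embedding (the names and sizes below are stand-ins, not the repository's exact code):

```python
import torch
from torch import nn

# Hypothetical stand-in for the pretrained VQ-VAE codebook.
codebook = nn.Embedding(8192, 512)

# Tying as in `self.image_emb = vae.codebook`: the very same Embedding
# (and weight tensor) is reused as the image-token embedding of DALL-E.
image_emb = codebook

# Attempted freeze: marks the codebook weight as non-trainable.
codebook.requires_grad_(False)

# Any lookup through the tied module still reads the shared weight tensor.
tokens = torch.randint(0, 8192, (1, 1024))
image_token_embeddings = image_emb(tokens)  # shape: (1, 1024, 512)
```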

lucidrains (Owner) commented

@CDitzel Good timing! I'm about to get back to work on DALL-E today and tomorrow, going to make the training easy for everyone :)

I've released a new version (https://github.com/lucidrains/DALLE-pytorch/releases/tag/0.0.54) where I turn off the tying of embeddings, and if it is turned on, I detach the codebook properly so it doesn't get trained. Thanks for catching that!
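
As a rough sketch of what detaching the tied codebook during the lookup could look like (an illustration only, not necessarily the exact code in the release; `codebook` is assumed to be an nn.Embedding taken from the trained VAE):

```python
import torch
import torch.nn.functional as F
from torch import nn

def embed_image_tokens(codebook: nn.Embedding, image_token_ids: torch.Tensor) -> torch.Tensor:
    # Look up image-token embeddings from the pretrained codebook, but detach
    # the weight so no gradient flows back into the VAE codebook during
    # DALL-E training.
    return F.embedding(image_token_ids, codebook.weight.detach())
```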

CDitzel (Author) commented Feb 10, 2021

thank you for attending to this so quickly!

Is a separate embedding for the text and the image tokens even necessary?

I have seen similar implementations where they just concatenate the tokens and pass them to a transformer that has only a single nn.Embedding.

lucidrains (Owner) commented

yup you can do one single embedding! you would just need to offset one set of tokens by the number in the other

i don't think it matters too much :)

lucidrains (Owner) commented

for now, let's keep it separate, so it could be optionally tied (or not)

CDitzel (Author) commented Feb 10, 2021

> yup you can do one single embedding! you would just need to offset one set of tokens by the number in the other
>
> i don't think it matters too much :)

Do you really? I believe one could just index into one and the same embedding with indices from both modalities, even though they span identical integer ranges.

lucidrains (Owner) commented

@CDitzel ohh, well, i meant you would do something like nn.Embedding(num_text_tokens + num_image_tokens, dim)

then, when it comes time to retrieve the embeddings: image_token_ids += num_text_tokens
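
A small sketch of that scheme (the vocabulary sizes below are made up for the example): one nn.Embedding covers both vocabularies, and the image-token ids are shifted by the text vocabulary size so the two ranges never collide.

```python
import torch
from torch import nn

num_text_tokens, num_image_tokens, dim = 10000, 8192, 512

# One shared embedding table for both modalities.
token_emb = nn.Embedding(num_text_tokens + num_image_tokens, dim)

text_ids = torch.randint(0, num_text_tokens, (1, 256))
image_ids = torch.randint(0, num_image_tokens, (1, 1024))

# Offset the image ids so they land in the second half of the table.
tokens = torch.cat((text_ids, image_ids + num_text_tokens), dim=1)
embeddings = token_emb(tokens)  # shape: (1, 256 + 1024, 512)
```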

CDitzel (Author) commented Feb 10, 2021

yeah, I understood what you meant. But I think one could just use

nn.Embedding(larger_num_of_both_token_len, dim)

and then index into it with both kinds of tokens equally, even though this means that every so often a text token and an image token could retrieve the same embedding vector.
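
For comparison, a sketch of this variant (again with made-up sizes and names): a single table sized to the larger of the two vocabularies, indexed by both modalities without an offset, so a text id and an image id may map to the same vector.

```python
import torch
from torch import nn

num_text_tokens, num_image_tokens, dim = 10000, 8192, 512

# One table sized to the larger of the two vocabularies.
shared_emb = nn.Embedding(max(num_text_tokens, num_image_tokens), dim)

text_ids = torch.randint(0, num_text_tokens, (1, 256))
image_ids = torch.randint(0, num_image_tokens, (1, 1024))

# Both id ranges index the same table directly, so they can overlap.
embeddings = shared_emb(torch.cat((text_ids, image_ids), dim=1))
```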
