
codebook keeps getting trained during DALLE training #35

Open

CDitzel opened this issue Feb 10, 2021 · 7 comments

Comments

CDitzel commented Feb 10, 2021

self.image_emb = vae.codebook

right now, neither an appropriate no_grad call nor manually calling codebook.requires_grad_(False) prevents the pretrained VAE codebook from getting further adjusted during the subsequent DALLE training procedure.

I doubt that this is meant to be the case.

Training of the VAE encoder part is rightly disabled by the associated decorator, but this does not extend to the codebook. Maybe I am missing something here? I just wanted to draw attention to this point.
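
For illustration, a minimal sketch of the tying and the attempted freeze described above, assuming `vae.codebook` is a plain nn.Embedding (the names and sizes below are stand-ins, not the repository's exact code):

```python
import torch
from torch import nn

# Hypothetical stand-in for the pretrained VQ-VAE codebook.
codebook = nn.Embedding(8192, 512)

# Tying as in `self.image_emb = vae.codebook`: the very same Embedding
# (and weight tensor) is reused as the image-token embedding of DALL-E.
image_emb = codebook

# Attempted freeze: marks the codebook weight as non-trainable.
codebook.requires_grad_(False)

# Any lookup through the tied module still reads the shared weight tensor.
tokens = torch.randint(0, 8192, (1, 1024))
image_token_embeddings = image_emb(tokens)  # shape: (1, 1024, 512)
```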

lucidrains (Owner) commented

@CDitzel Good timing! I'm about to get back to work on DALL-E today and tomorrow, going to make the training easy for everyone :)

I've released a new version (https://github.com/lucidrains/DALLE-pytorch/releases/tag/0.0.54) where I turn off the tying of embeddings, and if it is turned on, I detach the codebook properly so it doesn't get trained. Thanks for catching that!
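
As a rough sketch of what detaching the tied codebook during the lookup could look like (an illustration only, not necessarily the exact code in the release; `codebook` is assumed to be an nn.Embedding taken from the trained VAE):

```python
import torch
import torch.nn.functional as F
from torch import nn

def embed_image_tokens(codebook: nn.Embedding, image_token_ids: torch.Tensor) -> torch.Tensor:
    # Look up image-token embeddings from the pretrained codebook, but detach
    # the weight so no gradient flows back into the VAE codebook during
    # DALL-E training.
    return F.embedding(image_token_ids, codebook.weight.detach())
```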

CDitzel (Author) commented Feb 10, 2021

thank you for attending to this so quickly!

Is a separate embedding for the text and the image tokens even necessary?

I have seen similar implementations where they just concatenate the tokens and pass them to a transformer that has only a single nn.Embedding.

lucidrains (Owner) commented

yup you can do one single embedding! you would just need to offset one set of tokens by the number in the other

i don't think it matters too much :)

lucidrains (Owner) commented

for now, let's keep it separate, so it could be optionally tied (or not)

CDitzel (Author) commented Feb 10, 2021

> yup you can do one single embedding! you would just need to offset one set of tokens by the number in the other
>
> i don't think it matters too much :)

Do you really? I believe one could just index into one and the same embedding with indices from both modalities, even though they span identical integer ranges.

lucidrains (Owner) commented

@CDitzel ohh, well, i meant you would do something like nn.Embedding(num_text_tokens + num_image_tokens, dim)

then, when it comes time to retrieve the embeddings: image_token_ids += num_text_tokens
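
A small sketch of that scheme (the vocabulary sizes below are made up for the example): one nn.Embedding covers both vocabularies, and the image-token ids are shifted by the text vocabulary size so the two ranges never collide.

```python
import torch
from torch import nn

num_text_tokens, num_image_tokens, dim = 10000, 8192, 512

# One shared embedding table for both modalities.
token_emb = nn.Embedding(num_text_tokens + num_image_tokens, dim)

text_ids = torch.randint(0, num_text_tokens, (1, 256))
image_ids = torch.randint(0, num_image_tokens, (1, 1024))

# Offset the image ids so they land in the second half of the table.
tokens = torch.cat((text_ids, image_ids + num_text_tokens), dim=1)
embeddings = token_emb(tokens)  # shape: (1, 256 + 1024, 512)
```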

CDitzel (Author) commented Feb 10, 2021

yeah, I understood what you meant. But I think one could just use

nn.Embedding(larger_num_of_both_token_len, dim)

and then index into it with both kinds of tokens equally, even though this means that every so often a text token and an image token could retrieve the same embedding vector.
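
For comparison, a sketch of this variant (again with made-up sizes and names): a single table sized to the larger of the two vocabularies, indexed by both modalities without an offset, so a text id and an image id may map to the same vector.

```python
import torch
from torch import nn

num_text_tokens, num_image_tokens, dim = 10000, 8192, 512

# One table sized to the larger of the two vocabularies.
shared_emb = nn.Embedding(max(num_text_tokens, num_image_tokens), dim)

text_ids = torch.randint(0, num_text_tokens, (1, 256))
image_ids = torch.randint(0, num_image_tokens, (1, 1024))

# Both id ranges index the same table directly, so they can overlap.
embeddings = shared_emb(torch.cat((text_ids, image_ids), dim=1))
```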
