Creation of the decoder_attention_mask while evaluating #32

Open · pietrolesci opened this issue Nov 1, 2023 · 1 comment

@pietrolesci commented Nov 1, 2023

Hi there,

I am trying to recreate the decoder attention mask and I am a bit puzzled by how it is created here

decoder_attention_mask = (decoder_input_ids == decoder_input_ids).float()

This creates a dense matrix with 1s everywhere. Shouldn't this be a lower triangular matrix (which is what T5Model does internally by default)?
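For reference, here is a minimal sketch of what I mean (made-up toy batch, assuming the pad token id is 0):

```python
import torch

# Hypothetical toy batch; 0 is assumed to be the pad token id.
decoder_input_ids = torch.tensor([[5, 8, 3, 0, 0]])

# The expression in question: every position compares equal to itself,
# so the result is a [batch_size, seq_len] mask of all ones.
decoder_attention_mask = (decoder_input_ids == decoder_input_ids).float()
print(decoder_attention_mask)  # tensor([[1., 1., 1., 1., 1.]])

# The lower-triangular (causal) mask I had in mind is [seq_len, seq_len]:
causal = torch.tril(torch.ones(5, 5))
print(causal)
```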

Thanks a lot for your help!

@dptam (Collaborator) commented Nov 21, 2023

The decoder_attention_mask has the same shape as the input_ids, [batch_size, seq_len], and indicates which ids are pad tokens and which are not. The lower triangular matrix, of shape [batch_size, seq_len, seq_len], is formed later inside the HuggingFace code.
The log probs of the pad tokens get masked out later when computing the log prob of the choices, so it doesn't matter whether we mask out the pad tokens in the decoder_attention_mask.
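For illustration, a small sketch (made-up toy batch, assuming a pad token id of 0) contrasting the all-ones mask with a pad-aware one; both are 2-D, and the triangular structure is added later inside the model:

```python
import torch

# Hypothetical toy batch; the pad token id is assumed to be 0.
decoder_input_ids = torch.tensor([[5, 8, 3, 0, 0]])
pad_token_id = 0

# All-ones mask, as built in the code in question:
all_ones_mask = (decoder_input_ids == decoder_input_ids).float()
# Pad-aware alternative; still 2-D [batch_size, seq_len]:
pad_aware_mask = (decoder_input_ids != pad_token_id).float()

print(all_ones_mask)   # tensor([[1., 1., 1., 1., 1.]])
print(pad_aware_mask)  # tensor([[1., 1., 1., 0., 0.]])

# Either way, the model later broadcasts this 2-D mask and combines it with a
# lower-triangular causal mask for decoder self-attention, and the log probs at
# pad positions are masked out when scoring the choices.
```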
