VisionLanguageFusionModule #52

Open
lzl2040 opened this issue Mar 6, 2024 · 1 comment

lzl2040 commented Mar 6, 2024

I noticed that this module fuses the attention output with the input by element-wise multiplication rather than addition or concatenation.

tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt, query_pos),
                           key=self.with_pos_embed(memory, pos),
                           value=memory, attn_mask=None,
                           key_padding_mask=memory_key_padding_mask)[0]
tgt = tgt * tgt2
return tgt

Why not use addition or concatenation? What are the benefits of using multiplication?

Owner

wjn922 commented Mar 12, 2024

In our early experiments, we found that multiplication performed better than addition for RVOS, so we used multiplication consistently throughout the work.

However, the performance advantage is less than 1 point, so choosing multiplication, addition, or concatenation may not significantly affect the results.
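
For anyone who wants to try the three options side by side, here is a minimal sketch, not the repository's actual code. The class name `FusionSketch`, the `mode` argument, and the `proj` layer used after concatenation are hypothetical; the body of `with_pos_embed` is also an assumption (add the positional embedding when one is given). Only the attention call and the `tgt * tgt2` line come from the snippet quoted in this issue.

```python
import torch
import torch.nn as nn


class FusionSketch(nn.Module):
    """Hypothetical module comparing three ways to merge cross-attention output."""

    def __init__(self, d_model: int, nhead: int, mode: str = "mul"):
        super().__init__()
        if mode not in ("mul", "add", "concat"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.multihead_attn = nn.MultiheadAttention(d_model, nhead)
        # Assumption: concatenation doubles the channel size, so a linear
        # layer projects the result back to d_model.
        self.proj = nn.Linear(2 * d_model, d_model) if mode == "concat" else None

    @staticmethod
    def with_pos_embed(tensor, pos):
        # Assumed behaviour: add the positional embedding if one is provided.
        return tensor if pos is None else tensor + pos

    def forward(self, tgt, memory, memory_key_padding_mask=None,
                pos=None, query_pos=None):
        # Same attention call as in the quoted snippet.
        tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt, query_pos),
                                   key=self.with_pos_embed(memory, pos),
                                   value=memory, attn_mask=None,
                                   key_padding_mask=memory_key_padding_mask)[0]
        if self.mode == "mul":
            return tgt * tgt2   # element-wise product, as in the issue
        if self.mode == "add":
            return tgt + tgt2   # residual-style sum
        # Concatenate along channels, then project back to d_model.
        return self.proj(torch.cat([tgt, tgt2], dim=-1))


# Inputs use the default (sequence, batch, channels) layout of nn.MultiheadAttention.
tgt = torch.randn(10, 2, 32)    # stand-in for visual features
memory = torch.randn(5, 2, 32)  # stand-in for text features
for mode in ("mul", "add", "concat"):
    out = FusionSketch(d_model=32, nhead=4, mode=mode)(tgt, memory)
    print(mode, tuple(out.shape))
```

All three variants return a tensor with the same shape as `tgt`, so they can be swapped without touching the rest of the model. Only the concatenation variant adds parameters, through the extra projection.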
