fix: Reduce GPU to CPU copies at inference #3127

Merged (5 commits) on Sep 7, 2022

sjrl commented Aug 31, 2022

Related Issues

Proposed Changes:

Implemented the fix recommended in the issue. We now perform a single GPU-to-CPU copy of each entire matrix instead of copying the matrix elements over one at a time.
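
For illustration, here is a minimal sketch of the difference between the two patterns. This is not the actual reader code: the function names and the `scores` matrix are hypothetical, and it only assumes PyTorch tensors that may live on a GPU.

```python
from typing import List, Tuple

import torch


def top_candidates_elementwise(scores: torch.Tensor, k: int) -> List[Tuple[int, int, float]]:
    """Per-element pattern: every .item() call is its own device-to-host copy."""
    n_cols = scores.shape[1]
    flat_order = torch.argsort(scores.flatten(), descending=True)
    candidates = []
    for i in range(k):
        flat_idx = flat_order[i]
        # Three separate transfers per candidate when `scores` is on the GPU.
        start = torch.div(flat_idx, n_cols, rounding_mode="floor").item()
        end = (flat_idx % n_cols).item()
        candidates.append((start, end, scores[start, end].item()))
    return candidates


def top_candidates_bulk(scores: torch.Tensor, k: int) -> List[Tuple[int, int, float]]:
    """Bulk pattern: copy whole tensors to the host once, then index on the CPU."""
    n_cols = scores.shape[1]
    flat_order = torch.argsort(scores.flatten(), descending=True)[:k]
    # Two transfers in total, regardless of how large k is.
    flat_order_cpu = flat_order.cpu().tolist()
    scores_cpu = scores.cpu().tolist()
    candidates = []
    for flat_idx in flat_order_cpu:
        start, end = divmod(flat_idx, n_cols)  # plain Python integer arithmetic
        candidates.append((start, end, scores_cpu[start][end]))
    return candidates


if __name__ == "__main__":
    # Falls back to the CPU when no GPU is available, so the sketch still runs.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    scores = torch.rand(8, 8, device=device)
    print(top_candidates_bulk(scores, 5))
```

Both functions return the same candidates; the bulk version just keeps the number of transfers constant instead of letting it grow with `k`.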

How did you test it?

Notes for the reviewer

Checklist

sjrl marked this pull request as ready for review August 31, 2022 13:20
sjrl requested a review from a team as a code owner August 31, 2022 13:20
sjrl requested review from bogdankostic and removed request for a team August 31, 2022 13:20

sjrl commented Sep 5, 2022

The comment at #2926 (comment) provides timings showing how these changes speed up the reader predictions when using large values (around 50) for top_k_per_sample.

sjrl commented Sep 5, 2022

Hi @bogdankostic, I want to let you know that this PR is ready for review. The comment referenced above shows how the changes improve the speed of the reader when using large values (e.g. 50) for top_k_per_sample. I don't see any more obvious speedups, and if we need to, we can look at further optimizing this function in the future.

bogdankostic left a comment

Wow, these speed improvements seem impressive 🚀 LGTM!

sjrl merged commit 62e7c19 into main on Sep 7, 2022
sjrl deleted the issue-2926 branch on September 7, 2022 09:00

Successfully merging this pull request may close these issues.

Copy less from GPU to memory when postprocessing QA logits