
Suspicious performance variation using LB1 in the CUDA-based multi-GPU code #10

Open
Guillaume-Helbecque opened this issue Jul 25, 2024 · 0 comments


While testing PR #9, I observed a performance curiosity when using LB1 with the CUDA-based multi-GPU code: when the same instance is run multiple times, the execution time can vary drastically, as can the workload per GPU.

For instance:
Workload per GPU: 39.48 20.65 19.68 20.19 takes 28.2437s
Workload per GPU: 24.26 24.15 26.18 25.41 takes 19.8376s
Workload per GPU: 22.18 21.23 22.93 33.65 takes 42.1932s

As far as I know, this does not occur with LB2, nor with LB1 in the Chapel code. One potential cause could be a bottleneck in the CUDA-based version of the WS mechanism.
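As a quick sanity check on the numbers above, one can compute a load-imbalance factor (maximum GPU share divided by the ideal balanced share) per run. This is a hypothetical diagnostic sketch, not part of the code base:

```python
# Hypothetical diagnostic: load-imbalance factor per run, computed from the
# workload shares (%) and runtimes reported above.
runs = [
    ([39.48, 20.65, 19.68, 20.19], 28.2437),
    ([24.26, 24.15, 26.18, 25.41], 19.8376),
    ([22.18, 21.23, 22.93, 33.65], 42.1932),
]

for shares, seconds in runs:
    ideal = sum(shares) / len(shares)  # perfectly balanced share (~25%)
    imbalance = max(shares) / ideal    # 1.0 would mean perfect balance
    print(f"imbalance = {imbalance:.2f}, time = {seconds:.4f}s")
```

Notably, the most imbalanced run (the first, imbalance ≈ 1.58) is not the slowest one (the third, imbalance ≈ 1.35), so imbalance alone may not explain the runtime variation, which would be consistent with a contention bottleneck in the WS mechanism rather than pure load skew.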
