perf(MPIArray): Write redistributed chunks directly into target array #214

Merged: 1 commit, Sep 6, 2022

Conversation

@ljgray ljgray commented Sep 6, 2022

Reduce memory usage and remove an extra step from redistribute by writing the redistributed chunks directly into the target array. This gives some performance improvement when the number of ranks is small, but MPI overhead dominates at larger scales.

@ljgray ljgray requested a review from jrs65 September 6, 2022 17:32

@jrs65 jrs65 left a comment

Great!

There's another memory optimisation I've been thinking about for a while which would get rid of the full third copy (the one held in the buffers). We could do the message passing in stages: in round 1 each rank passes to its (cyclic) neighbour, waits on the result and moves it into its final location; in round 2 it passes to its next-but-one neighbour, and so on. Then you don't need the full array buffer. It may be a bit slower in terms of network transfer, but these redistributions are one of the biggest unpredictable eaters of memory, and if we can get them down we can probably run on fewer nodes at once.
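
A minimal sketch of the staged schedule described above, simulated in plain Python with no MPI involved. The function name `staged_redistribute` and the nested-list layout are hypothetical and are not taken from the MPIArray code; in a real implementation each round would presumably be a paired send/receive between ranks, with the received chunk copied straight into the target array.

```python
def staged_redistribute(local_blocks):
    """Simulate a staged all-to-all exchange between `size` ranks.

    local_blocks[src][dst] is the chunk held by rank `src` that belongs
    on rank `dst`. Returns `result`, where result[dst][src] is the chunk
    that rank `dst` ended up with from rank `src`.

    Hypothetical illustration only: lists stand in for per-rank memory.
    """
    size = len(local_blocks)

    # Final location on every rank: one slot per source rank.
    result = [[None] * size for _ in range(size)]

    # The chunk a rank keeps for itself needs no message at all.
    for rank in range(size):
        result[rank][rank] = local_blocks[rank][rank]

    # Round k: every rank sends to its k-th cyclic neighbour and
    # receives from the rank k steps behind it.
    for k in range(1, size):
        # One chunk-sized receive buffer per rank, reused every round,
        # instead of a buffer holding the full array.
        recv_buffer = [None] * size
        for src in range(size):
            dst = (src + k) % size
            recv_buffer[dst] = local_blocks[src][dst]

        # Each rank waits for its message, then moves it into place.
        for dst in range(size):
            src = (dst - k) % size
            result[dst][src] = recv_buffer[dst]

    return result


# Usage: label each chunk with its source and destination rank.
blocks = [[f"{src}->{dst}" for dst in range(4)] for src in range(4)]
final = staged_redistribute(blocks)
```

Because `(src + k) % size` is a permutation of the ranks for every `k`, each rank sends exactly one chunk and receives exactly one chunk per round, so the temporary storage per rank is a single chunk rather than the whole array. The cost is `size - 1` sequential rounds, which is where the possible network slowdown would come from.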

@ljgray ljgray merged commit 824b054 into master Sep 6, 2022
@ljgray ljgray deleted the ljg/mpiarray-memory branch September 6, 2022 23:07
2 participants