perf(MPIArray): Write redistributed chunks directly into target array #214

ljgray · 2022-09-06T17:32:18Z

Reduce memory usage and remove an extra step from redistribute. This offers some performance improvement when the number of ranks is small, but MPI overhead dominates at larger scales.

jrs65

Great!

There's another memory optimisation I've been thinking about for a while, which would get rid of the full third copy (the one held in the buffers). We could do the message passing in stages: in round one each rank passes to its (cyclic) neighbour, waits on the result and moves it into its final location; in round two it passes to its next-but-one neighbour, and so on. Then you don't need the full array buffer. It may be a bit slower in terms of network transfer, but these redistributions are one of the biggest unpredictable eaters of memory, and if we can get them down we can probably run on fewer nodes at once.
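To make the staged idea concrete, here is a rough single-process sketch of it. It only simulates the ranks with plain Python lists and is not the MPIArray code; `split_range` and `staged_redistribute` are hypothetical names invented for illustration. It assumes a 2D array that starts out distributed over rows and ends up distributed over columns. In round `step`, each rank packs just the one block destined for its `step`-th cyclic neighbour, so the largest buffer alive at any time is a single block rather than a copy of the whole local array.

```python
def split_range(n, nparts):
    """Split range(n) into nparts contiguous (start, stop) pairs, as evenly as possible."""
    base, extra = divmod(n, nparts)
    bounds, start = [], 0
    for i in range(nparts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def staged_redistribute(local_rows, nrows, ncols):
    """Simulate a staged row-distributed -> column-distributed exchange.

    local_rows[r] holds the rows owned by simulated rank r (each row has
    ncols entries). Returns (targets, peak), where targets[r] is the
    nrows x (local columns) block owned by rank r afterwards, and peak is
    the largest number of elements held in any single exchange buffer.
    """
    nranks = len(local_rows)
    row_bounds = split_range(nrows, nranks)
    col_bounds = split_range(ncols, nranks)

    # Pre-allocate each rank's target block; received data is written straight in.
    targets = [[None] * nrows for _ in range(nranks)]
    peak = 0

    for step in range(nranks):
        # "Send" phase: every rank packs one block for its step-th cyclic neighbour.
        # (rank + step) % nranks is a permutation, so each rank receives exactly once.
        in_flight = {}
        for rank in range(nranks):
            dest = (rank + step) % nranks
            c0, c1 = col_bounds[dest]
            block = [row[c0:c1] for row in local_rows[rank]]
            in_flight[dest] = (rank, block)
            peak = max(peak, sum(len(row) for row in block))

        # "Receive" phase: each rank moves the block into its final location.
        for dest, (src, block) in in_flight.items():
            r0, _ = row_bounds[src]
            for i, row in enumerate(block):
                targets[dest][r0 + i] = row
        # The round's buffers go out of scope here, before the next round starts.

    return targets, peak


# Small usage example: a 5 x 7 array over 3 simulated ranks.
nrows, ncols, nranks = 5, 7, 3
full = [[i * ncols + j for j in range(ncols)] for i in range(nrows)]
local = [full[r0:r1] for r0, r1 in split_range(nrows, nranks)]
targets, peak = staged_redistribute(local, nrows, ncols)
```

With real MPI, each round would presumably become one paired send and receive per rank (for example mpi4py's `Comm.Sendrecv`), which is where the extra network latency mentioned above would come from: the exchange takes one round per rank instead of a single all-to-all.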

ljgray requested a review from jrs65 September 6, 2022 17:32

perf(MPIArray): Write redistributed chunks directly into target array

b107a47

ljgray force-pushed the ljg/mpiarray-memory branch from 3788686 to b107a47 Compare September 6, 2022 17:33

jrs65 approved these changes Sep 6, 2022


ljgray merged commit 824b054 into master Sep 6, 2022

ljgray deleted the ljg/mpiarray-memory branch September 6, 2022 23:07
