Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-40932][CORE] Fix issue messages for allGather are overridden
### What changes were proposed in this pull request? The messages returned by allGather may be overridden by the following barrier APIs, eg, ``` scala val messages: Array[String] = context.allGather("ABC") context.barrier() ``` the `messages` may be like Array("", ""), but we're expecting Array("ABC", "ABC") The root cause of this issue is the [messages got by allGather](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala#L102) pointing to the [original message](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala#L107) in the local mode. So when the following barrier APIs changed the messages, then the allGather message will be changed accordingly. Finally, users can't get the correct result. This PR fixed this issue by sending back the cloned messages. ### Why are the changes needed? The bug mentioned in this description may block some external SPARK ML libraries which heavily depend on the spark barrier API to do some synchronization. If the barrier mechanism can't guarantee the correctness of the barrier APIs, it will be a disaster for external SPARK ML libraries. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? I added a unit test, with this PR, the unit test can pass Closes apache#38410 from wbo4958/allgather-issue. Authored-by: Bobby Wang <wbo4958@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
- Loading branch information