Skip to content

Commit

Permalink
fix
Browse files Browse the repository at this point in the history
  • Loading branch information
wangbluo committed Oct 7, 2024
1 parent 765db38 commit ceadef3
Show file tree
Hide file tree
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/source/en/features/sequence_parallelism.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ Currently, the `MoeHybridParallelPlugin` only supports DeepSpeed-Ulysses sequenc
### Conclusion
Among the sequence parallelism methods mentioned, both ring attention and Ulysses have their pros and cons, and we need to choose the appropriate sequence parallelism method based on the situation:

Communication: Ulysses has lower communication overhead compared to ring attention, as it primarily involves three All-to-All communication ops, whereas the communication cost of ring attention grows quadratically with the sequence length. However, on the other hand, All-to-All op also demands more bandwidth from the hardware.
Communication: Ulysses has lower communication overhead compared to ring attention, as it primarily involves three All-to-All communication ops, whereas the communication cost of ring attention grows quadratically with the sequence length. However, on the other hand, All-to-All op also demands dense network topologies, e.g. NVLink + NVSwitch, so it doesn't scale well across multiple nodes.

Memory usage: Both are similar in terms of memory consumption.

Expand Down
2 changes: 1 addition & 1 deletion docs/source/zh-Hans/features/sequence_parallelism.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ for step, batch in enumerate(tqdm(dataloader, desc="Step", disable=not dist.get_
### 结论
在上述序列并行方法中,ring attention和Ulysses各有优劣,我们需要根据情况来选择合适的序列并行方法:

通信方面:Ulysses通信量优于ring attention,Ulysess主要包含三次All2All通信量,而ring attention的通信会随着序列长度增长而平方增长。不过另一方面,all2all对底层硬件的要求也会更高
通信方面:Ulysses通信量优于ring attention,Ulysess主要包含三次All2All通信量,而ring attention的通信会随着序列长度增长而平方增长。不过另一方面,all2all op由于需要更复杂的网络拓扑,例如NVLink和NVSwitch,因此在多机情况时,并不会随着机器数量增加而有较好的性能提升

内存占用:二者类似。

Expand Down

0 comments on commit ceadef3

Please sign in to comment.