
Comparison with expandable_segments in pytorch/c10? #12

Open
YouJiacheng opened this issue Jan 2, 2024 · 3 comments

Comments

@YouJiacheng

pytorch/pytorch#96995

https://github.com/pytorch/pytorch/blob/95a86ed9ca107329151e0dc172386d50dd3471c6/c10/cuda/CUDACachingAllocator.cpp#L311-L324

The expandable_segments:True option is used to enable/disable this behavior. We
use CUDA's low-level memory APIs, which are similar to mmap, to extend the
memory segments. These APIs separate the allocation of physical memory
(cuMemCreate) from the allocation of virtual address space (cuMemAddressReserve)
and the association between them (cuMemMap/cuMemSetAccess).

When we allocate a new segment, we allocate enough address space to map
basically the entire physical memory of the GPU (there is 256TiB of address
space), but we only map enough physical memory to handle the current amount of
memory needed by the program. As more is requested, we add more physical memory
to the segment. This can work at the granularity of GPU pages which are 2MiB
currently.
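The mechanism quoted above can be modeled as a toy sketch in plain Python (my own illustration, not PyTorch's actual allocator code): a segment reserves a huge virtual range once, then backs it with physical 2 MiB pages only as demand grows. The class and method names here are hypothetical.

```python
PAGE = 2 * 1024 * 1024  # 2 MiB GPU page granularity, per the comment above

class ExpandableSegment:
    """Toy model of an expandable segment: a large virtual address
    range is reserved up front, and physical pages are mapped into
    it on demand as the program needs more memory."""

    def __init__(self, virtual_size):
        # Analogue of cuMemAddressReserve: claim address space only,
        # no physical memory is consumed yet.
        self.virtual_size = virtual_size
        self.mapped = 0  # bytes currently backed by physical pages

    def grow_to(self, needed):
        # Analogue of cuMemCreate + cuMemMap/cuMemSetAccess: back the
        # used prefix of the range, rounded up to page granularity.
        assert needed <= self.virtual_size, "address space exhausted"
        target = -(-needed // PAGE) * PAGE  # round up to a 2 MiB multiple
        if target > self.mapped:
            self.mapped = target
        return self.mapped

seg = ExpandableSegment(virtual_size=256 * 1024**4)  # "256TiB" of address space
seg.grow_to(3 * 1024 * 1024)       # needs 3 MiB -> maps two 2 MiB pages
print(seg.mapped // PAGE)          # -> 2
```

The point of the sketch is the asymmetry: reserving virtual space is essentially free, so it is done once and generously, while physical pages are committed lazily at 2 MiB granularity.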

@YouJiacheng YouJiacheng changed the title Comparison with expandable_segments in pytorch/c10 Comparison with expandable_segments in pytorch/c10? Jan 2, 2024
@ruizhang1230
Collaborator

Thank you for your interest in our work. GMLake was implemented before April 2023. It was originally built on PyTorch 1.13.1; after PyTorch 2.0 was released, we adapted it to the 2.0 version, and all of our experiments were conducted on PyTorch 2.0. However, expandable_segments was only introduced in version 2.1, so we had not yet conducted more detailed experiments with that feature.

In recent days we have investigated the implementation of expandable_segments in depth. As mentioned in the code comments, this feature primarily addresses the issue of increasing block size, whereas we address the problem of fragmentation, which is not the same. We have since adapted our work to PyTorch 2.1 and run a simple comparative test: on the GPT-NeoX-20B model, the memory utilization rate with expandable_segments was 87%, while for GMLake it was 95%. expandable_segments is very good work, and we plan to conduct a detailed analysis of this feature across a variety of models.

If you would like to have a deep talk, please leave an email address, and we will send you our contact information.

@YouJiacheng
Author

Thank you for your informative reply. I believe GMLake and expandable_segments are concurrent works; that said, the mentioned PR introducing expandable_segments is dated Mar 17, 2023 (though the feature was only released in 2.1).

The purpose of increasing segment size should be to eliminate fragmentation. Theoretically, with expandable_segments there can be no fragmentation (except intra-page fragmentation): tensors can always be allocated successfully as long as there are enough spare pages, regardless of whether those pages are physically contiguous.
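The claim above can be illustrated with a toy model (my own sketch, not GMLake's or PyTorch's code; all names are hypothetical): treat GPU physical memory as a pool of 2 MiB page frames, and let each virtual allocation grab any free frames, mirroring how cuMemMap can back one contiguous virtual range with scattered physical pages.

```python
class PhysicalPool:
    """Toy model: GPU physical memory as a set of page frames.
    A virtual allocation only needs *enough* free frames; they
    need not be physically contiguous."""

    def __init__(self, num_frames):
        self.free = set(range(num_frames))

    def alloc(self, num_pages):
        # Fails only if the pool lacks spare pages, never because
        # the remaining frames happen to be scattered.
        if num_pages > len(self.free):
            return None
        return [self.free.pop() for _ in range(num_pages)]

    def release(self, frames):
        self.free.update(frames)

pool = PhysicalPool(num_frames=8)
a = pool.alloc(3)          # 5 frames left
b = pool.alloc(3)          # 2 frames left
pool.release(a)            # free 3 frames interleaved with b's
big = pool.alloc(5)        # succeeds despite the "fragmented" free set
print(big is not None)     # -> True
```

With a classical caching allocator, the 5-page request could fail here because no single contiguous region of that size remains; in this model it succeeds whenever the total number of spare pages suffices, which is the fragmentation-elimination argument made above.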

Stitching is performed naturally, since the allocation of physical memory is separated from the allocation of virtual address space and from the association between them.

@eedalong

The underlying techniques should be roughly the same: manually managing virtual memory and physical-memory mappings.
