
[FEA] Metric for maximum GPU memory per task #6745

Open
Tracked by #8027
abellina opened this issue Oct 10, 2022 · 6 comments
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf
reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

@abellina
Collaborator

abellina commented Oct 10, 2022

The maximum amount of GPU memory each task uses is a very helpful metric for knowing whether an application is getting close to needing to spill.

Tracking the memory currently on the GPU, spilled to host memory, etc., is also really interesting.

The problem is how to gather this metric efficiently. The Retry framework could keep track of the amount of memory allocated on a given thread, and the amount that is deallocated/freed by that thread, but it would not account for memory that is later freed by other threads (as in the case of spill, or UCX shuffle). Instead we would almost want to associate each allocation with a given thread, but that can be very memory intensive on the host, especially because we are likely to see thousands of active buffers.

We should experiment to see how expensive this is in practice and, if it is not too bad, implement it.

@abellina abellina added feature request New feature or request ? - Needs Triage Need team to review and classify reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Oct 10, 2022
@abellina abellina added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Oct 10, 2022
@sameerz sameerz removed feature request New feature or request ? - Needs Triage Need team to review and classify labels Oct 11, 2022
@abellina
Collaborator Author

abellina commented Oct 13, 2022

I thought about this issue a bit more. What I think we want is a version of the tracking_resource_adaptor, but rather than having a single map for all threads, we keep track of the maximum outstanding GPU footprint per thread. Also to note, the main motivation here is to figure out whether our estimate of memory usage for some GPU code is higher than anticipated, to help us debug waste or inform heuristics that control which tasks we allow on the GPU.

This should allow us to do the following:

val maxOutstandingUsage = withMemoryTracking {
  // materialize data on the GPU (placeholder for whatever allocates the input buffers)
  val gpuData = materializeDataOnGpu()
  val result = withResource(gpuData) { _.callCudfFunction() }
  result.close()
  // at this point our maximum outstanding should be:
  // size(gpuData) + max bytes allocated inside of `callCudfFunction`
}

In this scenario, when we enter the withMemoryTracking block we ask a per-thread tracking resource to start tracking this thread before we materialize data. The materialization of gpuData incurs calls to RMM to get memory, which adds to the outstanding amount, and then the call into the cuDF code can produce allocations that are kept around (outstanding) for a while, allocations and frees that happen within the C++ code before the kernel, or results from that code. So we can keep track of how much is outstanding at any given time by adding to a thread-local variable when bytes are requested and subtracting when we call free.
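To make this concrete, here is a minimal sketch of that per-thread bookkeeping (purely illustrative; ThreadMemoryTracker, onAllocated and onFreed are made-up names, not existing plugin or RMM APIs, and the real hooks would be the allocate/free path of the tracking resource):

// Hypothetical per-thread tracker: add on allocation, subtract on free,
// and remember the high-water mark of outstanding device bytes.
object ThreadMemoryTracker {
  private val enabled = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }
  private val outstanding = new ThreadLocal[Long] {
    override def initialValue(): Long = 0L
  }
  private val maxOutstanding = new ThreadLocal[Long] {
    override def initialValue(): Long = 0L
  }

  // called from the allocation hook of the tracking resource
  def onAllocated(bytes: Long): Unit = if (enabled.get) {
    val now = outstanding.get + bytes
    outstanding.set(now)
    if (now > maxOutstanding.get) maxOutstanding.set(now)
  }

  // called from the free hook of the tracking resource
  def onFreed(bytes: Long): Unit = if (enabled.get) {
    outstanding.set(outstanding.get - bytes)
  }

  // run `body` with tracking on and return the maximum outstanding bytes seen
  def withMemoryTracking[T](body: => T): Long = {
    enabled.set(true)
    outstanding.set(0L)
    maxOutstanding.set(0L)
    try {
      body
      maxOutstanding.get
    } finally {
      enabled.set(false)
    }
  }

  // temporarily turn tracking off, e.g. around spill-triggered frees
  def withoutMemoryTracking[T](body: => T): T = {
    val was = enabled.get
    enabled.set(false)
    try body finally enabled.set(was)
  }
}

The important property is that both counters are thread-local, so allocations and frees performed by other threads never touch them.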

If one of our allocations fails and we handle it via a spill, it shouldn't matter. That is because the spill code should be careful to disable the tracking for those spills (e.g. with a withoutMemoryTracking call). This means we wouldn't discount frees on this thread for some other thread's allocations that are irrelevant to the code being tracked.

I hope/believe this could be a pretty low overhead system. Note that I don't think this helps track memory used when an expensive kernel is loaded; as far as I understand, that can be a one-time penalty when we open the shared library. I know we have seen this with some of the regular expression kernels in the past. Pinging @jlowe for comments on this overall.

@abellina
Collaborator Author

abellina commented Oct 17, 2022

I think one approach here is to have a stack of simple memory tracking info in RmmJni. When a withMemoryTracking block is entered we push one of these objects onto the stack. The tracking_resource_adaptor could then check this stack for the current thread, and if it has something in it, use the top tracker to track allocations for now.

When withMemoryTracking finishes, it calls a function in the RMM JNI bits to pop this element from the stack. If it is the last element, we have turned the feature off. If it is not the last element, we take the maximum outstanding amount we just popped and add it to the next element in the stack (the calling scope also saw that maximum outstanding), and we continue to track with the remaining tracker on the stack.

Unfortunately, we also need to keep a set of the addresses we allocated in this thread. Given spill, the current thread may need to spill to satisfy an allocation, and we could then ignore frees for buffers we didn't allocate while tracking. The hope is that these withMemoryTracking blocks sit as close as possible to a cuDF call.
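A rough sketch of that stack idea, with the same caveat that the names (ScopeTracker, ScopedMemoryTracking, push/pop, onAllocated/onFreed) are hypothetical and the real bookkeeping would live in the tracking_resource_adaptor / RmmJni native layer:

import scala.collection.mutable

// Hypothetical per-scope tracking info pushed by nested withMemoryTracking blocks.
class ScopeTracker {
  var outstanding: Long = 0L
  var maxOutstanding: Long = 0L
  // addresses allocated while this scope was tracking, so we can ignore frees
  // of buffers this thread did not allocate (e.g. buffers freed due to spill)
  val owned = mutable.Set[Long]()
}

object ScopedMemoryTracking {
  private val stack = new ThreadLocal[mutable.Stack[ScopeTracker]] {
    override def initialValue(): mutable.Stack[ScopeTracker] = mutable.Stack[ScopeTracker]()
  }

  // entering a withMemoryTracking block
  def push(): Unit = stack.get.push(new ScopeTracker)

  // leaving a withMemoryTracking block: return this scope's maximum and fold it
  // into the enclosing scope, if any, since that scope also saw the usage
  def pop(): Long = {
    val finished = stack.get.pop()
    if (stack.get.nonEmpty) {
      val parent = stack.get.top
      parent.maxOutstanding =
        math.max(parent.maxOutstanding, parent.outstanding + finished.maxOutstanding)
    }
    finished.maxOutstanding
  }

  // allocation callback from the tracking resource adaptor
  def onAllocated(address: Long, bytes: Long): Unit = {
    val s = stack.get
    if (s.nonEmpty) {
      val top = s.top
      top.owned += address
      top.outstanding += bytes
      top.maxOutstanding = math.max(top.maxOutstanding, top.outstanding)
    }
  }

  // free callback: only count frees of addresses allocated while tracking
  def onFreed(address: Long, bytes: Long): Unit = {
    val s = stack.get
    if (s.nonEmpty && s.top.owned.remove(address)) {
      s.top.outstanding -= bytes
    }
  }
}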

@abellina
Collaborator Author

Nsys has recently added memory tracking capabilities, and we believe we can use the correlationId + NVTX ranges to accomplish this as a post-processing step given an NVTX range. We should investigate whether this solution does what we need.

@abellina abellina changed the title [FEA] Track held device memory per thread [FEA] Track held device memory per thread (using nsys?) Feb 10, 2023
@revans2 revans2 changed the title [FEA] Track held device memory per thread (using nsys?) [FEA] Metric for maximum GPU memory per task Apr 4, 2023
@wjxiz1992
Copy link
Collaborator

wjxiz1992 commented Oct 25, 2023

Hi @abellina, I am trying to profile the GPU memory usage during a query run. I used nsys to profile, but didn't find metrics like peak memory usage.

I was using NVIDIA Nsight Systems version 2022.2.1.31-5fe97ab installed in our internal cluster.
I saw a post about it: https://forums.developer.nvidia.com/t/nsys-measure-memory/118394, which was posted in 2021, but its graph does include the memory usage part...

Update:
The memory usage metrics are disabled by default; they can be turned on with an extra nsys argument, --cuda-memory-usage=true.
Then we can see the memory utilization part in the graph:
(screenshot: the memory utilization row is now visible in the nsys timeline)

@abellina
Collaborator Author

I haven't used this feature; the main question I'd have is whether it works with a pool, especially the async pools. It most definitely does not work with ARENA because that's all CPU managed, but I'd hope cudaAsync shows it.

@wjxiz1992
Collaborator

The profile result above is from a run with the ASYNC pool.
