[QST] GPU Memory is completely consumed in AWS-EMR #10827

akmalmasud96 · 2024-05-16T17:24:06Z

I am trying to run Spark-Rapids on AWS-EMR. As soon as processing starts, the GPU memory is completely consumed, leaving no GPU memory available for the processing itself.
The attached image shows this behavior.

The AWS-EMR version is 7.1.0.
How can I solve this problem?

akmalmasud96 · 2024-05-16T18:27:37Z

The issue was caused by setting the following parameter:
"spark.plugins":"com.nvidia.spark.SQLPlugin"
Removing it resolved the issue.

abellina · 2024-05-16T18:31:12Z

@akmalmasud96 just making sure I understand, do you want to run the spark-rapids plugin? If so, the spark.plugins flag is required. Thanks!

jlowe · 2024-05-16T20:54:38Z

The RAPIDS Accelerator consumes almost all of the GPU memory by default. It does not expect to share the GPU with another process. You can configure the RAPIDS Accelerator to consume much less GPU memory, although that often has detrimental effects on performance due to extra spilling to fit within the smaller amount of GPU memory. See the spark.rapids.memory.gpu.maxAllocFraction or spark.rapids.memory.gpu.reserve configs to limit the amount of memory used, either by setting a maximum fraction or the amount of memory to leave reserved, respectively.
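As a rough illustration of the settings named above, the following sketch builds an EMR-style `spark-defaults` classification that keeps the plugin enabled while capping its GPU memory use. The helper names `gpu_memory_properties` and `emr_spark_defaults` and the example value 0.5 are hypothetical, chosen only for illustration. The sketch also sets `spark.rapids.memory.gpu.allocFraction`, a related setting not mentioned in this thread, on the assumption that the initial allocation should not exceed the configured maximum; check the configuration documentation for your plugin version before relying on any of these values.

```python
import json

# Settings named in this thread.
PLUGIN_KEY = "spark.plugins"
PLUGIN_CLASS = "com.nvidia.spark.SQLPlugin"
MAX_ALLOC_KEY = "spark.rapids.memory.gpu.maxAllocFraction"
RESERVE_KEY = "spark.rapids.memory.gpu.reserve"
# Assumption: related setting, not mentioned in this thread.
ALLOC_KEY = "spark.rapids.memory.gpu.allocFraction"


def gpu_memory_properties(max_fraction=None, reserve_bytes=None):
    """Build Spark properties that limit RAPIDS Accelerator GPU memory.

    max_fraction:  cap on the fraction of GPU memory the plugin may use.
    reserve_bytes: amount of GPU memory (in bytes) to leave unallocated.
    At least one of the two must be given.
    """
    if max_fraction is None and reserve_bytes is None:
        raise ValueError("set max_fraction, reserve_bytes, or both")

    # The plugin stays enabled; only its memory footprint is limited.
    props = {PLUGIN_KEY: PLUGIN_CLASS}

    if max_fraction is not None:
        if not 0.0 < max_fraction <= 1.0:
            raise ValueError("max_fraction must be in (0, 1]")
        props[MAX_ALLOC_KEY] = str(max_fraction)
        # Assumption: keep the initial allocation at or below the maximum.
        props[ALLOC_KEY] = str(max_fraction)

    if reserve_bytes is not None:
        if reserve_bytes < 0:
            raise ValueError("reserve_bytes must be non-negative")
        props[RESERVE_KEY] = str(int(reserve_bytes))

    return props


def emr_spark_defaults(props):
    """Wrap Spark properties in an EMR 'spark-defaults' classification."""
    return [{"Classification": "spark-defaults", "Properties": props}]


if __name__ == "__main__":
    # Illustrative value only: let the plugin use at most half of the GPU.
    config = emr_spark_defaults(gpu_memory_properties(max_fraction=0.5))
    print(json.dumps(config, indent=2))
```

The resulting JSON can be supplied as cluster configuration when launching EMR. As noted above, a lower limit usually trades performance for headroom because of the extra spilling.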

akmalmasud96 · 2024-05-17T06:13:01Z

@abellina, @jlowe, thanks for the explanation. I had figured it out. Can we use RAFT in spark-rapids for vector operations?

jlowe · 2024-05-22T19:50:04Z

Apologies for the delay, I accidentally missed this. Working with RAPIDS RAFT should be possible, but the details will depend on how you're planning to use it from Spark. I assume this is via a UDF, so are you planning on a Python UDF calling the RAPIDS RAFT Python APIs, or something lower-level like the C++ RAFT APIs via a Java UDF that allows the data to stay in the JVM process? A Python UDF will require sharing the GPU across processes, which is a bit tricky as mentioned above.

akmalmasud96 added the "? - Needs Triage" and "question" labels May 16, 2024

akmalmasud96 closed this as completed May 16, 2024

sameerz removed the "? - Needs Triage" label May 31, 2024
