Improve host memory spill interfaces #10065

Merged
jbrennan333 merged 5 commits into NVIDIA:branch-24.02 on Dec 19, 2023

Conversation

jbrennan333 (Collaborator):

This fixes #10004

This changes the host memory spill to look more like SpillableColumnarBatch. Instead of using withHostMemoryReadOnly and withHostMemoryWriteLock, callers can simply use SpillableHostBuffer.getHostBuffer().

This also changes RapidsDiskBuffer.getMemoryBuffer to no longer retain a HostMemoryBuffer, and to use HostAlloc.alloc() for obtaining the HostMemoryBuffer it returns. This may require callers to handle retries.

I have also made changes to InternalRowToColumnarBatchIterator to use this new interface.
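
For illustration, a minimal sketch of the new calling pattern in Scala terms (a hedged sketch, assuming getHostBuffer() hands back a HostMemoryBuffer reference that the caller must close; import paths are from memory and may differ):

import ai.rapids.cudf.HostMemoryBuffer
import com.nvidia.spark.rapids.SpillableHostBuffer

// Hypothetical caller; `spillable` stands in for any SpillableHostBuffer we hold.
def readFirstLong(spillable: SpillableHostBuffer): Long = {
  // getHostBuffer() materializes the data (unspilling from disk if needed) and
  // returns a buffer the caller owns; it stays unspillable while we hold it.
  val hostBuf: HostMemoryBuffer = spillable.getHostBuffer()
  try {
    hostBuf.getLong(0)
  } finally {
    // Dropping our reference lets the underlying buffer become spillable again.
    hostBuf.close()
  }
}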

I am putting this up as a draft so I can get some feedback on the approach.

@jbrennan333 added the bug (Something isn't working) and reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) labels on Dec 17, 2023
@jbrennan333 self-assigned this on Dec 17, 2023
@jbrennan333 (Collaborator Author):

build

) {
used = fillBatch(dataBuffer, offsetsBuffer, dataLength, numRowsEstimate);
}
hBufs = getHostBuffersWithRetry(sdb, sob);
jbrennan333 (Collaborator Author):

I'm not certain there is a good reason to have a second block here. I did it this way as a test, because previously I was hitting problems with the with-write-lock followed by with-read-only blocks. But at this point I think I could just combine these two blocks.

case Some(existingBuffer) => existingBuffer
case None =>
val maybeNewBuffer = hostStorage.copyBuffer(buffer, this, stream)
maybeNewBuffer.map { newBuffer =>
jbrennan333 (Collaborator Author):

This buffer is initially created as spillable. But it changes to unspillable when the caller does a getHostMemoryBuffer on it. Not sure if I should be concerned about this brief window of spillability?

maybeNewBuffer.map { newBuffer =>
logDebug(s"got new RapidsHostMemoryStore buffer ${newBuffer.id}")
newBuffer.addReference() // add a reference since we are about to use it
updateTiers(BufferSpill(buffer, Some(newBuffer)))
jbrennan333 (Collaborator Author):

The debug log output from updateTiers makes this sound like a spill, when in fact this is an unspill. Perhaps I should modify updateTiers to compare the storage tiers and adjust the log message appropriately.

Collaborator:

This was created only with spill in mind, not unspill. Could BufferSpill be part of a hierarchy of classes: a SpillAction, where BufferSpill is a SpillAction and BufferUnspill is also a SpillAction?
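
For illustration, a rough Scala sketch of the suggested shape (the trait and field names here are assumptions for the sketch, not the actual plugin code):

// Hypothetical hierarchy: both actions carry the buffer that moved and where it landed.
sealed trait SpillAction {
  def sourceBuffer: RapidsBuffer
  def newBuffer: Option[RapidsBuffer]
}

case class BufferSpill(sourceBuffer: RapidsBuffer,
    newBuffer: Option[RapidsBuffer]) extends SpillAction

case class BufferUnspill(sourceBuffer: RapidsBuffer,
    newBuffer: Option[RapidsBuffer]) extends SpillAction

// updateTiers could then match on the action type to log "spilled" vs "unspilled"
// instead of inferring the direction from the storage tiers.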

val serializerManager = diskBlockManager.getSerializerManager()
val memBuffer = if (serializerManager.isRapidsSpill(id)) {
// Only go through serializerManager's stream wrapper for spill case
closeOnExcept(HostAlloc.alloc(uncompressedSize)) {
jbrennan333 (Collaborator Author):

Adding this HostAlloc.alloc() is what really increases how much heap memory I can get away with when running queries. This also requires that any code that ends up in here will likely need a retry.
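
For illustration, a hedged sketch of what a caller-side retry might look like (this assumes the plugin's RmmRapidsRetryIterator.withRetryNoSplit helper, a RapidsBuffer handle, and a cast of the returned MemoryBuffer; the exact helper, types, and signatures may differ):

import ai.rapids.cudf.HostMemoryBuffer
import com.nvidia.spark.rapids.RapidsBuffer
import com.nvidia.spark.rapids.RmmRapidsRetryIterator.withRetryNoSplit

// `diskBuffer` stands in for a buffer handle backed by the disk store.
def readBackToHost(diskBuffer: RapidsBuffer): HostMemoryBuffer =
  withRetryNoSplit[HostMemoryBuffer] {
    // getMemoryBuffer now allocates its result via HostAlloc.alloc, so it can throw
    // a retry OOM; the retry helper catches that, waits for host memory to be
    // freed or spilled, and runs this block again.
    diskBuffer.getMemoryBuffer.asInstanceOf[HostMemoryBuffer]
  }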

case _ =>
throw new IllegalStateException("copying from buffer without device memory")
// If the other is from the local disk store, we are unspilling to host memory.
if (other.storageTier == StorageTier.DISK) {
jbrennan333 (Collaborator Author):

I originally tested this without this block for StorageTier.DISK, and it worked.
But in that case we do a HostAlloc here and another one in the disk store, and then copy the buffer to this one. I changed it to just take over the buffer from the disk store to eliminate the extra alloc/copy.
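
For illustration, a rough sketch of that branch (hedged: releaseHostBuffer, sizeInBytes, and copyIntoHostBuffer are made-up names standing in for whatever the store actually exposes; other, StorageTier, HostAlloc, and closeOnExcept are from the surrounding code):

val hostBuffer: HostMemoryBuffer =
  if (other.storageTier == StorageTier.DISK) {
    // Unspill: the disk store already allocated a host buffer via HostAlloc.alloc
    // and read the spilled bytes into it, so take ownership of that buffer instead
    // of allocating a second one and copying.
    other.releaseHostBuffer()  // made-up name for "hand over the existing buffer"
  } else {
    // Spill from device: allocate a fresh host buffer and copy into it, as before.
    closeOnExcept(HostAlloc.alloc(other.sizeInBytes)) { hostBuf =>
      copyIntoHostBuffer(other, hostBuf)  // made-up helper for the existing copy path
      hostBuf
    }
  }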

rapidsBuffer.getHostMemoryBuffer
}
}

/**
jbrennan333 (Collaborator Author):

I did not remove withHostBufferReadOnly/withHostBufferWriteLock yet, but I don't think we need them if we switch to this alternate approach?

Collaborator:

I'd remove them if they are not useful

@abellina self-requested a review on December 18, 2023 at 15:18
@jbrennan333 (Collaborator Author):

I have tested this by running the existing unit and integration tests, and I have also been testing with NDS queries.
By setting these configs:

 --conf spark.rapids.sql.exec.ShuffleExchangeExec=false \
 --conf spark.rapids.memory.host.offHeapLimit.enabled=true \
 --conf spark.rapids.memory.host.offHeapLimit.size=32G \

I am able to force a lot of InternalRowToColumnarBatchIterator activity. I have verified that I can run the full NDS power run and get correct results with host memory restricted to 32G. Ideally we should be able to go lower, but I am going to leave that as a separate investigation.
I have also done a benchmark test on an 8-node A100 cluster to verify that there are no performance impacts from this change (this was done without limiting off-heap memory).

updateTiers(BufferSpill(buffer, Some(newBuffer)))
buffer.safeFree()
newBuffer
}.get // the GPU store has to return a buffer here for now, or throw OOM
Collaborator:

Suggested change:
- }.get // the GPU store has to return a buffer here for now, or throw OOM
+ }.get // the host store has to return a buffer here for now, or throw OOM

@abellina (Collaborator):

This LGTM @jbrennan333

try (
SpillableHostBuffer sdb = bufsAndNumRows._1[0];
int used[];
try (SpillableHostBuffer sdb = bufsAndNumRows._1[0];
Collaborator:

Not introduced by this PR, but since it touches them, can we update these variables to be more mnemonic than sdb and sob to improve code readability?

@jbrennan333 marked this pull request as ready for review on December 18, 2023 at 23:13
@jbrennan333 (Collaborator Author):

build

@gerashegalov (Collaborator) left a comment:

still trying to understand big picture, just nits

@@ -208,6 +209,23 @@ public ColumnarBatch next() {
}
}

private HostMemoryBuffer[] getHostBuffersWithRetry(SpillableHostBuffer sdb, SpillableHostBuffer sob) {
Collaborator:

Suggested change:
- private HostMemoryBuffer[] getHostBuffersWithRetry(SpillableHostBuffer sdb, SpillableHostBuffer sob) {
+ private HostMemoryBuffer[] getHostBuffersWithRetry(SpillableHostBuffer spillableDataBuffer, SpillableHostBuffer spillableOffsetsBuffer) {

Comment on lines 214 to 225:
    HostMemoryBuffer[] hBufs = new HostMemoryBuffer[]{ null, null };
    try {
      hBufs[0] = sdb.getHostBuffer();
      hBufs[1] = sob.getHostBuffer();
      return hBufs;
    } finally {
      // If the second buffer is null, we must have thrown, so close the first one.
      if ((hBufs[1] == null) && (hBufs[0] != null)) {
        hBufs[0].close();
        hBufs[0] = null;
      }
    }
Collaborator:

Do we have to test for the exception implicitly? Would this also work?

Suggested change:
-     HostMemoryBuffer[] hBufs = new HostMemoryBuffer[]{ null, null };
-     try {
-       hBufs[0] = sdb.getHostBuffer();
-       hBufs[1] = sob.getHostBuffer();
-       return hBufs;
-     } finally {
-       // If the second buffer is null, we must have thrown, so close the first one.
-       if ((hBufs[1] == null) && (hBufs[0] != null)) {
-         hBufs[0].close();
-         hBufs[0] = null;
-       }
-     }
+     HostMemoryBuffer dataBuffer = spillableDataBuffer.getHostBuffer();
+     HostMemoryBuffer offsetsBuffer = null;
+     try {
+       offsetsBuffer = spillableOffsetsBuffer.getHostBuffer();
+     } catch (Throwable t) {
+       dataBuffer.close();
+     }
+     return new HostMemoryBuffer[] {dataBuffer, offsetsBuffer};

jbrennan333 (Collaborator Author):

Thanks @gerashegalov. I have simplified this by just doing a try-with-resources for both buffers and incrementing the refcounts in the body so they are retained if nothing threw.
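
The actual change is on the Java side (a try-with-resources over both buffers), but roughly the same idea in Scala terms, as a hedged sketch (assumes Arm.withResource and HostMemoryBuffer.incRefCount behave as they do elsewhere in the plugin; import paths may differ):

import ai.rapids.cudf.HostMemoryBuffer
import com.nvidia.spark.rapids.Arm.withResource
import com.nvidia.spark.rapids.SpillableHostBuffer

def getHostBuffers(sdb: SpillableHostBuffer, sob: SpillableHostBuffer): Array[HostMemoryBuffer] =
  withResource(sdb.getHostBuffer()) { dataBuf =>
    withResource(sob.getHostBuffer()) { offsetsBuf =>
      // Both calls succeeded: take an extra reference on each so the buffers outlive
      // the closes at the end of the withResource blocks. If the second getHostBuffer()
      // throws, the first buffer is closed automatically and nothing leaks.
      dataBuf.incRefCount()
      offsetsBuf.incRefCount()
      Array(dataBuf, offsetsBuf)
    }
  }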

@jbrennan333 (Collaborator Author):

build

@jbrennan333 merged commit 2edf82d into NVIDIA:branch-24.02 on Dec 19, 2023
37 of 38 checks passed
Labels
bug (Something isn't working), reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)
Development

Successfully merging this pull request may close these issues.

[BUG] If a host memory buffer is spilled, it cannot be unspilled
3 participants