Prior to inserting buffers into the spillable store, callers should make sure they have synchronized with the GPU ahead of the call.
Update:
After further review it looks like the only issue is adding buffers to the catalog on the write side, after compression. Compression doesn't synchronize at the end, so we could spill a compressed buffer that isn't fully written yet. The fix is therefore only in the code around GpuPartitioning where we finish the batch compressor.
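A minimal sketch of what that fix could look like, assuming the cudf Java binding's Cuda.DEFAULT_STREAM.sync(); the compressed buffers and the registration callback are placeholders, not the exact spark-rapids API:

```scala
import ai.rapids.cudf.Cuda

// Hedged sketch: `compressedBuffers` and `register` stand in for the real
// GpuPartitioning/catalog types; only Cuda.DEFAULT_STREAM.sync() is the actual
// cudf Java call being proposed here.
def registerAfterCompression[T](compressedBuffers: Seq[T], register: T => Unit): Unit = {
  // The batch compressor enqueues its kernels asynchronously on this thread's stream;
  // wait for them to finish so another thread cannot spill a half-written buffer.
  Cuda.DEFAULT_STREAM.sync()
  compressedBuffers.foreach(register)
}
```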
The path below with SpillableColumnarBatch is covered by the following facts:
- Buffers added to the store have to be contiguously split, and contiguous_split currently synchronizes. Note that this is a byproduct of other copies happening inside contiguous_split; in general, cudf does not guarantee it.
- If the buffer was already contiguously split, it must have come from a shuffle or have been reconstituted from a previously spilled buffer. The shuffle synchronizes on receive, and reconstituted buffers are synchronously copied back from the host to the device.
- If the buffer was compressed, the decompression is asynchronous, but that doesn't matter: it happens after a CUDA synchronize.
Previously this seemed to be the issue:
The reason is that as soon as the apply method adds the batch to the catalog, we are at risk of spilling, which can happen on a different thread than the one producing this batch. Our .next() for a ColumnarBatch in a query very likely produces lazy Java representations of memory that is still being worked on asynchronously on a CUDA stream. Also, given PTDS, those two threads (producer and spiller) are likely using two different CUDA streams.
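To illustrate that hazard concretely, here is a hedged sketch of a spill-to-host copy using the cudf Java buffer types; HostMemoryBuffer.copyFromDeviceBuffer and Cuda.deviceSynchronize are assumed from the cudf Java bindings, and the surrounding structure is not the actual spill store code:

```scala
import ai.rapids.cudf.{Cuda, DeviceMemoryBuffer, HostMemoryBuffer}

// Illustrative sketch only. Under PTDS the spill thread has its own default stream, so
// syncing that stream says nothing about the producer's in-flight kernels: either the
// producer synchronized before the buffer became spillable, or a device-wide sync is
// needed here, otherwise the copy below can read half-written device memory.
def spillToHost(dev: DeviceMemoryBuffer): HostMemoryBuffer = {
  val host = HostMemoryBuffer.allocate(dev.getLength)
  Cuda.deviceSynchronize() // heavy-handed guard for the sketch; the real fix syncs on the producer side
  host.copyFromDeviceBuffer(dev)
  host
}
```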
I see the following uses:
- GpuCoalesceBatches
- GpuArrowEvalPythonExec
- GpuWindowInPandasExecBase
I don't think SpillableColumnarBatch should be responsible for this, because you want to be able to synchronize once if you are making a collection of batches spillable.
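For example, a caller that makes a whole collection of batches spillable only needs one synchronize up front. A hedged sketch follows; SpillableColumnarBatch.apply's exact signature varies, so the wrapping step is left as a parameter:

```scala
import ai.rapids.cudf.Cuda
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hedged sketch: one sync covers every batch in the collection, which is why the caller
// (not SpillableColumnarBatch itself) is the right place to put the synchronize.
def makeAllSpillable[T](batches: Seq[ColumnarBatch], makeSpillable: ColumnarBatch => T): Seq[T] = {
  Cuda.DEFAULT_STREAM.sync() // all async work producing these batches has now completed
  batches.map(makeSpillable)
}
```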
I don't believe this is a hard problem to solve, but I do think it can corrupt data since we could issue a memcpy to host before the device data has fully materialized. Hence, marking this a P1.
abellina changed the title from "[BUG] Code using SpillableColumnarBatch should synchronize" to "[BUG] Code adding buffers to the spillable store should synchronize" on Dec 2, 2020.
Closing this issue. I think the next step in general is to move away from synchronize wherever we can, but that is not specific to this issue: there are many points where we'd like to avoid stream synchronization if possible, and that needs some design work to account for all cases.