
[BUG] cudf::to_arrow is not properly synchronized #6706

Closed
harrism opened this issue Nov 9, 2020 · 1 comment · Fixed by #6648
harrism commented Nov 9, 2020

cudf::to_arrow copies data from device to host on a stream but never synchronizes that stream before handing the data to the Arrow library, which does not use the same stream. I therefore believe there is a potential race condition.

This is an example of one of the functions in to_arrow.cpp that copies data asynchronously to the host:

template <typename T>
std::shared_ptr<arrow::Buffer> fetch_data_buffer(column_view input_view,
                                                 arrow::MemoryPool* ar_mr,
                                                 cudaStream_t stream)
{
  const int64_t data_size_in_bytes = sizeof(T) * input_view.size();

  auto result = arrow::AllocateBuffer(data_size_in_bytes, ar_mr);
  CUDF_EXPECTS(result.ok(), "Failed to allocate Arrow buffer for data");

  std::shared_ptr<arrow::Buffer> data_buffer = std::move(result.ValueOrDie());

  // Asynchronous device-to-host copy on `stream`; the buffer is returned
  // without any synchronization of that stream.
  CUDA_TRY(cudaMemcpyAsync(data_buffer->mutable_data(),
                           input_view.data<T>(),
                           data_size_in_bytes,
                           cudaMemcpyDeviceToHost,
                           stream));

  return data_buffer;
}

The to_arrow function ultimately calls this function and others like it, but never synchronizes before returning. An application could therefore call to_arrow to get an Arrow array/buffer and then access it on the host without the stream ever having been synchronized.
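To make the hazard concrete, here is a minimal sketch of the failure mode using the fetch_data_buffer helper quoted above (the column col and the stream are hypothetical, and this assumes the destination is pinned host memory so the copy is truly asynchronous):

// `col` is a hypothetical cudf::column of INT32 data resident on the device,
// and `stream` a non-default CUDA stream created elsewhere.
auto buffer =
  fetch_data_buffer<int32_t>(col.view(), arrow::default_memory_pool(), stream);

// Host-side read with no cudaStreamSynchronize(stream) in between: if the
// copy is genuinely asynchronous, this can observe incomplete data.
auto first_value = reinterpret_cast<int32_t const*>(buffer->data())[0];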

I believe pageable host memory is currently used, which causes cudaMemcpyAsync to return only once the copy has completed. But if this is switched to pinned host memory, the copy will be truly asynchronous, and we will have a race condition.
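A minimal way to close the race (a sketch, not necessarily what the linked PR does) would be to synchronize the stream before the host buffer escapes, e.g. at the end of fetch_data_buffer:

  // ...after the cudaMemcpyAsync above...
  // Block until the device-to-host copy has landed before handing the
  // buffer to Arrow, which knows nothing about `stream`.
  CUDA_TRY(cudaStreamSynchronize(stream));
  return data_buffer;

If all of the copies in to_arrow are issued on the same stream, a single synchronization just before to_arrow returns would also suffice and would avoid one sync per buffer.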

harrism added the bug (Something isn't working) and libcudf (Affects libcudf (C++/CUDA) code) labels on Nov 9, 2020
harrism self-assigned this on Nov 9, 2020
harrism commented Nov 9, 2020

I believe the same is true of to_dlpack.

harrism added a commit that referenced this issue Nov 20, 2020
Converting libcudf to use rmm::cuda_stream_view will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the second PR in the series. This series of PRs will:

    Replace usage of cudaStream_t with rmm::cuda_stream_view
    Replace usage of 0 or nullptr as a stream identifier with rmm::cuda_stream_default
    Ensure all APIs always order the stream parameter before the memory resource parameter. #5119

Contributes to #6645 and #5119

Depends on #6646 so this PR will look much bigger until that one is merged.

Also fixes #6706 (to_arrow and to_dlpack are not properly synchronized).
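For context, a rough before/after sketch of the kind of signature change this conversion involves (illustrative only, not the exact code in this PR):

// Before: raw CUDA stream handle passed around explicitly.
std::shared_ptr<arrow::Buffer> fetch_data_buffer(column_view input_view,
                                                 arrow::MemoryPool* ar_mr,
                                                 cudaStream_t stream);

// After: strongly typed stream view with a default, which also makes the
// synchronization fix a one-liner (stream.synchronize()).
std::shared_ptr<arrow::Buffer> fetch_data_buffer(column_view input_view,
                                                 arrow::MemoryPool* ar_mr,
                                                 rmm::cuda_stream_view stream = rmm::cuda_stream_default);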

This second PR converts:

    table.hpp (and source / dependencies)
    column_factories.hpp (and source / dependencies)
    headers in include/detail/aggregation (and source / dependencies)
    include/detail/groupby/sort_helper.hpp (and source / dependencies)
    include/detail/utilities/cuda.cuh (and dependencies)
    binary ops
    concatenate
    copy_if
    copy_range
    fill
    gather
    get_value
    hash groupby
    quantiles
    reductions
    repeat
    replace
    reshape
    round
    scatter
    search
    sequence
    sorting
    stream compaction