-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge branch-0.39 into branch-0.40 #253
Merged
AyodeAwe
merged 5 commits into
rapidsai:branch-0.40
from
jameslamb:branch-0.40-merge-0.39
Jul 25, 2024
Merged
Merge branch-0.39 into branch-0.40 #253
AyodeAwe
merged 5 commits into
rapidsai:branch-0.40
from
jameslamb:branch-0.40-merge-0.39
Jul 25, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
) The missing support for specifying memory type to `ucxx::MemoryHandle` prevents usage of non-host memory allocations. With this change it is now possible to create `ucxx::MemoryHandle` with CUDA memory allocation (UCX-managed), as well as pointing the handle to existing CUDA memory allocations. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - James Lamb (https://github.com/jameslamb) - Mads R. B. Kristensen (https://github.com/madsbk) URL: rapidsai#246
Contributes to rapidsai/build-planning#31 In short, RAPIDS DLFW builds want to produce wheels with unsuffixed dependencies, e.g. `cudf` depending on `rmm`, not `rmm-cu12`. This PR is part of a series across all of RAPIDS to try to support that type of build by setting up CUDA-suffixed and CUDA-unsuffixed dependency lists in `dependencies.yaml`. For more details, see: * rapidsai/build-planning#31 (comment) * rapidsai/cudf#16183 ## Notes for Reviewers ### Why target 24.08? This is targeting 24.08 because: 1. it should be very low-risk 2. getting these changes into 24.08 prevents the need to carry around patches for every library in DLFW builds using RAPIDS 24.08 Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#251
…i#238) # Add several fixes primarily targeting worker progress thread Several changes are added to improve stability of worker progress thread. Details of changes are below. ## Support generic callback cancelation Once registered it was not possible to cancel generic callbacks, which could cause the callback to be called after the timeout had expired and the owner/resources were not alive anymore. This change allows canceling a generic callback that couldn't be completed successfully, avoiding access to invalid memory of objects that are not alive anymore. ## Remove `ErrorCallbackData` The `ErrorCallbackData` structure and `ucxx::Endpoint::errorCallback` static method used by `ucxx::Endpoint` were removed in favor of using a friend function `endpointErrorCallback` that receives a pointer to `ucxx::Endpoint` itself, thus simplifying ownership of all resources where only `ucxx::Endpoint` now is responsible to retain ownership of all resources. ## Improvements to `ucxx::InflightRequests` ### Add missing locks All accesses to any of internal `_trackedRequests` resources are now protected by locks. This has been the source of various issues in the past, thus switching to a safer approach seems wise. During `merge()` both merger and mergee have resource locks now to prevent races. ### Use regular instances instead of pointer for `TrackedRequests` To simplify `ucxx::InflightRequests`/`TrackedRequests` instead of using pointers to `InflightRequestsMap` objects, switch to using simple instances. ## Improvements to `ucxx::WorkerProgressThread` ### Replace `std::ref` with `std::shared_ptr` Using `std::ref` was causing undefined behavior in the state of `_stop`, switching to `std::shared_ptr` has resolved the problem. ### Set progress thread ID before progress loop stats Set the thread ID with parent before starting the progress loop, so that checking whether the a callback is running in the progress thread is always correct, otherwise the parent may be unable to determine the progress thread has already started. ### Use simple instance instead of pointer New members, such as `isRunning()` and `stop()`, were introduced so a simple instance of `WorkerProgressThread` may be used instead of smart pointers. With this change the owner of the instance may query whether the thread indeed stopped, as opposed to just letting it go out-of-scope while the thread was not joined. ## Improvement to `CallbackNotifier` Instead of blocking continuously for longer periods of time, it's now possible to periodically signal the worker again to cause it to wakeup (when in blocking mode). Currently defaults to 100ms. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: rapidsai#238
…pidsai#250) Add option to allow enabling/disabling endpoint error handling in C++ benchmark, switching the default to disable matching `ucx_perftest`. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: rapidsai#250
jameslamb
added
improvement
Improves an existing functionality
non-breaking
Introduces a non-breaking change
labels
Jul 25, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Getting the changes from recent PRs targeting branch-0.39 onto branch-0.40
Followed the procedure described at https://docs.rapids.ai/maintainers/forward-merger
Notes for Reviewers
This should be non-squash admin-merged.