UCP/RNDV/CUDA: RNDV protocol improvements for CUDA #5473
Conversation
Commenter does not have sufficient privileges for PR 5473 in repo openucx/ucx

bot:pipe:retest
src/ucp/core/ucp_request.h
Outdated

    ucp_rkey_h     rkey;            /* key for remote send buffer */
    ucp_lane_map_t lanes_map_avail; /* used lanes map */
    ucp_lane_map_t lanes_map_all;   /* actual lanes map */
    uint8_t        lanes_count;     /* actual lanes map */
Should be "lanes count" (in the comment).
src/ucp/proto/rndv.h
Outdated

    @@ -66,12 +66,14 @@ void ucp_rndv_receive(ucp_worker_h worker, ucp_request_t *rreq,
                                 const ucp_rndv_rts_hdr_t *rndv_rts_hdr);

    static UCS_F_ALWAYS_INLINE int ucp_rndv_is_get_zcopy(ucs_memory_type_t mem_type,
                                                         size_t length,
maybe pass just 2 params to this func: request and context? The first 2 params are taken from the request, the others from the context.
src/ucp/proto/rndv.c
Outdated

    freq->send.rndv_get.rreq = sreq;
    ucp_rndv_req_init_get_zcopy_lane_map(freq);
do you really need to re-calculate it for every fragment?

I think so, because each fragment is tracked in a separate rndv frag req.

can't we cache it somewhere, because it is supposed to be the same for all fragments?

fixed. can you check it?

@brminich can you please check?
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
@hoopoepg, please review
    ucp_request_recv_buffer_reg(rreq, ep_config->key.rma_bw_md_map,
                                rndv_rts_hdr->size);

    if ((rndv_mode == UCP_RNDV_MODE_PUT_ZCOPY) ||
looks like now we do not register the RX buffer for the PUT protocol in the case of UCP_RNDV_MODE_PUT_AUTO + non-CUDA memory
Correct. This PR does not change the current behavior for HOST memory. Today it does GET by default for HOST memory: it sends RTR without registering the RX buffer, in order to switch to Active Message rndv when it fails to do GET (https://github.com/openucx/ucx/blob/master/src/ucp/proto/rndv.c#L473)
@brminich are we ok here?
ok
@Akshay-Venkatesh @yosefe can you please review?

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
src/ucp/proto/rndv.c
Outdated

    if ((lane_bw/max_lane_bw) <
        (1. / context->config.ext.multi_lane_max_ratio)) {
        lane_map &= ~UCS_BIT(lane_idx);
seems it can make rndv_req->send.rndv_get.rkey_index[i] invalid because some lanes would be removed from the middle
bot:pipe:retest
What

Improving out-of-the-box behavior for CUDA transfers.

Why?

The current default rndv scheme for CUDA transfers uses sender-side pipelining to overcome the GPUDirectRDMA performance limitation on architectures where GPUs are connected directly to the CPU socket.
But most current GPU system architectures connect the GPU and the NIC to the same PCIe switch (e.g. all DGX systems and various vendor GPU server architectures). We currently recommend that users set
UCX_RNDV_SCHEME=get_zcopy
explicitly in order to get optimal GPUDirectRDMA performance on these architectures. We also do not have an optimal fallback when the GET protocol fails (e.g. GPUs connected to different CPU sockets, as on DGX-1).

How?
@Akshay-Venkatesh @yosefe
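The workaround the description refers to is setting the rendezvous scheme via the environment before launching the application. A minimal example; the launcher line and application name are placeholders:

```shell
# Force the get_zcopy rendezvous scheme, the manual workaround this PR
# aims to make unnecessary on systems where GPU and NIC share a PCIe
# switch.
export UCX_RNDV_SCHEME=get_zcopy

# Placeholder launch command; substitute your own application:
#   mpirun -np 2 ./my_cuda_app
echo "UCX_RNDV_SCHEME=$UCX_RNDV_SCHEME"
```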