Fixes on the gRPC frontend to handle AsyncNotifyWhenDone() API #6345

Merged: 9 commits, Oct 3, 2023
42 changes: 42 additions & 0 deletions src/grpc/grpc_utils.cc
@@ -28,6 +28,48 @@

namespace triton { namespace server { namespace grpc {

std::ostream&
operator<<(std::ostream& out, const Steps& step)
{
  switch (step) {
    case START:
      out << "START";
      break;
    case COMPLETE:
      out << "COMPLETE";
      break;
    case FINISH:
      out << "FINISH";
      break;
    case ISSUED:
      out << "ISSUED";
      break;
    case READ:
      out << "READ";
      break;
    case WRITEREADY:
      out << "WRITEREADY";
      break;
    case WRITTEN:
      out << "WRITTEN";
      break;
    case WAITING_NOTIFICATION:
      out << "WAITING_NOTIFICATION";
      break;
    case CANCELLATION_ISSUED:
      out << "CANCELLATION_ISSUED";
      break;
    case CANCELLED:
      out << "CANCELLED";
      break;
    case PARTIAL_COMPLETION:
      out << "PARTIAL_COMPLETION";
      break;
  }

  return out;
}

void
GrpcStatusUtil::Create(::grpc::Status* status, TRITONSERVER_Error* err)
{
41 changes: 41 additions & 0 deletions src/grpc/grpc_utils.h
@@ -38,6 +38,47 @@

namespace triton { namespace server { namespace grpc {

// The step of processing that the state is in. Every state must
// recognize START, COMPLETE and FINISH; the others are optional.
typedef enum {
  // This marks the starting stage of the RPC.
  START,
  // This marks that the RPC is complete.
  COMPLETE,
  // This marks the stage where all the notifications from the gRPC
  // completion queue are received and the state can be safely released.
  FINISH,
  // This stage means that the RPC has been issued to Triton for
  // inference and is waiting for the server callbacks or cancellation
  // to be invoked.
  ISSUED,
  // This stage means the request has been read from the network and
  // can be sent to Triton for execution.
  READ,
  // This stage means that the response is ready to be written back to
  // the network.
  WRITEREADY,
  // This stage means that the response has been written completely to
  // the network.
  WRITTEN,
  // This marks the special stage for the state object to differentiate
  // the tag delivered from the AsyncNotifyWhenDone() method.
  WAITING_NOTIFICATION,
  // This stage means that the cancellation for the RPC has been issued
  // to the server.
  CANCELLATION_ISSUED,
  // This stage marks that the state has been successfully cancelled.
  CANCELLED,
  // This is an intermediary stage where the state has been partially
  // completed by the gRPC responder Finish call or the
  // AsyncNotifyWhenDone() notification. The next call will move the
  // stage to fully complete.
  PARTIAL_COMPLETION
} Steps;

Review discussion on lines +74 to +75:

Collaborator: From talking offline, is there an edge case here where the state has already been processed and gets stuck in PARTIAL_COMPLETION?

Contributor (author): I have to think about this more. As per my reading and experimentation, a race condition exists in our current implementation as well. Specifically, we don't know at this point here: https://github.com/triton-inference-server/server/blob/main/src/grpc/stream_infer_handler.cc#L183-L195 whether the stream was closed from the client side or the server is in shutdown. By introducing a custom sleep, I was able to cause a gRPC assert error because it tries to call Finish() on a completion queue that is in a shutdown state. So, it is not getting stuck, but it might cause a gRPC assert when closing the server while running a heavy streaming inference workload.

Collaborator: Is this a graceful failure? Or do we need to catch this somehow?

Contributor (tanmayv25, Oct 2, 2023): assert_error is not a graceful error. I think a better way would be to prime the handlers for cancellation before shutting down the completion queue. I will address this issue in a separate PR.

Collaborator: Ok, a separate PR is fine. Add a ticket if needed for tracking.

// Debugging helper
std::ostream& operator<<(std::ostream& out, const Steps& step);

//
// GrpcStatusUtil
//
37 changes: 1 addition & 36 deletions src/grpc/infer_handler.cc
@@ -37,42 +37,6 @@ NextUniqueId()

namespace triton { namespace server { namespace grpc {

std::ostream&
operator<<(std::ostream& out, const Steps& step)
{
  switch (step) {
    case START:
      out << "START";
      break;
    case COMPLETE:
      out << "COMPLETE";
      break;
    case FINISH:
      out << "FINISH";
      break;
    case ISSUED:
      out << "ISSUED";
      break;
    case READ:
      out << "READ";
      break;
    case WRITEREADY:
      out << "WRITEREADY";
      break;
    case WRITTEN:
      out << "WRITTEN";
      break;
    case CANCELLATION_ISSUED:
      out << "CANCELLATION_ISSUED";
      break;
    case CANCELLED:
      out << "CANCELLED";
      break;
  }

  return out;
}

TRITONSERVER_Error*
OutputBufferAttributesHelper(
    TRITONSERVER_ResponseAllocator* allocator, const char* tensor_name,
@@ -785,6 +749,7 @@ ModelInferHandler::Process(InferHandler::State* state, bool rpc_ok)
#endif  // TRITON_ENABLE_TRACING

    state->step_ = Steps::FINISH;
  } else if (state->step_ == Steps::FINISH) {
    finished = true;
  }
