Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving client call options flexibility and replace retry interceptor #16216

Open
CaojiamingAlan opened this issue Jul 11, 2023 · 3 comments
Open

Comments

@CaojiamingAlan
Copy link
Contributor

What would you like to be added?

1. Expose more client call options

Most of the gRPC call options of clientV3 cannot be configured by users (while some of them can, like MaxCallSendMsgSize) , and results in issues like

We should provide users with more flexibility to make RPC calls.

2. Replace the customized retry interceptor with RetryPolicy

Old gRPC versions do not have retry options, and etcd client implemented retry through the retry interceptor. #12671 had replaced the balancer logic with the gRPC solution, and left a TODO here. Maybe we can finish this replacement now.

3. (Maybe) Set defaultWaitForReady to True

I'm not sure about this. This is motivated by #14631. Actually, by my experiments, not only NewSession() but common usages like Get() may hang: whenever the call is directed to the SIGSTOP-ed server (this always happens since it uses round-robin), it hangs indefinitely waiting for the server to be ready. Neither did I find anywhere configures a default timeout.

It seems the motivation to set WaitForReady to True is for

  1. preventing retry requests from overwhelming the network grpcproxy: Disable fast fail on lease grant call to cluster #8122
  2. minimizing client request error responses due to transient failures

(FYI WaitForReady is the opposite of FailFast, with the latter one being deprecated)

But if once we have the retry and WaitForReady configs exposed, we can change the default behavior to fail fast, and users may turn it on manually, or set a proper retry limit.

I do see caveats of clientV2 but I think we may try to solve it in V3.

Why is this needed?

To provide more flexibility and reduce code volume

@CaojiamingAlan CaojiamingAlan changed the title Improving client Improving client call options flexibility and replace retry interceptor Jul 11, 2023
@dusk125
Copy link
Contributor

dusk125 commented Jan 11, 2024

+1 for this. Something we see in OpenShift is that the API server can fail to retry on slower platforms (non-default LEADER_ELECTION_TIMEOUT & HEARTBEAT_INTERVAL) with an "etcdserver: leader changed" error. The server tries to set the retries to avoid this, but has them overwritten by the etcd retry.

Because of this, I would be happy to take on this work (specifically number 2 if the work needs to be broken up).

@serathius
Copy link
Member

cc @ahrtr @serathius

@ahrtr
Copy link
Member

ahrtr commented Jan 11, 2024

There are 3 different feature requests, let's breakdown them into three different tickets.

For 1, I vaguely recall that someone raised a PR to resolve it about 1.5 years ago, and ptabor proposed a generic solution to let users configure all gRPC options. But I couldn't find the PR anymore. Anyway, it seems a valid feature.

For 2, Note both grpc/resolver and grpc/balancer are still experimental features/packages. We should try not to make any related change. But It's open to investigate and discuss.

There is a related issue #15145. But it should be fine because etcd's usege of the grpc-go's resolver package is really very simple and basic cc @dfawley

For 3, please add the context into #14631, and let's continue the discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants