grpcproxy: Disable fast fail on lease grant call to cluster #8122

Merged: 1 commit, Jun 19, 2017

@yudai yudai commented Jun 16, 2017

Hello,

Problem Observed
----------------

When there is no etcd process behind the proxy,
clients keep resending lease grant requests with no delay between attempts.
This can cause abnormal CPU, RAM, and network consumption.

Problem Detail
--------------

`LeaseGrant()` uses a bare protobuf client to forward requests.
However, it doesn't pass `grpc.FailFast(false)`, so the method returns
an `Unavailable` error immediately when no etcd process is available.
In clientv3, `Unavailable` errors are not considered "Halt" errors,
so the library retries the request without delay.
Both the clients and the proxy burn many CPU cycles processing the retried requests.

Resolution
----------

Add `grpc.FailFast(false)` to `LeaseGrant()` of the `leaseProxy`.
With this option, the proxy no longer returns immediately when no etcd
process is available. Client requests simply time out instead.

I'm wondering whether we should fix the client instead of modifying the proxy.
Currently the retry mechanism in clientv3 has no delay, so this issue can recur if somebody implements a proxy without this (undocumented) knowledge. (I actually made the same mistake in my private proxy.)

@yudai yudai changed the title Disable fast fail on lease grant call to cluster clientv3: Disable fast fail on lease grant call to cluster Jun 16, 2017
@yudai yudai changed the title clientv3: Disable fast fail on lease grant call to cluster grpcproxy: Disable fast fail on lease grant call to cluster Jun 16, 2017
@heyitsanthony

The client should probably have some rate limiting on the retry path, but this seems OK to have as well.

yudai commented Jun 19, 2017

> The client should probably have some rate limiting on the retry path, but this seems OK to have as well.

I'll create a PR for this later when I have time, because the current retry mechanism makes writing third-party proxies a little harder.

xiang90 commented Jun 19, 2017

@yudai thanks. merging.
