Liveness and Readiness Probes #109
Comments
You can do it yourself: [...], you will see there is a [...]; add the following lines to configure the liveness probe for [...] |
@wavezhang Thanks for chiming in. We're working to get liveness probes into an upcoming release. Be careful with the timeout value on the trident-main container! Most operations, including listing backends, are protected by a shared lock in Trident's core. The Kubernetes liveness probe timeout defaults to 1 second, which would not be long enough if the REST call is held off by another operation in Trident. For reference, tridentctl uses a default timeout of 90 seconds. |
@clintonk 90 seconds is a little long for our application. Can this be optimized? |
@wavezhang 90 seconds is a worst case that we only see during heavy stress tests on older hardware. You shouldn't see delays of more than a few seconds during typical operation. But the default of 1 second is definitely too short, since creating a Flexvol or other storage operations can take more than that. You might try something like 15 seconds to start with and watch for any probe-triggered restarts over a few days. Alternatively, if you want something really short, you can use the version API (http://127.0.0.1:8000/trident/v1/version) which isn't gated by the shared lock; that one should always return instantly, but the tradeoff is that it won't detect issues like deadlocks or hangs in the core or the lower storage management layers. |
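For illustration, the lightweight check described above could be sketched as a small script that requests the version endpoint with an explicit timeout. This is only a sketch under assumptions: the URL is the one quoted in the comment, the 15-second timeout is the suggested starting value, and the names `probe` and `main` are hypothetical and not part of Trident. Because this endpoint is not gated by the shared lock, a passing check does not rule out a deadlock or hang in the core.

```python
import http.client
import urllib.error
import urllib.request

# URL taken from the comment above; adjust if your deployment differs.
VERSION_URL = "http://127.0.0.1:8000/trident/v1/version"


def probe(url: str = VERSION_URL, timeout_s: float = 15.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status within `timeout_s` seconds."""
    # An empty ProxyHandler keeps a loopback check from being routed through
    # any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response, or a non-2xx
        # status all count as a failed check.
        return False


def main() -> int:
    """Map the check onto a process exit code: 0 for healthy, 1 for unhealthy."""
    return 0 if probe() else 1
```

A wrapper could end with `sys.exit(main())` so that a non-zero exit code signals a failed check. Whatever timeout is configured on the Kubernetes probe itself would need to be at least as long as `timeout_s`, otherwise the probe is cut off before the script can report.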
@clintonk What happens if I restart the pod while there are running operations? Will everything recover after the pod restarts? |
@wavezhang Operations like volume creations are wrapped with transactions so they are unwound cleanly if Trident restarts before completion. And Kubernetes is an "eventually consistent" system that continually tries to make its current state consistent with the desired state. Likewise, Trident would just retry any failed operation shortly after restarting. |
To support HA of the Trident pod, liveness and readiness probes should be defined for both the Trident and etcd containers.