
Liveness and Readiness Probes #109

Closed
acsulli opened this issue Apr 12, 2018 · 6 comments

Comments


acsulli commented Apr 12, 2018

To support HA of the Trident pod, liveness and readiness probes should be defined for both the Trident and etcd containers.

wavezhang (Contributor) commented May 23, 2018

You can do this yourself (this applies to version 18.04.0; other versions need some changes).

Use the following command to generate the YAML files:

tridentctl --generate-custom-yaml

You will then see a trident-deployment.yaml under the setup directory. Add the following lines to configure a livenessProbe.

For the trident-main container:

+        livenessProbe:
+          failureThreshold: 3
+          exec:
+            command:
+            - curl
+            - 127.0.0.1:8000/trident/v1/backend
+          initialDelaySeconds: 15
+          timeoutSeconds: 10
+          periodSeconds: 3

For the etcd container:

+        livenessProbe:
+          failureThreshold: 3
+          exec:
+            command:
+            - etcdctl
+            - -endpoint=http://127.0.0.1:8001/ 
+            - cluster-health
+          initialDelaySeconds: 15
+          timeoutSeconds: 3
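The issue title also asks for readiness probes, which the snippets above don't cover. Kubernetes supports a readinessProbe with the same fields, so a sketch for the trident-main container might look like the following. This is untested; it reuses the REST endpoint from the liveness snippet above, the timing values are placeholders to tune, and the `--fail` flag is added so curl exits non-zero on HTTP error responses rather than only on connection failures.

```yaml
        readinessProbe:
          exec:
            command:
            - curl
            - --fail            # exit non-zero on HTTP 4xx/5xx, not just on connect errors
            - 127.0.0.1:8000/trident/v1/backend
          initialDelaySeconds: 10   # placeholder; tune for your environment
          periodSeconds: 5
          timeoutSeconds: 10
          failureThreshold: 3
```

Unlike a failed livenessProbe, a failed readinessProbe does not restart the container; it only removes the pod from service endpoints until the check passes again.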

clintonk (Contributor)

@wavezhang Thanks for chiming in. We're working to get liveness probes into an upcoming release. Be careful with the timeout value on the trident-main container! Most operations, including listing backends, are protected by a shared lock in Trident's core. The Kubernetes liveness probe timeout defaults to 1 second which would not be long enough if the REST call is held off by another operation in Trident. For reference, tridentctl uses a default timeout of 90 seconds.
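To illustrate the warning above, the sketch below raises timeoutSeconds well past the 1-second Kubernetes default. The 30-second value is an assumption chosen for illustration, not a recommendation from this thread; pick something suited to your environment.

```yaml
        livenessProbe:
          exec:
            command:
            - curl
            - --fail            # assumed addition: fail the probe on HTTP error responses too
            - 127.0.0.1:8000/trident/v1/backend
          initialDelaySeconds: 15
          periodSeconds: 10
          # Kubernetes defaults timeoutSeconds to 1, which can fire while the
          # REST call waits on Trident's shared core lock; 30 is an assumed
          # middle ground between that default and tridentctl's 90-second timeout.
          timeoutSeconds: 30
          failureThreshold: 3
```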

wavezhang (Contributor)

@clintonk 90 seconds is a little long for our application; can this be optimized?

clintonk (Contributor)

@wavezhang 90 seconds is a worst case that we only see during heavy stress tests on older hardware. You shouldn't see delays of more than a few seconds during typical operation. But the default of 1 second is definitely too short, since creating a Flexvol or other storage operations can take more than that. You might try something like 15 seconds to start with and watch for any probe-triggered restarts over a few days. Alternatively, if you want something really short, you can use the version API (http://127.0.0.1:8000/trident/v1/version) which isn't gated by the shared lock; that one should always return instantly, but the tradeoff is that it won't detect issues like deadlocks or hangs in the core or the lower storage management layers.
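The lock-free alternative described above would look something like the following sketch, which swaps in the version endpoint from the comment. The timing values are illustrative assumptions; the short timeout is only safe because, per the comment, this endpoint is not gated by the shared lock.

```yaml
        livenessProbe:
          exec:
            command:
            - curl
            - --fail            # assumed addition: treat HTTP errors as probe failures
            - 127.0.0.1:8000/trident/v1/version
          initialDelaySeconds: 15
          periodSeconds: 5
          timeoutSeconds: 2     # short timeout is viable since /version isn't held off by the core lock
          failureThreshold: 3
```

The tradeoff stated above still applies: this probe confirms the REST frontend is alive but won't catch deadlocks or hangs in the core or storage layers.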

wavezhang (Contributor)

@clintonk What happens if the pod restarts while there are operations running? Will everything recover after the pod restarts?

clintonk (Contributor)

@wavezhang Operations like volume creations are wrapped with transactions so they are unwound cleanly if Trident restarts before completion. And Kubernetes is an "eventually consistent" system that continually tries to make its current state consistent with the desired state. Likewise, Trident would just retry any failed operation shortly after restarting.
