Describe the bug
After restarting from a crash, the operator can mistakenly delete the Secret objects if it reads a stale value of cluster.Spec.TLS.Enabled.
Consider the following situation: there are two apiservers, apiserver1 and apiserver2, and the operator is initially communicating with apiserver1. The field cluster.Spec.TLS.Enabled is initially set to false and is then changed to true by the user. The operator reconciles and creates the Secret object accordingly. After the Secret object is created, the operator crashes, restarts, and starts communicating with apiserver2. At that moment apiserver2 is stale and still reports cluster.Spec.TLS.Enabled as false. The operator cannot tell whether the data is stale, so it deletes the Secret object right away.
To Reproduce
Steps to reproduce the behavior:
Create a YBCluster with cluster.Spec.TLS.Enabled set to false.
Change cluster.Spec.TLS.Enabled to true. The operator reconciles and creates the Secret objects. Meanwhile, apiserver2 is lagging behind and still holds cluster.Spec.TLS.Enabled as false.
The operator crashes, restarts, and communicates with apiserver2. It then reconciles and deletes the Secret objects, because cluster.Spec.TLS.Enabled is false on apiserver2.
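The sequence above can be modeled with a toy sketch. This is not the operator's actual code: the `View` type and the `Reconcile` function are hypothetical names, used only to illustrate how a decision based solely on the observed spec ends in a delete once the observed value is stale.

```go
package main

import "fmt"

// View is a hypothetical stand-in for what one apiserver reports
// for cluster.Spec.TLS.Enabled.
type View struct {
	TLSEnabled bool
}

// Reconcile is a toy model of the decision described in the steps above.
// secretExists stands for the live state, because a delete-by-name call
// acts on the live object no matter how stale the observed spec is.
func Reconcile(observed View, secretExists bool) string {
	switch {
	case observed.TLSEnabled && !secretExists:
		return "create"
	case !observed.TLSEnabled && secretExists:
		return "delete"
	default:
		return "noop"
	}
}

func main() {
	apiserver1 := View{TLSEnabled: true}  // up to date
	apiserver2 := View{TLSEnabled: false} // stale

	// Before the crash: the operator talks to apiserver1.
	fmt.Println(Reconcile(apiserver1, false))

	// After the restart: the Secret exists, but apiserver2 still says false.
	fmt.Println(Reconcile(apiserver2, true))
}
```

The second call returns "delete" even though the user's latest intent is TLS enabled. Nothing in the decision can distinguish a stale false from a current one.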
Fix
We are willing to send a PR to fix this problem.
A potential fix is to pass the Secret object's UID as a precondition on deletion. If the operator's view of the Secret object is stale, the UID will not match the live object, and the apiserver will reject the deletion instead of removing the Secret.