Killing multiple MySQL pods at the same time, without waiting for them to fully come back online, puts the cluster in an unhealthy state, but it does not trigger the "reboot cluster from complete outage" flow to recover the cluster.
This will also happen during a networking outage, which in this case was simulated by taking down all NICs on all the microk8s machines in a single AZ.
Steps to reproduce
1. By killing pods.
a. Kill multiple mysql pods at the same time.
kubectl delete pod mysql-0
kubectl delete pod mysql-1
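For contrast, the condition "without waiting for them to fully come online" can be avoided by deleting the pods one at a time and waiting for readiness in between. A minimal sketch (pod names taken from the reproduction above, timeout value is an assumption):

```shell
# Safe sequential variant: delete one pod, wait until it is Ready again,
# then move on to the next. This avoids the multi-pod outage described above.
for pod in mysql-0 mysql-1; do
  kubectl delete pod "$pod"
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=600s
done
```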
2. By simulating a network outage.
a. Take down all NICs of the microk8s machines.
juju machines -m microk8s | awk '/AZ3/ {print $1}' | while read machine; do
  juju ssh "$machine" '
    for nic in bond0 bondM; do  # adjust as necessary
      sudo ip link set dev "$nic" down
    done'
done
b. Wait 15 minutes for the network outage to affect mysql, then reboot all the machines in the AZ to bring the network back online.
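The reboot in step b can be scripted the same way as the NIC teardown above. A sketch, assuming the same AZ filter (`AZ3`) and model name:

```shell
# Reboot every machine in the affected AZ to bring the network back online.
# The trailing `|| true` is needed because the ssh session drops as the
# machine goes down, which would otherwise abort the loop under `set -e`.
juju machines -m microk8s | awk '/AZ3/ {print $1}' | while read machine; do
  juju ssh "$machine" 'sudo reboot' || true
done
```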
Expected behavior
The cluster will go offline and the reboot cluster from complete outage flow will be triggered to recover the cluster.
Actual behavior
The leader unit will not go offline and the reboot cluster flow will not be triggered, leaving the cluster in an inconsistent state.
mysql/0 maintenance idle 10.1.1.10 offline
mysql/1 maintenance idle 10.1.1.11 offline
mysql/2* maintenance idle 10.1.1.12 Unable to get member state
Versions
Operating system: Ubuntu 22.04 Jammy Jellyfish
Juju CLI: 3.4.3
Juju agent: 3.4.3
Charm revision: 153 (channel 8.0/stable)
microk8s: 1.28
Additional context
To recover the cluster, the mysqld_safe Pebble service needs to be restarted inside the leader unit:
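A sketch of that restart, assuming the workload container is named `mysql` and the leader is `mysql/2` as in the status output above (adjust both to match `juju status`):

```shell
# Restart the mysqld_safe Pebble service in the leader unit's workload
# container. The container name and pebble binary path are assumptions
# based on a typical k8s charm layout.
juju ssh --container mysql mysql/2 \
  /charm/bin/pebble restart mysqld_safe
```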