PD needs to stop uncontrollable expansion #1895

Closed
benmaoer opened this issue Mar 10, 2020 · 4 comments

@benmaoer

Bug Report

What version of Kubernetes are you using?
v1.16

What version of TiDB Operator are you using?

/ # /usr/local/bin/tidb-scheduler --version
TiDB Operator Version: version.Info{GitVersion:"v1.0.6", GitCommit:"982720cd563ece6dbebfc4c579b17fa66a93c550", GitTreeState:"clean", BuildDate:"2019-12-27T16:53:44Z", GoVersion:"go1.13", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
Not applicable; this issue is not related to PVCs.

What's the status of the TiDB cluster pods?
[Image: 349921A8-D67A-40DF-9D2F-BDBC005379B3]

What did you do?
When the cluster was initialized, PD had 3 pods: demo-pd-{0,1,2}. After demo-pd-1 failed and could not be recovered, the PD cluster expanded one pod at a time until it reached 7 nodes.

What did you expect to see?
One new PD pod is created to replace the crashed one.

What did you see instead?
The PD cluster expanded to 7 nodes.

@DanielZhangQD
Contributor

@benmaoer Thanks for reporting this issue!
When a PD Pod is down, tidb-operator creates a new Pod after 5 minutes (the default) to mitigate the failure.
We have already limited the number of Pods that failover can create for TiDB and TiKV, and we will add the same limit for PD in a future release. cc @weekface
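
To make the effect of such a limit concrete, here is a minimal Python sketch. It is not tidb-operator's code: the names `Member`, `failed_members`, `desired_replicas`, and `max_failover_count` are hypothetical, and it assumes the simplest possible model, in which every member that has been down for at least the failover period adds one extra replica. Only the 5-minute default comes from the comment above.

```python
from dataclasses import dataclass
from typing import List, Optional

# "after 5 minutes by default", taken from the comment above.
FAILOVER_PERIOD_SECONDS = 5 * 60


@dataclass
class Member:
    """Hypothetical view of one PD member as seen by a controller."""
    name: str
    down_seconds: float  # how long the member has been unhealthy; 0 if healthy


def failed_members(members: List[Member],
                   period: float = FAILOVER_PERIOD_SECONDS) -> List[Member]:
    """Return the members that have been down for at least `period` seconds."""
    return [m for m in members if m.down_seconds >= period]


def desired_replicas(spec_replicas: int,
                     members: List[Member],
                     max_failover_count: Optional[int] = None,
                     period: float = FAILOVER_PERIOD_SECONDS) -> int:
    """Replicas to run: the configured count plus one per failed member.

    Assumption: with max_failover_count=None there is no upper bound,
    so the result grows with every member that is marked failed.
    """
    extra = len(failed_members(members, period))
    if max_failover_count is not None:
        extra = min(extra, max_failover_count)
    return spec_replicas + extra


if __name__ == "__main__":
    # Hypothetical state: four members have each been down for 10 minutes.
    members = [
        Member("demo-pd-0", 0),
        Member("demo-pd-1", 600),
        Member("demo-pd-2", 0),
        Member("demo-pd-3", 600),
        Member("demo-pd-4", 600),
        Member("demo-pd-5", 600),
    ]
    print(desired_replicas(3, members))                        # no cap
    print(desired_replicas(3, members, max_failover_count=1))  # capped
```

In this model, four failed members and no cap take a 3-replica cluster to 7, while a cap of 1 holds it at 4. Whether the reported cluster grew for this reason is not established in this thread; the sketch only shows why an upper bound stops the growth.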

@weekface
Contributor

Expand 1 PD pod to replace crash one.

The failover feature should be refactored to use AdvancedStatefulSet so that the failed pod is replaced in place. This feature may be targeted at v1.2.x.

@DanielZhangQD
Contributor

This issue will be fixed in v1.1.0-rc.2 with PR #2191.

@DanielZhangQD DanielZhangQD modified the milestones: v1.1.0, v1.0.7 Apr 15, 2020
@cofyc cofyc added the status/WIP Issue/PR is being worked on label Jun 8, 2020
@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days
