Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pods disaster tolerance and data regions disaster tolerance test cases #497

Merged
merged 35 commits into from
May 30, 2019

Conversation

xiaojingchen
Copy link
Contributor

@xiaojingchen xiaojingchen commented May 16, 2019

What problem does this PR solve?

  • Add test case that checks PD/TIKV/TIDB support disaster tolerance
  • Add test case that checks data regions support disaster tolerance
  • stability test support to test the branch which has not merged to master

What is changed and how it works?

Check List

Tests

  • E2E test [done]
  • Stability test [done]
  • Manual test [done]

Code changes

  • Has Helm charts change
  • Has Go code change (package: tests/)
  • Makefile

Side effects

Related changes
base PR: #475
should be merged after #499

Does this PR introduce a user-facing change?:

None

@weekface weekface added the test/stability stability tests label May 21, 2019
@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

1 similar comment
@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

1 similar comment
@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@xiaojingchen xiaojingchen marked this pull request as ready for review May 23, 2019 08:04
@xiaojingchen xiaojingchen assigned cofyc and unassigned cofyc May 23, 2019
@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

1 similar comment
@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

tests/actions.go Outdated Show resolved Hide resolved
@@ -103,6 +102,46 @@ func main() {
BatchSize: 1,
RawSize: 1,
},
SubValues: `pd:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better readability, can we move this string literal to a private const value?

@@ -146,6 +185,46 @@ func main() {
BatchSize: 1,
RawSize: 1,
},
SubValues: `pd:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this

@@ -100,6 +100,46 @@ func main() {
},
Monitor: true,
BlockWriteConfig: conf.BlockWriter,
SubValues: `pd:
affinity:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend abstract the template and render the namespaces field, maintaining similar yaml literal in 4 places is error-prone

tests/dt.go Outdated
for i, node := range nodes.Items {
index := i % RackNum
node.Labels[RackLabel] = fmt.Sprintf("rack%d", index)
oa.kubeCli.CoreV1().Nodes().Update(&node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle the error

tests/dt.go Outdated
return oa.checkDisasterTolerance(tidbs.Items, nodeMap)
}

func (oa *operatorActions) checkDisasterTolerance(allPods []corev1.Pod, nodeMap map[string]corev1.Node) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revive warns about function/method name only differs in capitalization, I thinks we should avoid this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good, got it

tests/dt.go Outdated
}

for rack, pods := range rackPods {
if len(pods) > maxPodsOneRack {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also check minimum pods.

Consider 4 pod with 3 rack, max pod is 2, [2, 2, 0] fit the criteria but not the best arrangement to be fault-tolerant. Add the minimum check >=1 will help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch! it's a bug

pdClient := http.Client{
Timeout: 10 * time.Second,
}
url := fmt.Sprintf("http://%s-pd.%s:2379/pd/api/v1/regions", cluster.ClusterName, cluster.Namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add this to PdControlInterface and use oa.pdControl.GetPDClient().GetRegions() instead? Which is better for future maintainability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considered, mainly at present only the test code to use this interface, and PD 3.0 and later versions of the interface has changed, so temporary implementation.

return err
}

for _, region := range regions.Regions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add comment for this? I find this is hard to understand at first glance

@aylei
Copy link
Contributor

aylei commented May 23, 2019

/run-e2e-tests

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

1 similar comment
@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

2 similar comments
@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@shuijing198799
Copy link
Contributor

/run-e2e-test

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@xiaojingchen
Copy link
Contributor Author

@weekface @aylei @shuijing198799 @cofyc PTAL

weekface
weekface previously approved these changes May 29, 2019
Copy link
Contributor

@weekface weekface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -42,13 +42,14 @@ spec:
serviceAccount: tidb-operator-e2e
containers:
- name: tidb-operator-e2e
image: 127.0.0.1:5000/pingcap/tidb-operator-e2e:latest
image: hub.pingcap.net/chenxiaojing/tidb-operator-e2e:latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image name

@xiaojingchen
Copy link
Contributor Author

/run-e2e-test

@shuijing198799
Copy link
Contributor

LGTM

@xiaojingchen xiaojingchen merged commit 83ad7c3 into pingcap:master May 30, 2019
@xiaojingchen xiaojingchen deleted the add-DR-test branch May 30, 2019 03:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test/stability stability tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants