Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix service update which would changed the nodeport #2284

Merged
merged 14 commits into from
May 7, 2020

Conversation

Yisaer
Copy link
Contributor

@Yisaer Yisaer commented Apr 24, 2020

What problem does this PR solve?

currently if we make tidb service type as nodeport, it would change the nodeport in every reconcilation. This pr fix this bug.

What is changed and how does it work?

Check List

Tests

  • E2E test

Related changes

  • Need to cherry-pick to the release branch

Does this PR introduce a user-facing change?:

Fix the bug when `TidbCluster` service type is `NodePort`, the nodeport would change frequently.

}
}
}
}
Copy link
Contributor

@cofyc cofyc Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the node port may change? because we empty the NodePort field each time? what's your solution to fix this?

UpdateService is a generic function to update service and resolve conflicts if necessary. Trying to implement this logic in this function is strange. I think we should generate the correct desired service spec before calling the update API. And generating the correct desired service can help avoid unnecessary API updates (skip if the desired spec matches existing spec).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds make sense, updated.

framework.Failf("nodePort tidbcluster tidb service type isn't NodePort")
}
ports := s.Spec.Ports
// check node port unchanged for 5mins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update some attributes of the service and then check no change for nodeport?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

DanielZhangQD
DanielZhangQD previously approved these changes Apr 26, 2020
Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DanielZhangQD DanielZhangQD self-requested a review April 26, 2020 03:30
@DanielZhangQD
Copy link
Contributor

@Yisaer please also check why the nodePort changed even if there is no update to the service configuration, maybe some checks are not correct which causes the service to be updated during each sync.

@Yisaer
Copy link
Contributor Author

Yisaer commented Apr 26, 2020

o check why the nodePort changed even if there is no update to the service configuration, maybe some checks are not correct which causes the service to b

I think the previous disscusion have already answered the question. The nodeport changed because we left nodeport empty during each syncing.

@Yisaer Yisaer requested a review from cofyc April 26, 2020 04:18
}
return false, nil
})
framework.ExpectError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

framework.ExpectEqual(err, wait.ErrWaitTimeout)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, you can return these errors in this wait function. this will make this function shorter and errors will be printed if the test fails because what you expect is Timeout error.

err = wait.Poll(5*time.Second, 5*time.Minute, func() (done bool, err error) {
s, err := c.CoreV1().Services(ns).Get("nodeport-tidb", metav1.GetOptions{})
if err != nil {
klog.Errorf(err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, you can return these errors in this wait function. this will make this function shorter and errors will be printed if the test fails because what you expect is Timeout error.

err = wait.Poll(5*time.Second, 5*time.Minute, func() (done bool, err error) {
s, err = c.CoreV1().Services(ns).Get("nodeport-tidb", metav1.GetOptions{})
if err != nil {
klog.Errorf(err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to use framework.Logf which writes to ginkgo writer and the logs can be printed into ginkgo test report if failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

ports := s.Spec.Ports

// check node port unchanged for 5mins
err = wait.Poll(5*time.Second, 5*time.Minute, func() (done bool, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 minute IMO is enough because our reconciliation period is 30 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

"foo": "bar",
}
_, err = cli.PingcapV1alpha1().TidbClusters(ns).Update(tc)
framework.ExpectNoError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use previous nodeTC and controller.GuaranteedUpdate, e.g.

        err = controller.GuaranteedUpdate(genericCli, nodeTc, func() error {
            nodeTc.Spec.TiDB.Service.Annotations = map[string]string{
			    "foo": "bar",
    	    }  
            return nil
        }) 

controller.GuaranteedUpdate( can help resolve conflicts (status of TidbCluster objects are updated by controller frequently) and make our test stable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

}
return true, nil
})
framework.ExpectNoError(err)
Copy link
Contributor

@cofyc cofyc Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should wait for Timeout error to happen (NodePort not changed after the update in a few minutes). IMO you can reuse previous wait function. Testing the update of service succeeds or not is out of the scope of this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@DanielZhangQD
Copy link
Contributor

o check why the nodePort changed even if there is no update to the service configuration, maybe some checks are not correct which causes the service to b

I think the previous disscusion have already answered the question. The nodeport changed because we left nodeport empty during each syncing.

I mean what why the service is updated even if no change to the service config in the tc, I have checked this and found that ClusterIP is included in the ServiceLastAppliedConfigAnnotation, but there is no ClusterIP in the new service spec, so the quality check always returns false here, please fix this issue as it's the cause of it would change the nodeport in every reconcilation in the issue description.

@Yisaer
Copy link
Contributor Author

Yisaer commented Apr 26, 2020

o check why the nodePort changed even if there is no update to the service configuration, maybe some checks are not correct which causes the service to b

I think the previous disscusion have already answered the question. The nodeport changed because we left nodeport empty during each syncing.

I mean what why the service is updated even if no change to the service config in the tc, I have checked this and found that ClusterIP is included in the ServiceLastAppliedConfigAnnotation, but there is no ClusterIP in the new service spec, so the quality check always returns false here, please fix this issue as it's the cause of it would change the nodeport in every reconcilation in the issue description.

I suggest to create a new issue to solve this problem.

@Yisaer
Copy link
Contributor Author

Yisaer commented Apr 26, 2020

/run-e2e-in-kind

Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Yisaer Yisaer merged commit ac2b137 into pingcap:master May 7, 2020
sre-bot pushed a commit to sre-bot/tidb-operator that referenced this pull request May 7, 2020
@sre-bot
Copy link
Contributor

sre-bot commented May 7, 2020

cherry pick to release-1.1 in PR #2390

Yisaer added a commit that referenced this pull request May 8, 2020
Co-authored-by: Song Gao <disxiaofei@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-1.1 type/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants