Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: deprecate the ClusterStatus dependency #87656

Merged
merged 2 commits into from
Feb 23, 2020
Merged

kubeadm: deprecate the ClusterStatus dependency #87656

merged 2 commits into from
Feb 23, 2020

Conversation

ereslibre
Copy link
Contributor

@ereslibre ereslibre commented Jan 29, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
While ClusterStatus will be maintained and uploaded, it won't be
used by the internal kubeadm logic in order to determine the etcd
endpoints anymore.

The only exception is during the first upgrade cycle (kubeadm upgrade apply, kubeadm upgrade node), in which we will fallback to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.

Which issue(s) this PR fixes:
Implements kubernetes/enhancements#1380

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: The ClusterStatus struct present in the kubeadm-config ConfigMap is deprecated and will be removed on a future version. It is going to be maintained by kubeadm until it gets removed. The same information can be found on `etcd` and `kube-apiserver` pod annotations, `kubeadm.kubernetes.io/etcd.advertise-client-urls` and `kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint` respectively.

@kubernetes/sig-cluster-lifecycle @kubernetes/sig-cluster-lifecycle-pr-reviews

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 29, 2020
@k8s-ci-robot k8s-ci-robot added area/kubeadm approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 29, 2020
@ereslibre
Copy link
Contributor Author

/assign @fabriziopandini @neolit123 @rosti

@neolit123
Copy link
Member

applying hold for review.
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 29, 2020
Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ereslibre !
Overall, I like how this is going.

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go Outdated Show resolved Hide resolved
cmd/kubeadm/app/util/config/cluster_test.go Show resolved Hide resolved
cmd/kubeadm/app/util/config/cluster.go Outdated Show resolved Hide resolved
cmd/kubeadm/app/util/config/cluster.go Outdated Show resolved Hide resolved
cmd/kubeadm/app/util/etcd/etcd.go Outdated Show resolved Hide resolved
cmd/kubeadm/app/util/etcd/etcd.go Outdated Show resolved Hide resolved
@ereslibre ereslibre requested a review from rosti January 31, 2020 12:07
@ereslibre
Copy link
Contributor Author

Thanks @rosti for your review!, this is ready for another pass.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 31, 2020
Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ereslibre !
I feel that we need to focus our unit tests on the utility funcs. That way we can make more thorough and easy to follow spec as some problems may be hidden by our broad spec tests.

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go Show resolved Hide resolved
cmd/kubeadm/app/util/config/cluster.go Outdated Show resolved Hide resolved
cmd/kubeadm/app/util/config/cluster.go Outdated Show resolved Hide resolved
@ereslibre ereslibre changed the title kubeadm: Remove ClusterStatus dependency WIP: kubeadm: Remove ClusterStatus dependency Jan 31, 2020
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 31, 2020
e, ok := clusterStatus.APIEndpoints[nodeName]
if !ok {
return errors.New("failed to get APIEndpoint information for this node")
func getRawAPIEndpointFromPodAnnotationWithoutRetry(client clientset.Interface, nodeName string) (string, error) {
Copy link
Member

@neolit123 neolit123 Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the ...WithoutRetry here, given there are no other getRawAPIEndpointFromPodAnnotation variants?
i think i saw comments on this topic before.

EDIT: ok i see there is also getRawEtcdEndpointsFromPodAnnotation (plural) and getRawEtcdEndpointsFromPodAnnotationWithoutRetry.
IMO the default should be "without-retry".

i must admit the interface is becoming a bit confusing to me with some many func. variants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way functions were splitted were to ease unit testing for each one:

  • getAPIEndpoint just calls to getAPIEndpointWithBackoff with constants.StaticPodMirroringDefaultRetry: no explicit testing needed here.

  • getAPIEndpointWithBackoff: first tries to retrieve the endpoint with getAPIEndpointFromPodAnnotation, if it's not present, try to use the cluster status with getAPIEndpointFromClusterStatus. Tests ensure that this behavior happens.

  • getAPIEndpointFromPodAnnotation: tries to fetch the api endpoint from pod annotations with a backoff. Tests ensure that this behavior happens. Calls to getAPIEndpointFromPodAnnotationWithoutRetry.

  • getAPIEndpointFromPodAnnotationWithoutRetry: tries to fetch the api endpoint from pod annotations without any kind of retry. Tests ensure that this behavior happens.

  • getAPIEndpointFromClusterStatus: already existed.

Please, note that this will be very simplified when we remove the ClusterStatus. I tried to separate all concerns on this interim while we must fallback to the cluster status if pod annotations are missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you are going, but can't we merge getAPIEndpointWithBackoff and getAPIEndpoint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the code base we already have the "WithRetry" naming, but not "WithoutRetry"
it seems a bit odd that we now have a default that always retries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested here: #87656 (comment)

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ereslibre !
Looks good to me! Only minor naming nits at this point.
I'll nevertheless hold for a review by @fabriziopandini as he is the KEP author and may spot some detail that I've missed.
/lgtm
/hold

cmd/kubeadm/app/util/config/cluster.go Show resolved Hide resolved
e, ok := clusterStatus.APIEndpoints[nodeName]
if !ok {
return errors.New("failed to get APIEndpoint information for this node")
func getRawAPIEndpointFromPodAnnotationWithoutRetry(client clientset.Interface, nodeName string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you are going, but can't we merge getAPIEndpointWithBackoff and getAPIEndpoint?

@@ -127,6 +122,95 @@ func NewFromCluster(client clientset.Interface, certificatesDir string) (*Client
return etcdClient, nil
}

// getEtcdEndpoints returns the list of etcd endpoints.
func getEtcdEndpoints(client clientset.Interface) ([]string, error) {
return getEtcdEndpointsWithBackoff(client, constants.StaticPodMirroringDefaultRetry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, can we merge getEtcdEndpoints and getEtcdEndpointsWithBackoff?

Copy link
Contributor Author

@ereslibre ereslibre Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for these functions is that getEtcdEndpoints and getAPIEndpoint don't need testing (they have no logic). We test their WithBackoff counterparts, where we can control the backoff on the unit tests, so we have faster unit test execution and controlled with the backoff required depending on the test cases we are stubbing.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 11, 2020
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ereslibre this is turning out well.
The only nit from my side is about avoiding to test the ExponentialBackoff behavior

cmd/kubeadm/app/util/config/cluster_test.go Outdated Show resolved Hide resolved
cmd/kubeadm/app/util/etcd/etcd_test.go Outdated Show resolved Hide resolved
@neolit123
Copy link
Member

/retitle kubeadm: deprecate the ClusterStatus dependency

@k8s-ci-robot k8s-ci-robot changed the title kubeadm: Remove ClusterStatus dependency kubeadm: deprecate the ClusterStatus dependency Feb 13, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 19, 2020
@ereslibre
Copy link
Contributor Author

This should be ready now @kubernetes/sig-cluster-lifecycle-pr-reviews.

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ereslibre !
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 19, 2020
@ereslibre
Copy link
Contributor Author

/retest

@neolit123
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 20, 2020
@neolit123
Copy link
Member

@ereslibre bazel needs a update:

pull-kubernetes-verify — Job failed.

While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.

The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fallback to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.

This changeset adds to both etcd and apiserver endpoint retrieval the
special case in which they won't retry if we are in such cases. The
logic will retry if we find any unknown error, but will not retry in
the following cases:

- etcd annotations do not contain etcd endpoints, but the overall list
  of etcd pods is greater than 0. This means that we listed at least
  one etcd pod, but they are missing the annotation.

- API server annotation is not found on the api server pod for a given
  node name, but no errors aside from that one were found. This means
  that the API server pod is present, but is missing the annotation.

In both cases there is no point in retrying, and so, this speeds up the
upgrade path when coming from a previous existing cluster.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2020
@ereslibre
Copy link
Contributor Author

bazel needs a update:

Ouch, updated and took the opportunity to rebase on top of latest master.

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ereslibre for addressing all the comments!
Let's keep an eye on the test grid now!
/hold cancel
/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 22, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 31b8c0d into kubernetes:master Feb 23, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Feb 23, 2020
@ereslibre ereslibre deleted the do-not-depend-on-cluster-status branch February 25, 2020 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants