Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubeadm upload and fetch of kubeam config v1alpha3 #67944

Merged

Conversation

fabriziopandini
Copy link
Member

What this PR does / why we need it:
This PR implements upload and fetch of kubeam config v1alpha3 from cluster.

More in detail:
In upload, kubeadm-config gets

  • ClusterConfiguration (without components config which are already stored in separated ConfigMaps)
  • ClusterStatus(initialised or updated with the API endpoint of the current node)

During fetch InitConfiguration is composed with:

  • ClusterConfiguration from kubeadm-config
  • The APIEndpoint of the current node from ClusterStatus in kubeadm-config
  • Component configs from corresponding ConfigMaps

Which issue(s) this PR fixes :
refs kubernetes/kubeadm#911, refs kubernetes/kubeadm#963

Special notes for your reviewer:
In order to implement this it was necessary to extend current component config management with a new GetFromConfigMap operation. This is implemented in a separated commit "
implement component configs GetFromConfigMap".
The real change build on this (commi "upload and fetch kubeadm v1alpha3")

Release note:

NONE

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/sig cluster-lifecycle
/area kubeadm
/kind enhancement
/assign @luxas
/assign @timothysc
/cc @chuckha @rosti @neolit123 @liztio

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 28, 2018
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/kubeadm labels Aug 28, 2018
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 28, 2018
@fabriziopandini fabriziopandini changed the title Kubeadm config config map Kubeadm upload and fetch of kubeam config v1alpha3 Aug 28, 2018
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @fabriziopandini
added minor comments but also commented on the usage of ChooseBindAddress that we talked about last time. we might need something more unique here, unless i'm mistaken.


apiEndpoint, ok := status.APIEndpoints[machineID]
if !ok {
return nil, fmt.Errorf("failed to retry APIEndpoint information for this node")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to retry -> to get?

func GetMachineID() (string, error) {
// the IP of the host default network interface as a machine unique id
ip, err := netutil.ChooseBindAddress(nil)
if err != nil {
Copy link
Member

@neolit123 neolit123 Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if we have the following scenario:

  • machine A is a master node and ChooseBindAddress returned 192.168.0.1 for it.
  • machine B in a different network want to join the cluster but ChooseBindAddress also returns 192.168.0.1 for it.
  • both nodes have the same ID/local IP, even if technically for node B to join node A it needs it's external IP.

hostnames suffer from the same problem.
i think we might have to do UUIDs, but this needs platform specific code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neolit123 how frequent is this case?
However, If there is a library already vendored in kubernetes for getting UUIDs I can try to switch...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's a general issue when nodes are not on the same sub-network?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generating some UUID is pretty useful when folks are changing nodes in a virutual infra will be useful.

Copy link
Member

@neolit123 neolit123 Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can't find a vendored "system" uuid lib in k8s - i.e. a library for returning the uuid of the machine.

there is this, but this library is just for generating a RFC compliant random uuid:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/uuid/uuid.go

we need something like:
https://github.com/denisbrodbeck/machineid
the methods that the library uses are outlined here:
https://github.com/denisbrodbeck/machineid#snippets

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of UUIDs here. IP change can cause issues.

On the other hand, the machineid library queries OS maintained IDs. These are usually generated upon first boot, install or regenerated if missing on boot. They can be the same if a VM is created from a template and these are not taken care of correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can't find MAC address handling in the k/k codebase.

closed PR that introduced machine IDs for host names:
#52625

closed issue about node uniqueness:
#6557

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabriziopandini we talked about this at the kubeadm office hours today.

Jason reminded us that the node name should be unique for a cluster to work anyway so we can use that as the key for the status map:

from @detiber

More specifically, this should be "node name" being either the user-supplied override in the kubeadm config or the default detected node name (generally the hostname here)

@@ -0,0 +1,107 @@
/*
Copyright 2016 The Kubernetes Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018

// Create temp folder for the test case
tmpdir, err := ioutil.TempDir("", "")
if err != nil {
t.Fatalf("Couldn't create tmpdir")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't -> couldn't

}

if obj == nil {
t.Errorf("Unexpected nil return value")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unexpected -> unexpected

k8sVersion := version.MustParseGeneric("v1.12.0")
obj, err := r.GetFromConfigMap(client, k8sVersion)
if err != nil {
t.Errorf("Error getting ConfigMap: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error getting -> error getting


err := rt.configMap.create(client)
if err != nil {
t.Errorf("Couldn't create ConfigMap %s", rt.configMap.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't -> couldn't

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabriziopandini Thanks for the effort! Has a few minor nits, but otherwise it LGTM.

return
}
if err == nil && !reflect.DeepEqual(apiEndpoint, actual) {
t.Errorf("actual doesn't match expected apiEndpoint")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps print actual and expected here?

configMapName := kubeadmconstants.GetKubeletConfigMapName(version)
kubeletCfg, err := client.CoreV1().ConfigMaps(metav1.NamespaceSystem).Get(configMapName, metav1.GetOptions{})
if apierrors.IsNotFound(err) {
return nil, fmt.Errorf("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual error message is missing here

)

// FetchConfigFromFileOrCluster fetches configuration required for upgrading your cluster from a file (which has precedence) or a ConfigMap in the cluster
// TODO: This func should be renamed FetchClusterConfigFromFileOrCluster, and return a ClusterConfiguration instead of an InitConfiguration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to keep an updated version of this TODO for the future.
Ideally we should not be forced to have InitConfiguration for anything else than kubeadm init. I am not sure, that ClusterConfiguration is the answer too (since APIEndpoint is not part of it).
It may be something else, but we may just keep some TODO here, just to mark our intent in reducing the InitConfiguration usages in the long run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I removed the comment about ClusterConfiguration because we know this isn't the right approach.

For the long run I agree 100% with your intent of reducing the InitConfiguration usages, but better track this with an issue than TODOs in the code.
If you can kindly open the issue this will be great 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothysc timothysc added this to the v1.12 milestone Aug 28, 2018
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 28, 2018
@fabriziopandini fabriziopandini force-pushed the kubeadm-config-configMap branch 2 times, most recently from c0c0289 to 71bf4f7 Compare August 28, 2018 21:22
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 28, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 28, 2018
Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only did a quick scan, let me know when you're ready and I'll go through this in detail.

func GetMachineID() (string, error) {
// the IP of the host default network interface as a machine unique id
ip, err := netutil.ChooseBindAddress(nil)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generating some UUID is pretty useful when folks are changing nodes in a virutual infra will be useful.


// HasKubeletConfigMap returns true if the Kubelet is expected to have a config map stored in the cluster
func HasKubeletConfigMap(cfg *kubeadmapi.ClusterConfiguration) bool {
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget it, now removed

KubeProxyConfigMap = "kube-proxy"

// KubeProxyConfigMapKey specifies in what ConfigMap key the component config of kube-proxy should be stored
KubeProxyConfigMapKey = "config.conf"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key name used in kube-proxy manifest.. just refactored in a constant because it is now used in different places.

@@ -79,7 +79,7 @@ spec:
imagePullPolicy: IfNotPresent
command:
- /usr/local/bin/kube-proxy
- --config=/var/lib/kube-proxy/config.conf
- --config=/var/lib/kube-proxy/{{ .ProxyConfigMapKey }}s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the trailing s desired?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, now fixed, thanks

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2018
@fabriziopandini
Copy link
Member Author

@timothysc

  1. Implemented the logic for getting the node name from the kubelet.conf. However it was necessary to get the user name from x509 certificate and manage two cases:
    • kubelet.conf generated by kubeadm
    • kubelet.conf generated by TLSBootstrap.
  2. Used the node name as a key for the ClusterConfig.APIEndpoints map (insterad of the netutil.ChooseBindAddress or hostname)
  3. Implemented a lot of unit tests for all the upload /fetch config jiggery
  4. Fixed kubeadm join --control-plane, kubeadm upgrade and kubeadm upgrade node --control-plane workflows with the new upload /fetch config jiggery removing some TODOs left in previous PRs
  5. completed first rounds of manual testing:
    • V1.11.2 --> v1.12.0
    • V1.12.0 with two masters --> v1.12.1

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st pass run through.


// Decodes the Config map into the internal component config
obj := &kubeletconfig.KubeletConfiguration{}
err = unmarshalObject(obj, []byte(kubeletCfg.Data[kubeadmconstants.KubeletBaseConfigurationConfigMapKey]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you index check 1st otherwise you could SEGV here. .Data[kubeadmconstants.KubeletBaseConfigurationConfigMapKey]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed + added UT


// Decodes the Config map into the internal component config
obj := &kubeproxyconfig.KubeProxyConfiguration{}
err = unmarshalObject(obj, []byte(kubeletCfg.Data[kubeadmconstants.KubeProxyConfigMapKey]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

return nil, err
}
return obj, nil
}

func unmarshalObject(obj runtime.Object, fileContent []byte) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swear these functions exist elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function uses a scheme with component configs api registered.
The other unmarshal func uses a scheme with only kubeadm types registered

}

// Create a role for granting read only access to the kube-proxy component config ConfigMap
if err := apiclient.CreateOrUpdateRole(client, &rbac.Role{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests to validate the updated RBAC rules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only test on RBAC I'm aware of are in test\e2e_kubeadm but I'm not sure if those test are used in test grid batches...


// Prepare the ClusterStatus for upload
// Gets the current cluster status
clusterStatus, err := getClusterStatus(client)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are ok for now, but we should add a //TODO use configmap locks on this object on the get before the update. We may even bug fix that after the freeze

/cc @liztio

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return "", err
}
} else {
return "", fmt.Errorf("Invalid kubelet.conf. X509 certificate expected")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit / optional: errors.New() instead of fmt.Errorf()


// We are only putting one certificate in the certificate pem file, so it's safe to just pick the first one
// TODO: Support multiple certs here in order to be able to rotate certs
cert := certs[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a tracking issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. I will send tracking issues for TODOs soonish (after lgtm)

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabriziopandini at first pass this LGTM, has some tiny nits though

kubeConfigFile := filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.AdminKubeConfigFileName)

client, err := kubeconfigutil.ClientSetFromFile(kubeConfigFile)
if err != nil {
return errors.Wrap(err, "couldn't create kubernetes client")
}

err = markmasterphase.MarkMaster(client, j.cfg.NodeRegistration.Name, j.cfg.NodeRegistration.Taints)
if err != nil {
glog.V(1).Infof("[join] uploading currently used configuration to the cluster")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infof -> Info

return fmt.Errorf("error uploading configuration: %v", err)
}

glog.V(1).Infof("[join] marking the master with right label")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infof -> Info

}

if obj == nil {
t.Errorf("unexpected nil return value")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errorf -> Error

if configData == "" {
t2.Fatalf("Fail to find ConfigMap key")
t2.Fatalf("Fail to find ClusterConfigurationConfigMapKey key")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fatalf -> Fatal


statusData := masterCfg.Data[kubeadmconstants.ClusterStatusConfigMapKey]
if statusData == "" {
t2.Fatalf("failed to find ClusterStatusConfigMapKey key")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fatalf -> Fatal

}

if !reflect.DeepEqual(decodedStatus, status) {
t2.Errorf("the initial and decoded ClusterStatus didn't match")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errorf -> Error


actual, err := getClusterStatus(client)
if err != nil {
t.Errorf("GetClusterStatus returned unexpected error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errorf -> Error

return
}
if tt.expectedClusterEndpoints != len(actual.APIEndpoints) {
t.Errorf("actual ClusterStatus doesn't return expected endpoints")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errorf -> Error

// gets the APIEndpoint for the current machine from the ClusterStatus
e, ok := clusterStatus.APIEndpoints[nodeName]
if !ok {
return fmt.Errorf("failed to get APIEndpoint information for this node")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt.Errorf -> errors.New

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

We're too late for nits, we'll have to beat on this during freeze.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 4, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@timothysc timothysc added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 4, 2018
@timothysc
Copy link
Member

/test pull-kubernetes-integration

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 63011, 68089, 67944, 68132). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants