
Azure: Move to standard loadbalancer #246

Merged
merged 1 commit into gardener-attic:master from azure-lb on Sep 10, 2019

Conversation

dkistner
Contributor

@dkistner commented Aug 6, 2019

What this PR does / why we need it:
As a prerequisite for supporting Availability Zones on Azure (#135), we need to move to LoadBalancers with the standard SKU instead of the basic SKU, because only this type of LoadBalancer supports load balancing across zones.
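For context, the SKU is selected via the cloud-provider config that is rendered for the Shoot. A minimal sketch of the relevant part, assuming the upstream azure.json layout of the Kubernetes Azure cloud provider (the exact file generated by the extension may look different):

```json
{
  "cloud": "AzurePublicCloud",
  "loadBalancerSku": "standard"
}
```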

Which issue(s) this PR fixes:
Fixes #207

Special notes for your reviewer:
Update the ControllerRegistration for the Azure provider extension to use a custom-built image which contains this PR (dominickistner/gardener-extension-hyper:0.11.0-az-standard-lb).
After the Azure extensions on the Seed cluster are updated, create an Azure Shoot and, once creation has completed, check that the LoadBalancer uses the standard SKU.
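One way to check the SKU, assuming the Azure CLI and access to the Shoot's resource group (the resource group name is a placeholder):

```sh
# List the load balancers in the Shoot's resource group together with their SKU.
az network lb list --resource-group <shoot-resource-group> \
  --query "[].{name:name, sku:sku.name}" --output table
```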

To test the upgrade procedure, create an Azure Shoot with the basic LoadBalancer SKU upfront.
Then update the ControllerRegistration as described above and follow the instructions in the guide controllers/provider-azure/docs/migrate-loadbalancer.md

Release note:

Gardener now supports load balancers with the standard SKU. New clusters automatically use standard SKU load balancers. Existing clusters remain on basic SKU load balancers until they are migrated manually.
Gardener now supports load balancers with the standard SKU. To migrate a LoadBalancer from the basic SKU to the standard SKU, follow the instructions in https://github.com/gardener/gardener-extensions/tree/master/controllers/provider-azure/docs/migrate-loadbalancer.md

@dkistner added the platform/az and reviewed/do-not-merge labels Aug 6, 2019
@dkistner requested review from stoyanr and a team as code owners Aug 6, 2019 13:20
@dkistner self-assigned this Aug 6, 2019
@gardener-robot-ci-1 added the reviewed/ok-to-test and needs/ok-to-test labels and removed the reviewed/ok-to-test label Aug 6, 2019
@dkistner
Contributor Author

dkistner commented Aug 6, 2019

Please do not merge yet; some test activities are still ongoing.
I raised the PR upfront to maintain the release notes :)

@dkistner requested a review from MSSedusch Aug 6, 2019 13:22
@rfranzke
Contributor

rfranzke commented Aug 6, 2019

To migrate from the basic SKU to the standard SKU, all services of type LoadBalancer in the cluster have to be removed. Once all services are removed, Gardener needs to reconcile the cluster. After that, the services can be recreated.

Can you explain again how this needs to be done? As an end user I need to delete all my load balancer services, OK. But what happens if I create new services of type LoadBalancer without deleting the old ones (because I don't want to)?

Once all services are removed, Gardener needs to reconcile the cluster.

What does this mean? When this change is rolled out, the cloud provider config is updated for all existing clusters automatically. I don't understand this part.

Does this configuration work for all Kubernetes versions (1.10.x - 1.15.x)?

@dkistner
Contributor Author

dkistner commented Aug 6, 2019

Can you explain again how this needs to be done? As an end user I need to delete all my load balancer services, OK. But what happens if I create new services of type LoadBalancer without deleting the old ones (because I don't want to)?

The Kubernetes Azure cloud provider maintains only one load balancer on the infrastructure for all LoadBalancer services. Even if the cloud-provider config already has the standard load balancer SKU configured, it will not automatically migrate the existing basic load balancer to a new standard one.
The existing load balancer is removed automatically once all LoadBalancer services are gone, and it is recreated as a standard one when new LoadBalancer services are deployed.

If you add new LoadBalancer services without removing all existing ones first, Kubernetes will keep maintaining the existing basic load balancer.
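A rough sketch of this manual migration flow, assuming kubectl access to the Shoot cluster (the placeholders are examples, not actual service names):

```sh
# 1. Find all services of type LoadBalancer, including Gardener-managed ones
#    such as vpn-shoot in kube-system.
kubectl get svc --all-namespaces \
  -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'

# 2. Delete them all; the Azure cloud provider removes the shared basic load
#    balancer once the last LoadBalancer service is gone.
kubectl delete svc <name> --namespace <namespace>   # repeat for each service

# 3. With the standard SKU configured in the cloud-provider config, recreate
#    the services; the load balancer is then recreated with the standard SKU.
```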

What does this mean? When this change is rolled out, the cloud provider config is updated for all existing clusters automatically. I don't understand this part.

This is not really needed if the user also recreates the Gardener-managed LoadBalancer services, e.g. vpn-shoot.

Does this configuration work for all Kubernetes versions (1.10.x - 1.15.x)?

Yes.

Please consider all of this as work in progress. Let's discuss whether we need to implement migration paths within the Azure extension to move from basic to standard load balancers.

@rfranzke
Contributor

rfranzke commented Aug 6, 2019

OK, great, thanks for the explanation!

This is not really needed if the user also recreates the Gardener-managed LoadBalancer services, e.g. vpn-shoot.

I don't think we can implement anything because we don't know when the user wants to migrate (if they want to at all). I think the solution you described is good: if the user wants to migrate, they simply delete all load balancer services (including ours in the kube-system namespace).

Let's discuss whether we need to implement migration paths within the Azure extension to move from basic to standard load balancers.

Same here, I think your solution is good if it really works for all Kubernetes versions. Let's keep it like this.

@dkistner
Contributor Author

dkistner commented Aug 6, 2019

OK, update:
When the cloud-provider config already contains the standard LB SKU but the load balancer has not been migrated (i.e. all services deleted and the LB recreated), it is not possible to add new LoadBalancer services: Kubernetes creates new public IPs of the standard type, but the existing basic load balancer does not support this type (it only supports public IPs of the basic type).
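To see this mismatch on the infrastructure side, the SKUs of the existing public IPs can be inspected, assuming the Azure CLI (the resource group name is a placeholder):

```sh
# Standard-SKU public IPs cannot be attached to a basic load balancer.
az network public-ip list --resource-group <shoot-resource-group> \
  --query "[].{name:name, sku:sku.name}" --output table
```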

Therefore I see the following possible options:

  1. All new Shoots automatically use standard LBs; existing Shoots remain with basic LBs. But I think we want a harmonised solution, and we would need to move to standard later anyway -> technical debt.
  2. Implement a migration path to move from basic to standard LBs (which means we need to delete all LoadBalancer services and recreate them).
  3. Make the type of load balancer configurable in the Shoot config, with standard as the default. This field would need to be immutable (as long as no migration logic is implemented). We would support both types of LBs (a hypothetical sketch follows below).
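For option 3, a purely hypothetical sketch of what such a field could look like in the provider config of a Shoot; the field name loadBalancerSku and its placement are assumptions, not an implemented API:

```yaml
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
# hypothetical, immutable field: "basic" or "standard" (default)
loadBalancerSku: standard
```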

@rfranzke
Contributor

Any plan going forward? If not yet, and further design discussions are needed, can we close the PR and continue in an issue?

@dkistner
Contributor Author

Yes, let's close it for the moment. I will reopen it when we have a decision and continue to work on it.

@dkistner closed this Aug 23, 2019
@rfranzke
Contributor

OK, thanks 👍

@dkistner
Contributor Author

dkistner commented Sep 9, 2019

Reopened this PR.
We decided to move forward and create new Azure Shoots with standard LoadBalancers instead of basic ones.
We do not implement an automatic migration; old clusters need to be migrated manually if they want to use standard LoadBalancers. We added a guide which describes how to migrate an Azure Shoot from basic LoadBalancers to standard LoadBalancers.

@AndreasBurger Could you please have a look and test also the upgrade procedure?

@dkistner reopened this Sep 9, 2019
@gardener-robot-ci-1 added the reviewed/ok-to-test label Sep 9, 2019
MSSedusch previously approved these changes Sep 9, 2019
@dkistner changed the title from [WIP] Azure: Move to standard loadbalancer to Azure: Move to standard loadbalancer Sep 9, 2019
@gardener-robot-ci-1 removed the reviewed/ok-to-test label Sep 9, 2019
@gardener-robot-ci-1 added and removed the reviewed/ok-to-test label Sep 10, 2019
@dkistner
Contributor Author

It seems that some tests for the Azure provider extension do not work. I will fix this.

@dkistner removed the reviewed/do-not-merge label Sep 10, 2019
Contributor

@AndreasBurger left a comment


tested; lgtm

@rfranzke merged commit a18d4e6 into gardener-attic:master Sep 10, 2019
@dkistner deleted the azure-lb branch Oct 23, 2019 12:58
@ghost added the platform/azure label Mar 7, 2020