Adding logic to GCE ingress controller to handle multi cluster ingresses #1033

Merged: 3 commits into kubernetes:master from nikhiljindal:reportIGName, Sep 14, 2017

Conversation

nikhiljindal
Contributor

@nikhiljindal commented Jul 27, 2017

Ingress controller should manage instance groups and ports for multi cluster ingresses.

Things to do:

  • Add tests
  • Verify manually that this works
  • Ensure that resources are GC'd as expected.

Sending it out now to get early feedback.

Update:
Verified that the code works as expected with the following scenarios:

  • Creating an ingress with the gce-multi class creates the expected instance group, and deleting the ingress deletes the instance group.
  • If there is an existing ingress, then creating a gce-multi ingress reuses the existing instance group rather than creating a new one.

There is an existing issue, tracked in #695, that deleting the ingress does not delete the corresponding named ports.

Will add e2e tests in the main repo to automate these scenarios.

cc @nicksardo

@k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA) on Jul 27, 2017

@coveralls

Coverage Status

Coverage decreased (-0.03%) to 43.994% when pulling e1642581de5208fcd38b8abf4a369f36333a3066 on nikhiljindal:reportIGName into 3495bfb on kubernetes:master.

@nikhiljindal
Contributor Author

Updated the code to add the default backend service's node port to the instance group, and to add an annotation with the instance group names.

Verified that the code works as expected with the following scenarios:

  • Creating an ingress with the gce-multi class creates the expected instance group, and deleting the ingress deletes the instance group.
  • If there is an existing ingress, then creating a gce-multi ingress reuses the existing instance group rather than creating a new one.

There is an existing issue, tracked in #695, that deleting the ingress does not delete the corresponding named ports.

Will add e2e tests in the main repo to automate these scenarios.

cc @nicksardo PTAL

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 44.503% when pulling ab61605 on nikhiljindal:reportIGName into cf732e8 on kubernetes:master.

@nikhiljindal changed the title from "WIP: Adding logic to GCE ingress controller to handle multi cluster ingresses" to "Adding logic to GCE ingress controller to handle multi cluster ingresses" on Aug 8, 2017
@coveralls

Coverage Status

Coverage decreased (-0.06%) to 44.712% when pulling ab61605 on nikhiljindal:reportIGName into cf732e8 on kubernetes:master.

func addInstanceGroupsAnnotation(existing map[string]string, igs []*compute.InstanceGroup) (map[string]string, error) {
	if existing == nil {
		existing = map[string]string{}
	}
Contributor

I prefer asserting that the map exists outside of this func. Changing this would mean we don't need the first return variable either.

On another note, a less overloaded term than add would be nice, ie set or apply

Contributor Author

Done. Renamed to set

@coveralls

Coverage Status

Coverage decreased (-0.1%) to 44.45% when pulling 46efeb3 on nikhiljindal:reportIGName into 45e43f8 on kubernetes:master.

@@ -263,3 +265,20 @@ func addNodes(lbc *LoadBalancerController, zoneToNode map[string][]string) {
func getProbePath(p *api_v1.Probe) string {
return p.Handler.HTTPGet.Path
}

func TestAddInstanceGroupsAnnotation(t *testing.T) {
Contributor

Add a test case for multiple zones?

Contributor Author

Done


// Helper method to create instance groups.
// This method exists to ensure that we are using the same logic at all places.
func CreateInstanceGroups(nodePool NodePool, namer *utils.Namer, port int64) ([]*compute.InstanceGroup, *compute.NamedPort, error) {
Contributor

This just wraps around the namer, right?
IIUC, the namer is actually the same one.
Is it really worth it?

Contributor Author

This is to ensure that the same logic is used at both places.
It is to prevent scenarios where one place is updated but not the other.

Contributor

Do you still need this?

you only need to expose it at ClusterManager, right?

Contributor Author

Yes, we still need it. It's also called from backends.go.

Contributor

Why not just remove this and do nodePool.AddInstanceGroup(namer.IGName(), port) in both places?

Contributor Author

Doing that risks someone changing the logic at one place without changing the other.
This is a simple wrapper method to ensure that the same code is used at both places.
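
For reference, a minimal sketch of the wrapper being discussed, assuming it only forwards to the node pool with the namer-derived instance group name as the comment above suggests; the actual body in the PR may differ:

// Sketch only: assumes the wrapper delegates straight to the node pool.
func CreateInstanceGroups(nodePool NodePool, namer *utils.Namer, port int64) ([]*compute.InstanceGroup, *compute.NamedPort, error) {
	return nodePool.AddInstanceGroup(namer.IGName(), port)
}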

// name and zone of instance groups created for the ingress.
// This is read only for users. Controller will overwrite any user updates.
// This is only set for ingresses with ingressClass = "gce-multi"
instanceGroupsKey = "ingress.gcp.kubernetes.io/instance-groups"
Contributor

nit: instanceGroupsAnnotationKey

Contributor Author

Updated

ingressClassKey      = "kubernetes.io/ingress.class"
gceIngressClass      = "gce"
gceMultiIngressClass = "gce-multi"
Contributor

Not sure whether the class key is settled. Maybe gce-multi-cluster is better?

Contributor Author

Updated

@@ -342,6 +349,39 @@ func (lbc *LoadBalancerController) sync(key string) (err error) {
return syncError
}

func (lbc *LoadBalancerController) syncMultiClusterIngress(ing *extensions.Ingress, nodeNames []string) error {
Contributor

There is duplicate logic here with Checkpoint. Can you refactor Checkpoint to avoid having duplicate logic?

Currently, CreateInstanceGroup is done as part of syncing backend services. Extract creating and syncing the instance group and setting the annotation; if the ingress is gce-multi, return early, otherwise continue with the rest.

This requires more changes to existing logic though. I will let @nicksardo have more input on this.

Contributor Author

Yeah, I don't want to call the whole Checkpoint method just to sync instance groups. I think it's better to call that specific method directly here. This also ensures that the existing ingress code is independent and unchanged.

if ingExists {
	ing := obj.(*extensions.Ingress)
	if isGCEMultiClusterIngress(ing) {
		return lbc.syncMultiClusterIngress(ing, nodeNames)
Contributor

Correct me if I am wrong: if an ingress is updated from gce to gce-multi, will the rest of the GCP resources stay around?

Contributor Author

lbc.CloudClusterManager.GC above on line 303 will still be called, which should delete those resources.

Still, that is an interesting case I hadn't thought of. What if we say that users cannot "upgrade" an existing gce ingress to gce-multi? They would always have to delete the existing gce ingress and create a new gce-multi ingress.
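
For context, a minimal sketch of the class check that dispatches to syncMultiClusterIngress in the snippet above, assuming it is a plain comparison of the ingress.class annotation against the gce-multi value; the exact implementation may differ:

// Sketch only: a multi-cluster ingress is identified by its ingress.class annotation.
func isGCEMultiClusterIngress(ing *extensions.Ingress) bool {
	class := ing.ObjectMeta.Annotations[ingressClassKey] // "kubernetes.io/ingress.class"
	return class == gceMultiIngressClass                 // "gce-multi"
}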

@k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) on Aug 28, 2017
@nikhiljindal
Contributor Author

Thanks for the review @freehan
Updated the code as per comments. PTAL

@nikhiljindal
Contributor Author

cc @csbell

@nikhiljindal
Contributor Author

@freehan Pushed a new commit to refactor the code to merge multi cluster sync with single cluster sync.
PTAL

var igs []*compute.InstanceGroup
var err error
for _, p := range servicePorts {
	igs, _, err = instances.CreateInstanceGroups(c.instancePool, c.ClusterNamer, p.Port)
Contributor

Shouldn't we be appending to igs?

Contributor Author

CreateInstanceGroups creates instance groups in all zones and then adds the given named port to it.
Each time CreateInstanceGroups is called for a port, it always returns the same set of all igs, so we don't need append; we can just return the output of any call.

Ideally, we should call CreateInstanceGroups only the first time and then call AddNamedPort subsequent times, but that interface is not exposed yet. Will add a TODO and a comment explaining this.

Contributor Author

Added the comment and TODO
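
Based on that explanation, the loop with the added comment and TODO might read roughly as follows; this is a sketch, not the exact committed code:

var igs []*compute.InstanceGroup
var err error
for _, p := range servicePorts {
	// CreateInstanceGroups creates instance groups in all zones and adds the
	// given named port to them. Each call returns the same full set of
	// instance groups, so keeping the result of the last call is enough.
	// TODO: create the instance groups only on the first iteration and just
	// add named ports afterwards, once that interface is exposed.
	igs, _, err = instances.CreateInstanceGroups(c.instancePool, c.ClusterNamer, p.Port)
	if err != nil {
		return nil, err
	}
}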

@nikhiljindal
Contributor Author

Rebased to resolve conflicts

@coveralls

Coverage Status

Coverage increased (+0.2%) to 43.684% when pulling 0f756ae on nikhiljindal:reportIGName into 7434c50 on kubernetes:master.

Contributor

@freehan left a comment

LGTM overall, some nits. @nicksardo for a final review.



@@ -289,6 +304,17 @@ func ListAll(store cache.Store, selector labels.Selector, appendFn cache.AppendF
func (s *StoreToIngressLister) List() (ing extensions.IngressList, err error) {
Contributor

let us refactor here a little.

Refactor the current StoreToIngressLister.List() function into ListGCEIngress() and ListGCEMultiClusterIngress.

Change the return value from extensions.IngressList to []*extensions.Ingress.

Change the input value for GCETranslator.toNodePorts() to use []*extensions.Ingress

Contributor Author

List is also used in other places. I don't want all callers to have to get gce and gceMulti ingresses first and then append, so I am retaining the List method.

ListGCEMultiIngresses will not be used anywhere, so it is not required at the moment.

Contributor

List is also used in ListRuntimeInfo. Then lbs is passed into Checkpoint. Then you filter lbs to singleClusterLbs in Checkpoint.

Please just use ListGCEIngresses in ListRuntimeInfo and remove the filtering.

Contributor

@freehan Sep 11, 2017

I would prefer changing List to something else, like ListAll. Next time, when someone tries to use it, they would not get confused and leave a bug that only triggers when there is a combination of gce and multi-cluster ingresses.

Also, change the comment to explicitly say GCE ingress and multi-cluster ingress

Contributor Author

Good catch.
Renamed List to ListAll and updated the comment to be explicit.
Also renamed ListRuntimeInfo to ListGCERuntimeInfo and added a comment stating that it returns runtimeinfo only for gce ingresses and not for multi cluster ingresses.
Also removed the IsMultiCluster field from RuntimeInfo and updated Checkpoint accordingly.
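
A rough sketch of the resulting lister split, assuming ListAll keeps the original List behavior and ListGCEIngresses filters on the ingress class; the helper isGCEIngress and the exact return type are assumptions:

// Sketch only: ListAll returns both GCE and multi-cluster ingresses;
// ListGCEIngresses returns only the ones handled end to end by this controller.
func (s *StoreToIngressLister) ListGCEIngresses() (ing extensions.IngressList, err error) {
	for _, obj := range s.Store.List() {
		curIng := obj.(*extensions.Ingress)
		if isGCEIngress(curIng) { // assumed helper checking the kubernetes.io/ingress.class annotation
			ing.Items = append(ing.Items, *curIng)
		}
	}
	return ing, nil
}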

@@ -279,7 +280,13 @@ func (lbc *LoadBalancerController) sync(key string) (err error) {
if err != nil {
return err
}
singleClusterIngresses, err := lbc.ingLister.ListGCEIngresses()
Contributor

gceIngresses, err := lbc.ingLister.ListGCEIngress()
gceMultiIngresses, err := lbc.ingLister.ListGCEMultiClusterIngress()
combinedIngresses := append(gceIngresses, gceMultiIngresses...)

gceIngressNodeports := lbc.tr.toNodePorts(gceIngresses)
combinedNodeports := lbc.tr.toNodePorts(combinedIngresses)

Contributor Author

renamed to gceIngresses, gceNodePorts and allIngresses, allNodePorts.

@nikhiljindal
Contributor Author

Thanks @freehan Updated code as per comments.
PTAL.

@coveralls

Coverage Status

Coverage increased (+0.2%) to 44.231% when pulling 8cd2902924e997c112fd18a35ee67174d5b9ab41 on nikhiljindal:reportIGName into 18ea2f7 on kubernetes:master.

@nikhiljindal
Contributor Author

Updated code as per comments.
PTAL.

@coveralls

Coverage Status

Coverage increased (+0.2%) to 44.19% when pulling 0f4f5c9 on nikhiljindal:reportIGName into 18ea2f7 on kubernetes:master.

@freehan
Contributor

freehan commented Sep 11, 2017

LGTM

@nikhiljindal
Contributor Author

Pushed a new commit to call CreateInstanceGroups only once instead of thrice as before. It's an extra optimization and should not have any user-visible impact (except maybe getting a bit faster).

@coveralls

Coverage Status

Coverage increased (+0.2%) to 44.234% when pulling 32f311d on nikhiljindal:reportIGName into 18ea2f7 on kubernetes:master.

if err != nil {
return err
var err error
if igs == nil {
Contributor

@nicksardo Sep 12, 2017

Suggest adding comment saying when igs is nil.

Contributor Author

Done

func (c *ClusterManager) SyncNodesInInstanceGroups(nodeNames []string) error {
	if err := c.instancePool.Sync(nodeNames); err != nil {
		return err
	}
Contributor

What's the point of this wrapper?

Contributor Author

Yes, I had added it when the same code was being called from 2 places. It is not required now after the latest refactor. Removed.

lbNames := lbc.ingLister.Store.ListKeys()
lbs, err := lbc.ListRuntimeInfo()
lbs, err := lbc.ListGCERuntimeInfo()
Contributor

@nicksardo Sep 12, 2017

IMO, another nice change would be to plumb gceIngresses to lbc.ListGCERuntimeInfo() and maybe rename it to lbc.ToGCERuntimeInfo(gceIngresses). There would be no performance increase from this change, but the code would be clearer to follow and the function would appear less magical. Up to you.

Contributor Author

@nikhiljindal Sep 13, 2017

Yes, it's definitely cleaner!
toRuntimeInfo does not need any special comments or logic about getting only gce ingresses or multi cluster ingresses as well. It just converts the ingresses that it receives. So that's nice!

Contributor Author

@nikhiljindal Sep 13, 2017

Also named it toRuntimeInfo instead of toGCERuntimeInfo since it now has no single- or multi-cluster-specific logic. It's a generic method that returns RuntimeInfo structs for the given ingress objects.

// Record any errors during sync and throw a single error at the end. This
// allows us to free up associated cloud resources ASAP.
if err := lbc.CloudClusterManager.Checkpoint(lbs, nodeNames, nodePorts); err != nil {
if igs, err = lbc.CloudClusterManager.Checkpoint(lbs, nodeNames, gceNodePorts, allNodePorts); err != nil {
Contributor

Since igs isn't being set anywhere else, I'd recommend killing the instantiation on ln 323 and breaking this line apart into two.

Contributor Author

Done

@@ -329,21 +337,36 @@ func (lbc *LoadBalancerController) sync(key string) (err error) {
if !ingExists {
	return syncError
}
ing := obj.(*extensions.Ingress)
Contributor

Hmm, the dereference copy may have been on purpose from the original author. Thoughts?

Contributor Author

Hmm, I don't see a reason why, but I am not confident enough yet. Reverted for now. Will look closer and probably change it in a different PR.

continue
}
knownPorts = append(knownPorts, port)
return knownPorts
Contributor

Here and line 524: why early return? Original code had continues.

Contributor Author

I have extracted an ingressToNodePorts method out that returns node ports for a single ingress.
toNodePorts calls ingressToNodePorts multiple times for each ingress.

Before this change, it was all a single method, and hence we were using continue to move on to the next ingress if there was an error with an ingress.
With the new method, ingressToNodePorts returns early in case of error, and toNodePorts then calls it again with the next ingress.
So the behavior should continue to be the same.

Contributor

The original behavior was a continue of the local loop, not the outer loop, which means it would continue to the next path or rule.

Contributor Author

Of course, I feel silly!
Fixed it now.
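
To make the fix concrete, here is a sketch of the extracted helper with the continue restored inside the inner loop; the method name ingressToNodePorts comes from the discussion, while the receiver details and the getServiceNodePort helper are assumptions:

// Sketch only: an error for a single backend skips that path and moves on to
// the next path/rule, matching the original continue semantics.
func (t *GCETranslator) ingressToNodePorts(ing *extensions.Ingress) []backends.ServicePort {
	var knownPorts []backends.ServicePort
	for _, rule := range ing.Spec.Rules {
		if rule.HTTP == nil {
			continue
		}
		for _, path := range rule.HTTP.Paths {
			port, err := t.getServiceNodePort(path.Backend, ing.Namespace) // assumed helper
			if err != nil {
				continue // skip this path, keep processing the rest
			}
			knownPorts = append(knownPorts, port)
		}
	}
	return knownPorts
}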

@@ -640,3 +673,34 @@ func (o PodsByCreationTimestamp) Less(i, j int) bool {
}
return o[i].CreationTimestamp.Before(o[j].CreationTimestamp)
}

// setInstanceGroupsAnnotation sets the instance-groups annotation with names of the given instance groups.
func setInstanceGroupsAnnotation(existing map[string]string, igs []*compute.InstanceGroup) error {
Contributor

Recommend naming it something multi-cluster specific.

Contributor Author

The method does not have any multi-cluster-specific logic. It's a generic method that sets the instance groups annotation. Hence I am inclined to keep the name generic.

Contributor

Okay

Name string
Zone string
}
instanceGroups := []Value{}
Contributor

nit: var instanceGroups []Value

Contributor Author

updated
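
Putting the pieces from this review together, the annotation setter might look roughly like the sketch below. It assumes the Value struct and annotation key shown in the snippets above, "encoding/json" for serialization, and uses ig.Zone as-is (the real code may trim the zone URL to a zone name):

// Sketch only: serializes instance group name/zone pairs as JSON under the
// ingress.gcp.kubernetes.io/instance-groups annotation key.
func setInstanceGroupsAnnotation(existing map[string]string, igs []*compute.InstanceGroup) error {
	type Value struct {
		Name string
		Zone string
	}
	var instanceGroups []Value
	for _, ig := range igs {
		instanceGroups = append(instanceGroups, Value{Name: ig.Name, Zone: ig.Zone})
	}
	jsonValue, err := json.Marshal(instanceGroups)
	if err != nil {
		return err
	}
	existing[instanceGroupsAnnotationKey] = string(jsonValue)
	return nil
}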

for _, p := range nodePorts {
	portMap[p.Port] = p
}
nodePorts = []backends.ServicePort{}
Contributor

nit: either var nodePorts []backends.ServicePort or nodePorts := make([]backends.ServicePort, 0, len(portMap))

Contributor Author

Changed to nodePorts = make([]backends.ServicePort, 0, len(portMap))
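
For reference, the dedup being discussed then reads roughly as follows (a sketch assembled from the snippets above):

// Collapse duplicate service ports by port number, then rebuild the slice
// with a pre-sized capacity.
portMap := map[int64]backends.ServicePort{}
for _, p := range nodePorts {
	portMap[p.Port] = p
}
nodePorts = make([]backends.ServicePort, 0, len(portMap))
for _, p := range portMap {
	nodePorts = append(nodePorts, p)
}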

return igs, nil
}

func (c *ClusterManager) CreateInstanceGroups(servicePorts []backends.ServicePort) ([]*compute.InstanceGroup, error) {
Contributor

Can we rename this and the other CreateInstanceGroups?
EnsureInstanceGroupsAndPorts?

Contributor Author

Sure, done.

@@ -63,8 +63,7 @@ func (i *Instances) Init(zl zoneLister) {
// all of which have the exact same named port.
func (i *Instances) AddInstanceGroup(name string, port int64) ([]*compute.InstanceGroup, *compute.NamedPort, error) {
igs := []*compute.InstanceGroup{}
// TODO: move port naming to namer
namedPort := &compute.NamedPort{Name: fmt.Sprintf("port%v", port), Port: port}
namedPort := utils.GetNamedPort(port)
Contributor

In the next PR when passing all ports, can you please rename AddInstanceGroup to something like SyncInstanceGroups, please?

Contributor Author

Sure. How about EnsureInstanceGroupsAndPorts? Same as the corresponding cluster_manager method name.

Contributor

Sounds good!
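
The utils helper referenced in the diff above presumably just captures the inline code it replaces; a sketch (assuming "fmt" is imported):

// Sketch only: the "port%v" name format comes from the line this call replaces.
func GetNamedPort(port int64) *compute.NamedPort {
	return &compute.NamedPort{Name: fmt.Sprintf("port%v", port), Port: port}
}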

@nikhiljindal
Contributor Author

Thanks for the detailed review @nicksardo!
Have pushed updated code as per comments. PTAL

@coveralls

Coverage Status

Coverage increased (+0.2%) to 44.21% when pulling e91ce1511d61cf9d381701220ba734a60c22d30e on nikhiljindal:reportIGName into 587a344 on kubernetes:master.

@coveralls

Coverage Status

Coverage increased (+0.2%) to 44.234% when pulling 937cde6 on nikhiljindal:reportIGName into 981967b on kubernetes:master.

@nicksardo
Contributor

Awesome, everything looks good. Would you mind doing a round of testing for MC, but also basic ingress in different states (multiple nodes, multiple ingresses, with default backend, no default backend, etc.)?

@nicksardo self-assigned this on Sep 13, 2017
@nikhiljindal
Contributor Author

Brought up glbc with this code and verified that a single cluster ingress continues to work as expected.
Also verified that it works well with a multi cluster ingress (they reuse the instance group).

@nicksardo
Contributor

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Sep 14, 2017
@nicksardo merged commit b2ad9e7 into kubernetes:master on Sep 14, 2017