Support creation of GCE_VM_PRIMARY_IP NEGs for L4 ILB Services. #959

Merged
prameshj merged 3 commits into kubernetes:master from negl4ilb on Jan 16, 2020

Conversation

@prameshj prameshj commented Dec 2, 2019

This change adds support in the transaction syncer for creating GCE_VM_PRIMARY_IP NEGs.

The NEG controller has 2 new flags: one for watching Ingress and one for Services.
LoadBalancer services will be processed only if the second flag is enabled. This is disabled by default and can be enabled by adding the "ILBSubsets" Alpha feature gate.

/assign @freehan @bowei

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 2, 2019
@@ -316,6 +338,9 @@ func getSyncerKey(namespace, name string, servicePortKey negtypes.PortInfoMapKey
if flags.F.EnableNonGCPMode {
networkEndpointType = negtypes.NonGCPPrivateEndpointType
}
if portInfo.PortTuple.Port == 0 && portInfo.PortTuple.Name == string(negtypes.VmPrimaryIpEndpointType) {
Contributor:

check if the PortTuple is empty to determine if it is VM_PRIMARY_IP instead of using the name

Contributor Author:

done
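
As a rough sketch, the suggested check might look like the following, assuming an Empty() helper on the tuple type (a hypothetical name; the fields follow the SvcPortTuple shape visible elsewhere in this PR):

// Hypothetical helper: an all-zero tuple marks a GCE_VM_PRIMARY_IP NEG,
// instead of overloading the tuple's Name field with the NEG type string.
func (t SvcPortTuple) Empty() bool {
	return t.Port == 0 && t.Name == "" && t.TargetPort == ""
}

// In getSyncerKey:
if portInfo.PortTuple.Empty() {
	networkEndpointType = negtypes.VmPrimaryIpEndpointType
}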

NegName: namer.NEG(namespace, name, svcPortTuple.Port),
ReadinessGate: readinessGate,
PortTuple: svcPortTuple,
NegName: namer.NEG(namespace, name, svcPortTuple.Port),
Contributor:

Create a new namer func for VM_PRIMARY_IP NEG type with format k8s1-clusterhash-namespace-name-hash

Create a new func similar to NewPortInfoMap to create PortInfoMap for this case.

Contributor Author:

done
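
A hedged sketch of what such a namer might look like; negPrefix, shortUID, and negSuffix appear elsewhere in this PR, while the truncation helper here is a placeholder:

// PrimaryIPNEG returns the NEG name for a GCE_VM_PRIMARY_IP NEG in the
// format k8s1-<clusterhash>-<namespace>-<name>-<hash>. No port or subset
// is included, since these NEGs track node primary IPs rather than ports.
func (n *Namer) PrimaryIPNEG(namespace, name string) string {
	truncNamespace := truncate(namespace) // placeholder truncation helper
	truncName := truncate(name)
	return fmt.Sprintf("%s-%s-%s-%s", n.negPrefix(), truncNamespace, truncName,
		negSuffix(n.shortUID(), namespace, name, "", ""))
}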

recorder record.EventRecorder
cloud negtypes.NetworkEndpointGroupCloud
zoneGetter negtypes.ZoneGetter

// randomize indicates that the endpoints of the NEG can be picked at random, rather
// than following the endpoints of the service. This only applies in the GCE_VM_PRIMARY_IP NEG
Contributor:

put "This only applies in the GCE_VM_PRIMARY_IP NEG" in front

Contributor Author:

Done

}

// Merge the current state from cloud with the transaction table together
// The combined state represents the eventual result when all transactions completed
mergeTransactionIntoZoneEndpointMap(currentMap, s.transactions)
// Calculate the endpoints to add and delete to transform the current state to the desired state
addEndpoints, removeEndpoints := calculateNetworkEndpointDifference(targetMap, currentMap)
if s.randomize && len(removeEndpoints) > 0 {
// Make removals minimum since the traffic will be abruptly stopped. Log removals
klog.Infof("Removing endpoints %+v from GCE_VM_PRIMARY_IP NEG %s", removeEndpoints, s.negName)
Contributor:

need a log level

Contributor Author:

Done
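
For reference, the leveled form might look like this (the exact verbosity level used in the final change is an assumption):

// V(2) keeps routine removal logs out of the default output while still
// recording them under verbose logging.
klog.V(2).Infof("Removing endpoints %+v from GCE_VM_PRIMARY_IP NEG %s", removeEndpoints, s.negName)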

@@ -156,7 +177,9 @@ func (s *transactionSyncer) syncInternal() error {
// filter out the endpoints that are in transaction
filterEndpointByTransaction(committedEndpoints, s.transactions)

s.commitPods(committedEndpoints, endpointPodMap)
if s.NegSyncerKey.NegType != negtypes.VmPrimaryIpEndpointType {
Contributor:

try not to skip it.
Just explore if it is okay to leave it.

Contributor Author:

Yes this is ok to leave as is without the if check. Will make the change.

Contributor Author:

Done


switch {
case s.NegSyncerKey.NegType == negtypes.VmPrimaryIpEndpointType:
nodeLister := listers.NewNodeLister(s.nodeLister)
Contributor:

why do you need a NewNodeLister here?

Contributor Author:

This is to select nodes at random for the NEG backing an "ExternalTrafficPolicy: Cluster" ILB.

Contributor:

store NodeLister instead of the Cache.Indexer

Contributor Author:

The utils_test.go uses the Cache.Indexer in order to add nodes and test the transaction functions. Leaving it as is for now. Please let me know if you have ideas on how to continue testing this.

@@ -169,6 +170,42 @@ func ensureNetworkEndpointGroup(svcNamespace, svcName, negName, zone, negService
return nil
}

func toZonePrimaryIPEndpointMap(endpoints *apiv1.Endpoints, nodeLister listers.NodeLister, zoneGetter negtypes.ZoneGetter, randomize bool, currentMap map[string]negtypes.NetworkEndpointSet, serviceKey string) (map[string]negtypes.NetworkEndpointSet, error) {
Contributor:

add comment

Contributor Author:

done

return subset
}

func getSubsetPerZone(nodes []*v1.Node, zoneGetter negtypes.ZoneGetter, svcID string, currentMap map[string]negtypes.NetworkEndpointSet, newEpCount int, randomize bool) (map[string]negtypes.NetworkEndpointSet, error) {
Contributor:

add comment

Contributor Author:

done

return result, nil
}

func getSubsetCount(currentCount, newCount, numZones int, randomize bool) int {
Contributor:

add comment

Contributor Author:

done

nodes, _ := nodeLister.ListWithPredicate(utils.GetNodeConditionPredicate())
return getSubsetPerZone(nodes, zoneGetter, serviceKey, currentMap, utils.NumEndpoints(endpoints), true)
}
nodes := []*v1.Node{}
Contributor:

add comment here to describe the outcome:

basically pick the nodes where the endpoints are located.

Contributor Author:

done

// If the input list is smaller than the desired subset count, the entire list is returned. The hash salt
// is used so that a different subset is returned even when the same node list is passed in, for a different salt value.
// It also keeps the subset relatively stable for the same service.
func PickSubsetsNoRemovals(nodes []*v1.Node, salt string, count int, current []negtypes.NetworkEndpoint) []*v1.Node {
Contributor:

no need to export this function, right?

Only used in this package

Contributor:

Also, the function name is a bit confusing. PickSubsetsNoRemovals seems to indicate No removal. But it actually is minimal removal? Or minimal change.

Contributor Author:

That's correct, changed the name and not exported.

return hex.EncodeToString(hashSum[:])
}

// PickSubsetsNoRemovals ensures that there are no node removals from current subset unless the node no longer exists.
Contributor:

add some examples in comment

Contributor Author:

done
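
The requested examples might read roughly like this (a sketch of the style of comment, not the exact text that landed):

// Example: nodes [n1 n2 n3 n4], count 3, current subset [n2].
// n2 is retained because it is already in the subset; the remaining two
// slots are filled by whichever of n1, n3, n4 sort first by hashed name
// (node name + salt). A different salt (derived from the service key)
// yields a different subset for the same node list, so load spreads
// across services while staying stable for any one service.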

}
}
}
if len(subset) >= count {
Contributor:

add comment: remove excessive nodes

Contributor Author:

done

)

// NodeInfo stores node metadata used to sort nodes and pick a subset.
type NodeInfo struct {
Contributor:

comment each field

Contributor Author:

done

// Pick all nodes from existing subset if still available.
for _, ep := range current {
for _, nodeInfo := range info {
curHashName := getHashedName(ep.Node, salt)
Contributor:

calculate this in the outer for loop?

Contributor Author:

good catch, done.
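
The fix amounts to hoisting the hash computation into the outer loop, since it depends only on the endpoint (sketch):

// Compute the hashed name once per endpoint rather than once per
// (endpoint, node) pair.
for _, ep := range current {
	curHashName := getHashedName(ep.Node, salt)
	for _, nodeInfo := range info {
		if nodeInfo.hashedName == curHashName {
			// keep this node in the subset
		}
	}
}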

return info[i].hashedName < info[j].hashedName
})
// Pick all nodes from existing subset if still available.
for _, ep := range current {
Contributor:

I am wondering if you can make this easier if you use maps (one map[hashedName]NetworkEndpoint, another map[hashedName]NodeInfo).

After you have picked the existing ones, the leftover ones can be sorted and picked. Not sure if this is good enough.

Contributor Author:

That can be done, but I am not sure it simplifies things that much.
I can use a map for the current subset, and another map for the current nodes - map[nodeName]NodeInfo.
Picking existing nodes becomes easier, and it gets rid of the "skip" boolean since nodes can be deleted from the NodeInfo map as they are picked.
But we would need to translate the map back to a slice before sorting, and translate the current subset slice into a map as well.
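
For illustration, the map-based variant under discussion might look like this (a sketch; field and helper names are assumptions):

// Index candidate nodes by hashed name so retained nodes can be looked up
// and removed in O(1), eliminating the "skip" flag.
candidates := make(map[string]NodeInfo, len(info))
for _, ni := range info {
	candidates[ni.hashedName] = ni
}
for _, ep := range current {
	if ni, ok := candidates[getHashedName(ep.Node, salt)]; ok {
		subset = append(subset, ni.node) // assumed field holding the *v1.Node
		delete(candidates, ni.hashedName)
	}
}
// The leftover candidates still have to be copied back into a slice and
// sorted before filling the remaining slots, which is the translation cost noted above.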

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 31, 2019
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 31, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2020
@prameshj prameshj force-pushed the negl4ilb branch 6 times, most recently from dee780b to 202eba3 Compare January 9, 2020 21:16
@@ -130,21 +140,32 @@ func (s *transactionSyncer) syncInternal() error {
return nil
}

targetMap, endpointPodMap, err := toZoneNetworkEndpointMap(ep.(*apiv1.Endpoints), s.zoneGetter, s.PortTuple.Name, s.podLister, s.NegSyncerKey.SubsetLabels, s.NegSyncerKey.NegType)
currentMap, err := retrieveExistingZoneNetworkEndpointMap(s.negName, s.zoneGetter, s.cloud, s.NegSyncerKey.GetAPIVersion())
Contributor:

Move mergeTransactionIntoZoneEndpointMap after this. That merges the currentMap with the ongoing transactions (NEG API calls in flight). Then the result is the end state after all transactions are done.

Contributor Author:

I am using the end result of mergeTransactionIntoZoneEndpointMap to calculate the subset, so we use the result of all pending transactions being applied. Do we want to do something else?

Contributor Author:

Resolved offline, this is the correct behavior.

}

// IsLegacyL4ILBService returns true if the given LoadBalancer service is managed by service controller.
func IsLegacyL4ILBService(g *gce.Cloud, svc *api_v1.Service) bool {
Contributor:

gce.Cloud is not used, correct?

Contributor Author:

correct, fixed.

@@ -67,6 +74,7 @@ type Controller struct {
ingressLister cache.Indexer
serviceLister cache.Indexer
client kubernetes.Interface
gceCloud *gce.Cloud
Contributor:

I do not think this is needed.

Contributor Author:

correct, fixed.

@@ -177,6 +206,7 @@ func (p1 PortInfoMap) Merge(p2 PortInfoMap) error {
mergedInfo.NegName = portInfo.NegName
// Turn on the readiness gate if one of them is on
mergedInfo.ReadinessGate = mergedInfo.ReadinessGate || portInfo.ReadinessGate
mergedInfo.RandomizeEndpoints = mergedInfo.RandomizeEndpoints || portInfo.RandomizeEndpoints
Contributor:

I think this should be an error instead of a merge.

I do not think a service should ever have these 2 states.

Contributor Author:

done
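
The resulting check presumably rejects the merge when the two flags disagree, along these lines (a sketch; the real error text may differ):

// A service port cannot simultaneously follow service endpoints and pick
// random node endpoints, so a mismatch is an error rather than an OR.
if mergedInfo.RandomizeEndpoints != portInfo.RandomizeEndpoints {
	return fmt.Errorf("conflicting RandomizeEndpoints values %v and %v for port %v",
		mergedInfo.RandomizeEndpoints, portInfo.RandomizeEndpoints, mergedInfo.PortTuple)
}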

// Max number of subsets in ExternalTrafficPolicy:Local
maxSubsetSizeLocal = 250
// Max number of subsets in ExternalTrafficPolicy:Cluster, which is the default mode.
maxSubsetSizeDefault = 25
Contributor:

just a recommendation. consider 24

Contributor Author:

I agree, 24 is a better number for equally splitting between 2, 3, 4 zones. We will have the same split for 25 too, since we do integer division.
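
Concretely, with integer division both constants give the same per-zone counts: 25/2 = 24/2 = 12, 25/3 = 24/3 = 8, and 25/4 = 24/4 = 6; 24 simply divides with no remainder.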


// LocalL4ILBEndpointGetter implements methods to calculate Network endpoints for VM_PRIMARY_IP NEGs when the service
// uses "ExternalTrafficPolicy: Local" mode.
type LocalL4ILBEndpointsCalculator struct {
Contributor:

add comment to say which interface it implements,
and the corresponding algorithm and example.

Contributor Author:

done
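
The added comment might take roughly this shape (a sketch of the requested content, not the exact text that landed):

// LocalL4ILBEndpointsCalculator implements the negtypes.NetworkEndpointsCalculator
// interface for services with "ExternalTrafficPolicy: Local". Only nodes that run
// at least one serving endpoint of the service are candidates for the NEG.
// Example: endpoints on nodes {n1, n2} in zone a and {n3} in zone b produce
// per-zone subsets drawn only from those nodes, capped at the subset limit.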

// CalculateEndpoints determines the endpoints in the NEGs based on the current service endpoints and the current NEGs.
func (l *LocalL4ILBEndpointsCalculator) CalculateEndpoints(ep *v1.Endpoints, currentMap map[string]types.NetworkEndpointSet) (map[string]types.NetworkEndpointSet, types.EndpointPodMap, error) {
// List all nodes where the service endpoints are running. Get a subset of the desired count.
nodeZoneMap := make(map[string][]*v1.Node)
Contributor:

nit: zoneNodeMap?

Contributor Author:

done

numZones := len(nodeZoneMap)
perZoneCount := l.getPerZoneSubsetCount(numZones, numEndpoints)
// Compute the networkEndpoints, with the endpointSet in each zone being at most `perZoneCount` in size
subsetMap, err := getSubsetPerZone(nodeZoneMap, perZoneCount, l.svcId, currentMap)
Contributor:

I am wondering if the following case would happen?

OnlyLocal service with endpoints in 3 zone. Zone a has 240 endpoints, zone b/c each has 1 endpoint.
Will it put 80 endpoints in NEG in zone a and 1 endpoint in zone b and c?

Contributor:

I think getSubsetPerZone and pickSubsetsMinRemovals need some reorganizing to solve this problem.

Just a thought:
given inputs:

  • MAX_ENDPOINT_IN_SUBSET either 25 or 250.
  • candidate nodes
  • current map

Contributor Author:

Correct, this can happen. I had added a TODO to fix this in a follow-up; I will add it back. It can be fixed by picking up to limit/n nodes per zone, but selecting more than that if other zones have fewer nodes. I can try to include that change in this same PR or in a follow-up.
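
The redistribution the author describes could be sketched as follows; the function and its inputs are illustrative, not the code that landed:

// Hypothetical rebalancing: start from an even limit/numZones split, then
// hand capacity unused by sparse zones to zones with more nodes.
// (Iteration order is left unspecified in this sketch.)
func redistribute(zoneNodeCount map[string]int, limit int) map[string]int {
	perZone := limit / len(zoneNodeCount)
	alloc := map[string]int{}
	spare := 0
	for zone, n := range zoneNodeCount {
		if n < perZone {
			alloc[zone] = n // zone cannot fill its share
			spare += perZone - n
		} else {
			alloc[zone] = perZone
		}
	}
	for zone, n := range zoneNodeCount {
		if extra := n - alloc[zone]; extra > 0 && spare > 0 {
			if extra > spare {
				extra = spare
			}
			alloc[zone] += extra
			spare -= extra
		}
	}
	return alloc
}

With limit 25 and zones {a: 240, b: 1, c: 1}, this yields roughly {a: 22, b: 1, c: 1} instead of {a: 8, b: 1, c: 1}, matching the intent described above.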

@@ -88,6 +100,19 @@ func NewTransactionSyncer(negSyncerKey negtypes.NegSyncerKey, networkEndpointGro
return syncer
}

func getEndpointsCalculator(syncer *transactionSyncer) negtypes.NetworkEndpointsCalculator {
Contributor:

Maybe consider initializing the EndpointsCalculator in the SyncerManager,
so that randomize is no longer needed.

Contributor Author:

Done

@@ -463,6 +463,14 @@ func (n *Namer) NEGWithSubset(namespace, name, subset string, port int32) string
return fmt.Sprintf("%s-%s-%s-%s-%s-%s", n.negPrefix(), truncNamespace, truncName, truncPort, truncSubset, negSuffix(n.shortUID(), namespace, name, portStr, subset))
}

func (n *Namer) PrimaryIPNEG(namespace, name string) string {
Contributor:

add comment.

Contributor Author:

done

@prameshj prameshj force-pushed the negl4ilb branch 2 times, most recently from 40eec7f to d0a8d85 Compare January 16, 2020 01:11
LoadBalancer services will be processed only if the runL4
flag is true. This is false by default.

The neg controller also watches Nodes if runL4 is enabled.

The subsetting logic minimizes node removals from the Endpoint Group
to avoid disruption to existing LB sessions.
Generated using "go mod vendor"
Added a controller test to ensure syncers are created.
Also added unit tests for newly added utils.
@freehan freehan left a comment (Contributor):

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 16, 2020
@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: freehan, prameshj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2020
@k8s-ci-robot k8s-ci-robot merged commit c0fef63 into kubernetes:master Jan 16, 2020