Drop cluster-internal endpoint filtering / pod monitoring #9982

Merged 1 commit into openshift:master from danwinship:drop-endpoint-filtering on Aug 10, 2016

Conversation

danwinship
Contributor

With #9383 merged, we no longer need to sanity-check cluster-internal service endpoints, so we can drop the pod monitor.

Unfortunately, for the moment at least, we still need to filter out cluster-external endpoints that violate EgressFirewall rules (#9227). Eventually we will hopefully be able to get rid of that. For now, this branch is based on top of the EgressFirewall branch (since otherwise we'd be completely dropping the endpoint filter in this branch, and then bringing it back in the EgressFirewall branch). It can land after that does.

Closes #9255.

@openshift/networking PTAL. Only the last commit is new; the rest is from #9227.

@@ -52,12 +47,6 @@ func (proxy *ovsProxyPlugin) Start(baseHandler pconfig.EndpointsConfigHandler) e

proxy.baseEndpointsHandler = baseHandler

// Populate pod info map synchronously so that kube proxy can filter endpoints to support isolation
pods, err := proxy.registry.GetAllPods()
Contributor

Don't need GetAllPods() anymore now either.

@dcbw
Contributor

dcbw commented Jul 22, 2016

Last commit removing the pod watch LGTM, though we can remove some additional registry code too.

@liggitt
Contributor

liggitt commented Jul 23, 2016

Old clusters can still have old data with manually created endpoints with now-disallowed addresses. Need to at least document the need to check/delete such endpoints on upgrade, even though the admission plugin will keep new ones from being created

@danwinship
Contributor Author

> Old clusters can still have old data with manually created endpoints with now-disallowed addresses. Need to at least document the need to check/delete such endpoints on upgrade, even though the admission plugin will keep new ones from being created

Added a comment to the release notes bug.

"hostSubnets": hostSubnetStorage,
"netNamespaces": netNamespaceStorage,
"clusterNetworks": clusterNetworkStorage,
"egressNetworkPolicies": egressNetworkPolicyStorage,


Enable this resource only if openshift-sdn plugin is used?

We no longer need to do any filtering of cluster network / service network IPs, because the endpoint admission controller does it for us.
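For context, the filtering being removed boiled down to testing whether an endpoint IP falls inside the cluster network or service network CIDR. Below is a minimal standalone sketch (in Go) of that kind of containment test; the CIDR values are examples for illustration only, not values taken from this PR or any cluster configuration, and this is not the actual plugin code.

package main

import (
	"fmt"
	"net"
)

// isClusterLocal reports whether ip falls inside either the cluster network
// or the service network. This mirrors the shape of the containment check in
// the dropped proxy filter, but is only an illustrative sketch.
func isClusterLocal(ip net.IP, clusterNetwork, serviceNetwork *net.IPNet) bool {
	return clusterNetwork.Contains(ip) || serviceNetwork.Contains(ip)
}

func main() {
	// Example CIDRs only; a real deployment reads these from its cluster config.
	_, clusterNetwork, _ := net.ParseCIDR("10.128.0.0/14")
	_, serviceNetwork, _ := net.ParseCIDR("172.30.0.0/16")

	for _, s := range []string{"10.130.0.5", "172.30.1.1", "192.168.1.10"} {
		fmt.Printf("%s cluster-local: %v\n", s, isClusterLocal(net.ParseIP(s), clusterNetwork, serviceNetwork))
	}
}
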
@smarterclayton
Contributor

Any reason this can't be merged? Changes look ok to me.

@liggitt
Contributor

liggitt commented Aug 2, 2016

Can we add information to the release note that would help an admin who wanted to find/remove old illegal endpoints?

continue EndpointLoop
}
} else {
if !ni.ClusterNetwork.Contains(IP) && !ni.ServiceNetwork.Contains(IP) {
Contributor

this is pre-existing, but if an endpoint contains any address outside the allowed range, we drop the entire endpoint object?

Contributor Author

Yeah... with the old (cluster-internal) filtering, you could really only hit this case if you were pretty explicitly trying to be evil, so there wasn't really any reason for us to be forgiving. I guess with the cluster-external filtering maybe you're more likely to do this accidentally, so it might be better to delete just the bad addresses...

Contributor

Yeah, not sure what the right thing to do is; can you spawn a follow-up issue to track it?

Contributor Author

filed #10212
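To make the alternative discussed in this thread concrete, here is a minimal standalone sketch (in Go) of pruning only the offending addresses instead of skipping the whole Endpoints object. It uses simplified stand-in types rather than the real Kubernetes API types, and it leaves the policy that decides whether an address is allowed as a caller-supplied function.

package main

import (
	"fmt"
	"net"
)

// Simplified stand-ins for the Kubernetes Endpoints types; the real filter
// operates on the API objects instead.
type EndpointAddress struct{ IP string }
type EndpointSubset struct{ Addresses []EndpointAddress }
type Endpoints struct {
	Name    string
	Subsets []EndpointSubset
}

// pruneAddresses keeps only the addresses accepted by the supplied policy,
// rather than discarding the whole Endpoints object when one address fails.
func pruneAddresses(ep Endpoints, allowed func(net.IP) bool) Endpoints {
	out := Endpoints{Name: ep.Name}
	for _, subset := range ep.Subsets {
		var kept []EndpointAddress
		for _, addr := range subset.Addresses {
			if ip := net.ParseIP(addr.IP); ip != nil && allowed(ip) {
				kept = append(kept, addr)
			} else {
				fmt.Printf("dropping address %s from endpoints %q\n", addr.IP, ep.Name)
			}
		}
		if len(kept) > 0 {
			out.Subsets = append(out.Subsets, EndpointSubset{Addresses: kept})
		}
	}
	return out
}

func main() {
	// Example policy: allow only service-network IPs (the CIDR is an example value).
	_, serviceNetwork, _ := net.ParseCIDR("172.30.0.0/16")
	ep := Endpoints{
		Name:    "example",
		Subsets: []EndpointSubset{{Addresses: []EndpointAddress{{IP: "172.30.1.1"}, {IP: "192.168.1.10"}}}},
	}
	fmt.Printf("%+v\n", pruneAddresses(ep, func(ip net.IP) bool { return serviceNetwork.Contains(ip) }))
}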

@liggitt
Contributor

liggitt commented Aug 2, 2016

lgtm as well, doc request and question notwithstanding

@danwinship changed the title from "Drop cluster-internal endpoint filtering / pod monitoring" to "[DO NOT MERGE] Drop cluster-internal endpoint filtering / pod monitoring" on Aug 2, 2016
@danwinship
Contributor Author

> Any reason this can't be merged? Changes look ok to me.

It was blocking on the egress-firewall merge, but that has happened now.

@danwinship
Contributor Author

> Can we add information to the release note that would help an admin who wanted to find/remove old illegal endpoints?

The suggestion that's already there ("look in your logs from before you upgraded") is really the only easy answer. There's no way to search for "all endpoints corresponding to services that don't have selectors and that have addresses that match a particular CIDR" so you'd basically have to just manually examine every endpoint. Or we could provide a script or something, but it would basically just be the code we're deleting here.

We could move the illegal-endpoint-checking code so that it only gets run once, at startup, on the master, and have it output suggested "oc delete" commands along with the warnings. (And then drop it in 3.4?)

@smarterclayton
Contributor

I would recommend coming up with a simple bash snippet that would find the affected ones and document that:

oc get endpoints --all-namespaces --template '{{ range .items }}{{ .metadata.name }} {{ range .subsets }}{{ range .addresses }}{{ .ip }} {{ end }}{{ end }}{{ end }}' | grep -v ' 172.30.' | cut -f 1-1 -d ' '

or something


@danwinship
Contributor Author

That works for IPs in the service network, but for IPs in the cluster network you only want to look at the endpoints that weren't generated by the endpoints controller, and you can only figure that out by comparing against the service table... so it would have to be something like:

for endpoint in $(oc get services --template '[something that selects only selectorless services]'); do
    oc get endpoint $endpoint --template '[...]' | grep ' 10\.(12[8-9]|1[3-9][0-9]|2[0-5][0-9])\.' | ...

(assuming they're using the default 3.2 ClusterNetworkCIDR and not the 3.1 default or a custom value). This is already gross and it's not complete yet. And they've already got a list of all the bad endpoints repeated every 30 seconds in their logs...

@danwinship
Contributor Author

Noticing the "init containers" release note... maybe we need an upgrade migration/helper script?

@danwinship
Contributor Author

for ep in $(oc get services --all-namespaces --template '{{ range .items}}{{ range .spec.selector }}{{ else }}{{ .metadata.namespace}}:{{ .metadata.name }} {{ end }}{{ end }}'); do
    oc get endpoints --namespace $(echo $ep | sed -e 's/:.*//') $(echo $ep | sed -e 's/.*://') --template '{{ .metadata.namespace }}:{{ .metadata.name }} {{ range .subsets }}{{ range .addresses }}{{ .ip }} {{ end }}{{ end }}{{ "\n" }}' | awk '/ 10\.(12[8-9]|1[3-9][0-9]|2[0-5][0-9])\./ { print $1 }'
done
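If any endpoints turn up, each printed namespace:name pair could then be removed with oc delete endpoints <name> -n <namespace>, once the admin has confirmed the object really is a stale manually-created endpoint.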

@smarterclayton
Contributor

LGTM [merge] - please remember to add the release note

@openshift-bot
Contributor

[Test]ing while waiting on the merge queue

@openshift-bot
Contributor

Evaluated for origin test up to aeee7c8

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/7684/)

@openshift-bot
Contributor

openshift-bot commented Aug 9, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/7719/) (Image: devenv-rhel7_4797)

@smarterclayton
Contributor

smarterclayton commented Aug 9, 2016 via email

@danwinship
Contributor Author

@openshift-bot
Contributor

Evaluated for origin merge up to aeee7c8

@openshift-bot openshift-bot merged commit 1c1cb0b into openshift:master Aug 10, 2016
@danwinship danwinship deleted the drop-endpoint-filtering branch August 10, 2016 13:14
@bmeng
Contributor

bmeng commented Aug 11, 2016

@danwinship Does this mean that an endpoint pointing to a pod/svc in another project, created by a user with the system:endpoint-controller role, can be accessed from the user's own project under the multitenant plugin?

@danwinship
Contributor Author

> @danwinship Does this mean that an endpoint pointing to a pod/svc in another project, created by a user with the system:endpoint-controller role, can be accessed from the user's own project under the multitenant plugin?

Yes, but this is not intentional/supported, and it might not work that way in the future if we manage to implement the idea in #9255 (comment). (If that happened, then the privileged user would still be able to create the endpoint, but trying to connect to the endpoint would fail, just like trying to connect to the other user's pod directly would.)
