Extended.[networking][router] openshift routers The HAProxy router should serve the correct routes when scoped to a single namespace and label set #12784
Comments
Exit code 7 apparently means curl couldn't connect. Fixing... |
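For reference, the curl man page documents exit code 7 as a failure to connect to the host. A minimal sketch of a lookup that a test harness could use to print a readable reason instead of a bare number; the helper name `describe_curl_exit` is hypothetical and only a handful of codes from the man page are listed:

```python
# A few exit codes documented in the curl man page (not an exhaustive list).
CURL_EXIT_CODES = {
    6: "could not resolve host",
    7: "failed to connect to host",
    22: "HTTP error returned (only with --fail)",
    28: "operation timed out",
    52: "empty reply from server",
}


def describe_curl_exit(code: int) -> str:
    """Return a readable description for a curl exit code (hypothetical helper)."""
    if code == 0:
        return "success"
    return CURL_EXIT_CODES.get(code, f"unrecognized curl exit code {code}")
```

Calling `describe_curl_exit(7)` gives the "failed to connect" reason, which points at the network or the listener rather than at the test script.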
Keeping this open because the router pod can't connect to the endpoint. Either SDN or a bug in the router (maybe a race on initial setup). Need to dump logs. @knobunc I will add some debugging to the test |
For 12736 if we can verify that the router is not dropping events / racing / getting out of sync that would help here. The failure rate here is pretty high, and the nodes are going to be contended so any racy logic is likely to get stressed. |
Bumping priority since something is probably wrong in the code / environment vs just the test. |
I disabled this test from GCE to unblock getting GCE in the merge queue - to test, just remove the last segment from ADDITIONAL_SKIP when running a PR job (test_pull_requests_origin_gce can be triggered with a PR #) |
You are right, it does look to be something in the environment. Event processing on the initial sync in the router is not the issue here though, because the jenkins logs show the failure is at: So irrespective of the delayed binds (for the initial sync), the stats port should have been bound if the container and pods are up, and that's failing here. And the jenkins logs do have: ( @danwinship any ideas on that one/why it would happen? thx) and
|
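One way to check the "stats port should have been bound" claim independently of curl is a plain TCP connect probe. This is only a sketch: the function name `port_is_open` is hypothetical, and the host and port you pass in (for example the router pod IP and 1936) are assumptions about the environment, not something taken from the test code.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result here while the container is up would suggest the listener never bound or that something on the network path is rejecting the connection.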
That's not related to OpenShift routes, and note that it set NetworkUnavailable to False. It's just openshift-sdn logging that it's undoing a kube-on-GCE/AWS bug that would otherwise completely break networking. |
@smarterclayton I'm now seeing this on my |
Going to print router pod logs on startup so we can see what the router tells us. |
@smarterclayton did you change it to print the pod logs? I didn't see anything in the artifacts... |
Looks like the router is hotlooping on not having access to services. It
should not be doing that.
On Feb 18, 2017, at 9:54 AM, Ben Parees <notifications@github.com> wrote:
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance_future/325/consoleFull#-169259343956c60d7be4b02b88ae8c268b
|
I'll take a look at this - it does look like it is hot looping but it did process some endpoints/routes before it started doing that. Curious why the healthz endpoint fails as it did work in the reload script, so something else there. It would still fail the test afterwards but ... anyway, the logs have more clues now. |
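As an illustration of the alternative to hot looping, here is a generic retry loop with exponential backoff. It is a sketch of the general technique only, not the router's actual code; `retry_with_backoff`, its parameters, and the injectable `sleep` are all hypothetical names chosen for the example.

```python
import time


def retry_with_backoff(op, attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call op() until it succeeds, doubling the pause after each failure.

    Re-raises the last exception once `attempts` calls have failed.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)  # back off instead of retrying immediately
            delay = min(delay * 2, max_delay)
```

The cap on the delay keeps a long outage from turning into an unbounded wait, while the doubling keeps a failing dependency from being hammered in a tight loop.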
So @rajatchopra and I looked at this - if the scoped router test is run by itself in a dind cluster, it should always fail since the 80/443/1936 ports are never opened up. @rajatchopra had a PR for that: #13036. The other thing we found is that the network is unblocked in vendor/k8s.io/kubernetes/test/e2e/framework/util.go - not sure if that has a bearing or how it comes into play. But it could explain the test flakiness if something did that in parallel/for another test. |
What do you mean, network is unblocked?
|
There's a test helper API call in util.go called UnblockNetwork (https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/framework/util.go#L4393) which removes an iptables rule that by default rejects all connections made to the host (except, I think, to ports 22 and 10250). That said, I am not certain that's the cause here. As mentioned above, I'm not sure it has a bearing. Edited for clarity + assumptions. |
Ah, to ensure restore after the net split tests (which we don't run, yet)
|
Closing this issue out as the PR has merged. |
Still happening @knobunc @ramr @rajatchopra can someone dig through and understand why this is still failing? |
Running a test on overlay in https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_conformance_gce/68/console |
I think this was the fact that we were pulling images for the router during this test. If pull time > 2 min, failed. Fixed in #13388 |
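To illustrate the failure mode described above (a single 2 minute wait that also has to absorb the image pull), here is a sketch of a polling helper with an explicit deadline. It is not the code from #13388; `wait_until` and its parameters are hypothetical, and the clock and sleep functions are injectable only so the helper can be exercised without real waiting.

```python
import time


def wait_until(condition, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns True on success and False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a helper like this, a test could give the image pull its own generous budget and only then start the shorter readiness wait, so a slow pull does not eat into the time allowed for the router to come up.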
I originally thought this was 11016, but it's not. The curl test is failing with exit code 7 (which we're not producing in our script), so it's possibly curl failing directly.
Occurs in gce tests: https://ci.openshift.redhat.com/jenkins/job/zz_origin_gce_image/94/