
Extended.[networking][router] openshift routers The HAProxy router should serve the correct routes when scoped to a single namespace and label set #12784

Closed
smarterclayton opened this issue Feb 2, 2017 · 28 comments · Fixed by #12786
Labels: area/tests, component/routing, kind/bug, kind/test-flake, priority/P0

@smarterclayton (Contributor)

I originally thought this was #11016, but it's not. The curl test is failing with exit code 7 (which we're not producing in our script), so it's possibly curl itself failing.

Occurs in gce tests: https://ci.openshift.redhat.com/jenkins/job/zz_origin_gce_image/94/

[networking][router] openshift routers The HAProxy router
  should serve the correct routes when scoped to a single namespace and label set
  /data/src/github.com/openshift/origin/test/extended/router/scoped.go:77

[BeforeEach] [Top Level]
  /data/src/github.com/openshift/origin/test/extended/util/test.go:47
[BeforeEach] [networking][router] openshift routers
  /data/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:141
STEP: Creating a kubernetes client
Feb  2 18:02:04.244: INFO: >>> kubeConfig: /tmp/cluster-admin.kubeconfig

STEP: Building a namespace api object
Feb  2 18:02:04.389: INFO: configPath is now "/tmp/extended-test-scoped-router-7cr8j-zsp2n-user.kubeconfig"
Feb  2 18:02:04.389: INFO: The user is now "extended-test-scoped-router-7cr8j-zsp2n-user"
Feb  2 18:02:04.389: INFO: Creating project "extended-test-scoped-router-7cr8j-zsp2n"
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [networking][router] openshift routers
  /data/src/github.com/openshift/origin/test/extended/router/scoped.go:35
Feb  2 18:02:04.703: INFO: Running 'oc adm --config=/tmp/cluster-admin.kubeconfig --namespace=extended-test-scoped-router-7cr8j-zsp2n policy add-cluster-role-to-user system:router extended-test-scoped-router-7cr8j-zsp2n-user'
cluster role "system:router" added: "extended-test-scoped-router-7cr8j-zsp2n-user"
Feb  2 18:02:05.199: INFO: Running 'oc create --config=/tmp/extended-test-scoped-router-7cr8j-zsp2n-user.kubeconfig --namespace=extended-test-scoped-router-7cr8j-zsp2n -f /data/src/github.com/openshift/origin/test/extended/testdata/scoped-router.yaml'
pod "scoped-router" created
pod "router-override" created
rolebinding "system-router" created
route "route-1" created
route "route-2" created
service "endpoints" created
pod "endpoint-1" created
[It] should serve the correct routes when scoped to a single namespace and label set
  /data/src/github.com/openshift/origin/test/extended/router/scoped.go:77
Feb  2 18:02:05.806: INFO: Creating new exec pod
STEP: creating a scoped router from a config file "/data/src/github.com/openshift/origin/test/extended/testdata/scoped-router.yaml"
STEP: waiting for the healthz endpoint to respond
Feb  2 18:03:00.944: INFO: Running '/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl --server=https://internal-api.pr94.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=extended-test-scoped-router-7cr8j-zsp2n execpod -- /bin/sh -c 
		set -e
		for i in $(seq 1 180); do
			code=$( curl -s -o /dev/null -w '%{http_code}\n' --header 'Host: 172.16.6.119' "http://172.16.6.119:1936/healthz" )
			echo $code
			if [[ $code -eq 200 ]]; then
				exit 0
			fi
			if [[ $code -ne 503 ]]; then
				exit 1
			fi
			sleep 1
		done
		'
Feb  2 18:03:02.484: INFO: rc: 127
[AfterEach] [networking][router] openshift routers
  /data/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:142
STEP: Collecting events from namespace "extended-test-scoped-router-7cr8j-zsp2n".
STEP: Found 19 events.
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:05 -0500 EST - event for endpoint-1: {default-scheduler } Scheduled: Successfully assigned endpoint-1 to ci-pr94-ig-n-g8zr
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:05 -0500 EST - event for execpod: {default-scheduler } Scheduled: Successfully assigned execpod to ci-pr94-ig-n-5wzp
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:05 -0500 EST - event for router-override: {default-scheduler } Scheduled: Successfully assigned router-override to ci-pr94-ig-n-lqhq
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:05 -0500 EST - event for scoped-router: {default-scheduler } Scheduled: Successfully assigned scoped-router to ci-pr94-ig-n-5wzp
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:06 -0500 EST - event for scoped-router: {kubelet ci-pr94-ig-n-5wzp} Pulling: pulling image "openshift/origin-haproxy-router"
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:07 -0500 EST - event for endpoint-1: {kubelet ci-pr94-ig-n-g8zr} Pulling: pulling image "openshift/hello-openshift"
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:07 -0500 EST - event for execpod: {kubelet ci-pr94-ig-n-5wzp} Created: Created container with docker id 7d3a4a08e4b2; Security:[seccomp=unconfined]
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:07 -0500 EST - event for execpod: {kubelet ci-pr94-ig-n-5wzp} Started: Started container with docker id 7d3a4a08e4b2
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:07 -0500 EST - event for execpod: {kubelet ci-pr94-ig-n-5wzp} Pulled: Container image "gcr.io/google_containers/hostexec:1.2" already present on machine
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:07 -0500 EST - event for router-override: {kubelet ci-pr94-ig-n-lqhq} Pulling: pulling image "openshift/origin-haproxy-router"
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:09 -0500 EST - event for endpoint-1: {kubelet ci-pr94-ig-n-g8zr} Pulled: Successfully pulled image "openshift/hello-openshift"
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:09 -0500 EST - event for endpoint-1: {kubelet ci-pr94-ig-n-g8zr} Started: Started container with docker id b485b0137549
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:09 -0500 EST - event for endpoint-1: {kubelet ci-pr94-ig-n-g8zr} Created: Created container with docker id b485b0137549; Security:[seccomp=unconfined]
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:09 -0500 EST - event for router-override: {kubelet ci-pr94-ig-n-lqhq} Pulled: Successfully pulled image "openshift/origin-haproxy-router"
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:11 -0500 EST - event for router-override: {kubelet ci-pr94-ig-n-lqhq} Created: Created container with docker id 271e61435fba; Security:[seccomp=unconfined]
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:12 -0500 EST - event for router-override: {kubelet ci-pr94-ig-n-lqhq} Started: Started container with docker id 271e61435fba
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:57 -0500 EST - event for scoped-router: {kubelet ci-pr94-ig-n-5wzp} Pulled: Successfully pulled image "openshift/origin-haproxy-router"
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:59 -0500 EST - event for scoped-router: {kubelet ci-pr94-ig-n-5wzp} Created: Created container with docker id 32f401708f69; Security:[seccomp=unconfined]
Feb  2 18:03:02.556: INFO: At 2017-02-02 18:02:59 -0500 EST - event for scoped-router: {kubelet ci-pr94-ig-n-5wzp} Started: Started container with docker id 32f401708f69
Feb  2 18:03:02.626: INFO: POD                                                    NODE               PHASE    GRACE  CONDITIONS
Feb  2 18:03:02.626: INFO: docker-registry-3-jk35h                                ci-pr94-ig-m-3x76  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:32:13 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:32:17 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:32:13 -0500 EST  }]
Feb  2 18:03:02.626: INFO: registry-console-1-kp79m                               ci-pr94-ig-m-3x76  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:31:51 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:32:11 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:31:51 -0500 EST  }]
Feb  2 18:03:02.626: INFO: router-1-mw7hl                                         ci-pr94-ig-m-3x76  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:31:21 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:31:41 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 17:31:21 -0500 EST  }]
Feb  2 18:03:02.626: INFO: security-context-bf209494-e99b-11e6-8f5f-0ea7ca49eb50  ci-pr94-ig-n-lqhq  Pending         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:03:01 -0500 EST  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:03:01 -0500 EST ContainersNotReady containers with unready status: [test-container]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:03:01 -0500 EST  }]
Feb  2 18:03:02.626: INFO: pet-0                                                  ci-pr94-ig-n-lqhq  Pending         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:59 -0500 EST  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:59 -0500 EST ContainersNotReady containers with unready status: [nginx]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:59 -0500 EST  }]
Feb  2 18:03:02.626: INFO: history-limit-11-nmffm                                 ci-pr94-ig-n-5wzp  Running  30s    [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:15 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:25 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:15 -0500 EST  }]
Feb  2 18:03:02.626: INFO: brokendeployment-1-deploy                              ci-pr94-ig-n-lqhq  Failed          [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:01:39 -0500 EST  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:01:44 -0500 EST ContainersNotReady containers with unready status: [deployment]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:01:39 -0500 EST  }]
Feb  2 18:03:02.626: INFO: brokendeployment-1-hook-pre                            ci-pr94-ig-n-5wzp  Failed          [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:01:41 -0500 EST  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:01:41 -0500 EST ContainersNotReady containers with unready status: [lifecycle]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:01:41 -0500 EST  }]
Feb  2 18:03:02.626: INFO: deployment-simple-1-pf6fg                              ci-pr94-ig-n-g8zr  Running  30s    [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:25 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:34 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:25 -0500 EST  }]
Feb  2 18:03:02.626: INFO: deployment-simple-1-qcg3w                              ci-pr94-ig-n-5wzp  Running  30s    [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:25 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:37 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:25 -0500 EST  }]
Feb  2 18:03:02.626: INFO: endpoint-1                                             ci-pr94-ig-n-g8zr  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:10 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  }]
Feb  2 18:03:02.626: INFO: execpod                                                ci-pr94-ig-n-5wzp  Running  1s     [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:08 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  }]
Feb  2 18:03:02.626: INFO: router-override                                        ci-pr94-ig-n-lqhq  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:13 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  }]
Feb  2 18:03:02.626: INFO: scoped-router                                          ci-pr94-ig-n-5wzp  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:03:00 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-02 18:02:05 -0500 EST  }]
Feb  2 18:03:02.626: INFO: 
Feb  2 18:03:02.657: INFO: 
Logging node info for node ci-pr94-ig-m-3x76
Feb  2 18:03:02.689: INFO: Node Info: &TypeMeta{Kind:,APIVersion:,}
Feb  2 18:03:02.689: INFO: 
Logging kubelet events for node ci-pr94-ig-m-3x76
Feb  2 18:03:02.719: INFO: 
Logging pods the kubelet thinks is on node ci-pr94-ig-m-3x76
Feb  2 18:03:02.787: INFO: registry-console-1-kp79m started at 2017-02-02 17:31:51 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:02.787: INFO: 	Container registry-console ready: true, restart count 0
Feb  2 18:03:02.787: INFO: router-1-mw7hl started at 2017-02-02 17:31:21 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:02.787: INFO: 	Container router ready: true, restart count 0
Feb  2 18:03:02.787: INFO: docker-registry-3-jk35h started at 2017-02-02 17:32:13 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:02.787: INFO: 	Container registry ready: true, restart count 0
W0202 18:03:02.824923   18042 metrics_grabber.go:73] Master node is not registered. Grabbing metrics from Scheduler and ControllerManager is disabled.
Feb  2 18:03:03.056: INFO: 
Latency metrics for node ci-pr94-ig-m-3x76
Feb  2 18:03:03.056: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:57.784151s}
Feb  2 18:03:03.056: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.9 Latency:47.980197s}
Feb  2 18:03:03.056: INFO: {Operation:stop_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:30.153065s}
Feb  2 18:03:03.056: INFO: 
Logging node info for node ci-pr94-ig-n-5wzp
Feb  2 18:03:03.087: INFO: Node Info: &TypeMeta{Kind:,APIVersion:,}
Feb  2 18:03:03.087: INFO: 
Logging kubelet events for node ci-pr94-ig-n-5wzp
Feb  2 18:03:03.118: INFO: 
Logging pods the kubelet thinks is on node ci-pr94-ig-n-5wzp
Feb  2 18:03:03.185: INFO: brokendeployment-1-hook-pre started at 2017-02-02 18:01:41 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.185: INFO: 	Container lifecycle ready: false, restart count 0
Feb  2 18:03:03.185: INFO: history-limit-11-nmffm started at 2017-02-02 18:02:15 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.185: INFO: 	Container myapp ready: true, restart count 0
Feb  2 18:03:03.185: INFO: scoped-router started at 2017-02-02 18:02:05 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.185: INFO: 	Container router ready: true, restart count 0
Feb  2 18:03:03.185: INFO: execpod started at 2017-02-02 18:02:05 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.185: INFO: 	Container hostexec ready: true, restart count 0
Feb  2 18:03:03.185: INFO: deployment-simple-1-qcg3w started at 2017-02-02 18:02:25 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.185: INFO: 	Container myapp ready: true, restart count 0
W0202 18:03:03.217689   18042 metrics_grabber.go:73] Master node is not registered. Grabbing metrics from Scheduler and ControllerManager is disabled.
Feb  2 18:03:03.479: INFO: 
Latency metrics for node ci-pr94-ig-n-5wzp
Feb  2 18:03:03.479: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m1.659214s}
Feb  2 18:03:03.479: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m1.659214s}
Feb  2 18:03:03.479: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:2m1.044005s}
Feb  2 18:03:03.479: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:1m57.241803s}
Feb  2 18:03:03.479: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.9 Latency:59.950659s}
Feb  2 18:03:03.479: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.99 Latency:50.7416s}
Feb  2 18:03:03.479: INFO: {Operation:SyncPod Method:container_manager_latency_microseconds Quantile:0.99 Latency:47.257539s}
Feb  2 18:03:03.479: INFO: {Operation:stop_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:31.146594s}
Feb  2 18:03:03.479: INFO: {Operation:createPodInfraContainer Method:container_manager_latency_microseconds Quantile:0.99 Latency:22.165174s}
Feb  2 18:03:03.479: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.9 Latency:21.582145s}
Feb  2 18:03:03.479: INFO: {Operation: Method:pleg_relist_latency_microseconds Quantile:0.99 Latency:20.505076s}
Feb  2 18:03:03.479: INFO: {Operation:runContainerInPod Method:container_manager_latency_microseconds Quantile:0.99 Latency:18.926044s}
Feb  2 18:03:03.479: INFO: {Operation:create_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:13.916019s}
Feb  2 18:03:03.479: INFO: {Operation:SyncPod Method:container_manager_latency_microseconds Quantile:0.9 Latency:12.15898s}
Feb  2 18:03:03.479: INFO: {Operation: Method:pod_worker_start_latency_microseconds Quantile:0.99 Latency:11.874139s}
Feb  2 18:03:03.479: INFO: {Operation:logs Method:docker_operations_latency_microseconds Quantile:0.99 Latency:10.739455s}
Feb  2 18:03:03.479: INFO: 
Logging node info for node ci-pr94-ig-n-g8zr
Feb  2 18:03:03.510: INFO: Node Info: &TypeMeta{Kind:,APIVersion:,}
Feb  2 18:03:03.510: INFO: 
Logging kubelet events for node ci-pr94-ig-n-g8zr
Feb  2 18:03:03.541: INFO: 
Logging pods the kubelet thinks is on node ci-pr94-ig-n-g8zr
Feb  2 18:03:03.608: INFO: deployment-simple-1-pf6fg started at 2017-02-02 18:02:25 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.608: INFO: 	Container myapp ready: true, restart count 0
Feb  2 18:03:03.608: INFO: endpoint-1 started at 2017-02-02 18:02:05 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.608: INFO: 	Container test ready: true, restart count 0
W0202 18:03:03.640646   18042 metrics_grabber.go:73] Master node is not registered. Grabbing metrics from Scheduler and ControllerManager is disabled.
Feb  2 18:03:03.791: INFO: 
Latency metrics for node ci-pr94-ig-n-g8zr
Feb  2 18:03:03.791: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:3m25.981573s}
Feb  2 18:03:03.791: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.9 Latency:2m1.407968s}
Feb  2 18:03:03.791: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m1.250912s}
Feb  2 18:03:03.791: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m1.250912s}
Feb  2 18:03:03.791: INFO: {Operation:create Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m0.000727s}
Feb  2 18:03:03.791: INFO: {Operation:create Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m0.000727s}
Feb  2 18:03:03.791: INFO: {Operation:SyncPod Method:container_manager_latency_microseconds Quantile:0.99 Latency:1m13.654166s}
Feb  2 18:03:03.791: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:1m9.95981s}
Feb  2 18:03:03.791: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.99 Latency:1m2.363455s}
Feb  2 18:03:03.791: INFO: {Operation:stop_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:34.103758s}
Feb  2 18:03:03.791: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.9 Latency:31.054787s}
Feb  2 18:03:03.791: INFO: {Operation: Method:pleg_relist_latency_microseconds Quantile:0.99 Latency:27.632158s}
Feb  2 18:03:03.791: INFO: {Operation:SyncPod Method:container_manager_latency_microseconds Quantile:0.9 Latency:24.647182s}
Feb  2 18:03:03.791: INFO: {Operation:createPodInfraContainer Method:container_manager_latency_microseconds Quantile:0.99 Latency:20.153786s}
Feb  2 18:03:03.791: INFO: {Operation:runContainerInPod Method:container_manager_latency_microseconds Quantile:0.99 Latency:19.87633s}
Feb  2 18:03:03.791: INFO: {Operation:stop_container Method:docker_operations_latency_microseconds Quantile:0.9 Latency:18.975451s}
Feb  2 18:03:03.791: INFO: {Operation: Method:pod_worker_start_latency_microseconds Quantile:0.99 Latency:17.728171s}
Feb  2 18:03:03.791: INFO: {Operation:GetPods Method:container_manager_latency_microseconds Quantile:0.99 Latency:15.177282s}
Feb  2 18:03:03.791: INFO: {Operation:create_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:13.729344s}
Feb  2 18:03:03.791: INFO: {Operation:start_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:13.605905s}
Feb  2 18:03:03.791: INFO: {Operation:list_containers Method:docker_operations_latency_microseconds Quantile:0.99 Latency:13.405765s}
Feb  2 18:03:03.791: INFO: {Operation: Method:pleg_relist_latency_microseconds Quantile:0.9 Latency:11.907627s}
Feb  2 18:03:03.791: INFO: {Operation:createPodInfraContainer Method:container_manager_latency_microseconds Quantile:0.9 Latency:11.43758s}
Feb  2 18:03:03.791: INFO: {Operation:runContainerInPod Method:container_manager_latency_microseconds Quantile:0.9 Latency:11.348516s}
Feb  2 18:03:03.791: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.5 Latency:10.614569s}
Feb  2 18:03:03.791: INFO: 
Logging node info for node ci-pr94-ig-n-lqhq
Feb  2 18:03:03.822: INFO: Node Info: &TypeMeta{Kind:,APIVersion:,}
Feb  2 18:03:03.822: INFO: 
Logging kubelet events for node ci-pr94-ig-n-lqhq
Feb  2 18:03:03.852: INFO: 
Logging pods the kubelet thinks is on node ci-pr94-ig-n-lqhq
Feb  2 18:03:03.919: INFO: router-override started at 2017-02-02 18:02:05 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.919: INFO: 	Container router ready: true, restart count 0
Feb  2 18:03:03.919: INFO: security-context-bf209494-e99b-11e6-8f5f-0ea7ca49eb50 started at 2017-02-02 18:03:01 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.919: INFO: 	Container test-container ready: false, restart count 0
Feb  2 18:03:03.919: INFO: brokendeployment-1-deploy started at 2017-02-02 18:01:39 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.919: INFO: 	Container deployment ready: false, restart count 0
Feb  2 18:03:03.919: INFO: pet-0 started at 2017-02-02 18:02:59 -0500 EST (0+1 container statuses recorded)
Feb  2 18:03:03.919: INFO: 	Container nginx ready: false, restart count 0
W0202 18:03:03.952914   18042 metrics_grabber.go:73] Master node is not registered. Grabbing metrics from Scheduler and ControllerManager is disabled.
Feb  2 18:03:04.151: INFO: 
Latency metrics for node ci-pr94-ig-n-lqhq
Feb  2 18:03:04.151: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:3m5.110311s}
Feb  2 18:03:04.152: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.9 Latency:1m14.269562s}
Feb  2 18:03:04.152: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.99 Latency:54.684942s}
Feb  2 18:03:04.152: INFO: {Operation:SyncPod Method:container_manager_latency_microseconds Quantile:0.99 Latency:53.803169s}
Feb  2 18:03:04.152: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:48.541955s}
Feb  2 18:03:04.152: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:48.541955s}
Feb  2 18:03:04.152: INFO: {Operation:stop_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:31.776467s}
Feb  2 18:03:04.152: INFO: {Operation:createPodInfraContainer Method:container_manager_latency_microseconds Quantile:0.99 Latency:27.651545s}
Feb  2 18:03:04.152: INFO: {Operation: Method:pleg_relist_latency_microseconds Quantile:0.99 Latency:26.967158s}
Feb  2 18:03:04.152: INFO: {Operation: Method:pod_worker_start_latency_microseconds Quantile:0.99 Latency:20.392968s}
Feb  2 18:03:04.152: INFO: {Operation:runContainerInPod Method:container_manager_latency_microseconds Quantile:0.99 Latency:17.181112s}
Feb  2 18:03:04.152: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.9 Latency:16.603693s}
Feb  2 18:03:04.152: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:15.894454s}
Feb  2 18:03:04.152: INFO: {Operation:create Method:pod_worker_latency_microseconds Quantile:0.99 Latency:15.025001s}
Feb  2 18:03:04.152: INFO: {Operation:create Method:pod_worker_latency_microseconds Quantile:0.5 Latency:15.025001s}
Feb  2 18:03:04.152: INFO: {Operation:create Method:pod_worker_latency_microseconds Quantile:0.9 Latency:15.025001s}
Feb  2 18:03:04.152: INFO: {Operation:SyncPod Method:container_manager_latency_microseconds Quantile:0.9 Latency:14.715421s}
Feb  2 18:03:04.152: INFO: {Operation:create_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:13.949317s}
STEP: Dumping a list of prepulled images on each node
Feb  2 18:03:04.182: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "extended-test-scoped-router-7cr8j-zsp2n" for this suite.
Feb  2 18:03:30.090: INFO: namespace: extended-test-scoped-router-7cr8j-zsp2n, resource: bindings, ignored listing per whitelist


• Failure [86.212 seconds]
[networking][router] openshift routers
/data/src/github.com/openshift/origin/test/extended/router/scoped.go:128
  The HAProxy router
  /data/src/github.com/openshift/origin/test/extended/router/scoped.go:127
    should serve the correct routes when scoped to a single namespace and label set [It]
    /data/src/github.com/openshift/origin/test/extended/router/scoped.go:77

    Expected error:
        <*errors.errorString | 0xc821868890>: {
            s: "host command failed: error running &{/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl [kubectl --server=https://internal-api.pr94.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=extended-test-scoped-router-7cr8j-zsp2n execpod -- /bin/sh -c \n\t\tset -e\n\t\tfor i in $(seq 1 180); do\n\t\t\tcode=$( curl -s -o /dev/null -w '%{http_code}\\n' --header 'Host: 172.16.6.119' \"http://172.16.6.119:1936/healthz\" )\n\t\t\techo $code\n\t\t\tif [[ $code -eq 200 ]]; then\n\t\t\t\texit 0\n\t\t\tfi\n\t\t\tif [[ $code -ne 503 ]]; then\n\t\t\t\texit 1\n\t\t\tfi\n\t\t\tsleep 1\n\t\tdone\n\t\t] []  <nil>   [] <nil> 0xc8214284e0 exit status 7 <nil> true [0xc8203689a8 0xc8203689e8 0xc820368dd8] [0xc8203689a8 0xc8203689e8 0xc820368dd8] [0xc8203689c8 0xc820368a18] [0xaf59b0 0xaf59b0] 0xc820d8fc80}:\nCommand stdout:\n\nstderr:\n\nerror:\nexit status 7\n\n",
        }
        host command failed: error running &{/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl [kubectl --server=https://internal-api.pr94.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=extended-test-scoped-router-7cr8j-zsp2n execpod -- /bin/sh -c 
        		set -e
        		for i in $(seq 1 180); do
        			code=$( curl -s -o /dev/null -w '%{http_code}\n' --header 'Host: 172.16.6.119' "http://172.16.6.119:1936/healthz" )
        			echo $code
        			if [[ $code -eq 200 ]]; then
        				exit 0
        			fi
        			if [[ $code -ne 503 ]]; then
        				exit 1
        			fi
        			sleep 1
        		done
        		] []  <nil>   [] <nil> 0xc8214284e0 exit status 7 <nil> true [0xc8203689a8 0xc8203689e8 0xc820368dd8] [0xc8203689a8 0xc8203689e8 0xc820368dd8] [0xc8203689c8 0xc820368a18] [0xaf59b0 0xaf59b0] 0xc820d8fc80}:
        Command stdout:
        
        stderr:
        
        error:
        exit status 7
       
@smarterclayton smarterclayton added kind/test-flake Categorizes issue or PR as related to test flakes. priority/P1 labels Feb 2, 2017
@smarterclayton (Contributor, Author)

Exit code 7 apparently means curl couldn't connect. Fixing...
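Exit code 7 propagates because the poll loop above runs under `set -e`: when curl fails at the transport level, the `code=$(curl ...)` assignment fails with curl's own exit status and aborts the script. A minimal sketch of a variant that distinguishes transport failures from HTTP status codes (the URL and attempt count are placeholders, not the test's actual values):

```shell
# wait_for_healthz URL ATTEMPTS — poll a healthz endpoint, retrying on
# transport-level curl failures (e.g. exit 7 = couldn't connect) instead of
# letting them abort the whole script as happens under `set -e`.
wait_for_healthz() {
  url=$1; attempts=$2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    rc=$?
    if [ "$rc" -eq 0 ]; then
      [ "$code" -eq 200 ] && return 0   # router is up
      [ "$code" -ne 503 ] && return 1   # unexpected status: hard failure
    fi                                  # rc != 0: transport error, keep waiting
    sleep 1
    i=$((i + 1))
  done
  return 2                              # timed out waiting
}
```

Returning distinct codes (0/1/2) instead of letting curl's status leak through makes the failure mode visible in the test output.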

@smarterclayton (Contributor, Author)

Keeping this open because the router pod can't connect to the endpoint. Either an SDN problem or a bug in the router (maybe a race on initial setup). Need to dump logs.

@knobunc I will add some debugging to the test

@smarterclayton (Contributor, Author)

For #12736, if we can verify that the router is not dropping events / racing / getting out of sync, that would help here. The failure rate here is pretty high, and the nodes are going to be contended, so any racy logic is likely to get stressed.

@smarterclayton (Contributor, Author)

Bumping priority, since something is probably wrong in the code or environment rather than just the test.

@smarterclayton smarterclayton added this to the 1.5.0 milestone Feb 3, 2017
@smarterclayton (Contributor, Author)

I disabled this test on GCE to unblock getting GCE into the merge queue. To test, just remove the last segment from ADDITIONAL_SKIP when running a PR job (test_pull_requests_origin_gce can be triggered with a PR #).

@ramr (Contributor) commented Feb 6, 2017

You are right, it does look to be something in the environment. Event processing on the initial sync in the router is not the issue here, though, because the Jenkins logs show the failure is at:
/src/github.com/openshift/origin/test/extended/router/scoped.go:66
which is an error waiting for the healthz endpoint to respond.

So irrespective of the delayed binds (for the initial sync), the stats port should have been bound if the container and pods are up, and that's what is failing here.

And the Jenkins logs do have:
NetworkUnavailable False Mon, 01 Jan 0001 00:00:00 +0000 Thu, 02 Feb 2017 17:29:51 -0500 RouteCreated openshift-sdn cleared kubelet-set NoRouteCreated

( @danwinship any ideas on that one/why it would happen? thx)

and

39m 39m 1 {kubelet ci-pr94-ig-m-3x76} Warning ImageGCFailed unable to find data for container /

@danwinship (Contributor)

> And the Jenkins logs do have:
> NetworkUnavailable False Mon, 01 Jan 0001 00:00:00 +0000 Thu, 02 Feb 2017 17:29:51 -0500 RouteCreated openshift-sdn cleared kubelet-set NoRouteCreated
>
> ( @danwinship any ideas on that one/why it would happen? thx)

That's not related to OpenShift routes, and note that it set NetworkUnavailable to False. It's just openshift-sdn logging that it's undoing a kube-on-GCE/AWS bug that would otherwise completely break networking.

@stevekuznetsov (Contributor)

@smarterclayton I'm now seeing this on my oct tests

@smarterclayton (Contributor, Author)

Going to print router pod logs on startup so we can see what the router tells us.

@knobunc knobunc assigned ramr and unassigned knobunc Feb 14, 2017
@knobunc (Contributor) commented Feb 14, 2017

@smarterclayton did you change it to print the pod logs? I didn't see anything in the artifacts...

@smarterclayton (Contributor, Author) commented Feb 19, 2017 via email

@ramr (Contributor) commented Feb 21, 2017

I'll take a look at this. It does look like it is hot-looping, but it did process some endpoints/routes before it started doing that. Curious why the healthz endpoint fails, as it did work in the reload script, so something else is going on there. It would still fail the test afterwards, but... anyway, the logs have more clues now.

@bparees bparees changed the title test flake: should serve the correct routes when scoped to a single namespace and label set Extended.[networking][router] openshift routers The HAProxy router should serve the correct routes when scoped to a single namespace and label set Feb 21, 2017
@ramr (Contributor) commented Feb 21, 2017

So @rajatchopra and I looked at this: if the scoped router test is run by itself in a dind cluster, it should always fail, since ports 80/443/1936 are never opened up. @rajatchopra had a PR for that: #13036

But the other thing we found is that the network is unblocked in vendor/k8s.io/kubernetes/test/e2e/framework/util.go. Not sure if that has a bearing or how it comes into play, but it could explain the test flakiness if something did that in parallel / for another test.

@smarterclayton (Contributor, Author) commented Feb 21, 2017 via email

@ramr (Contributor) commented Feb 22, 2017

There's a test helper API call in util.go called UnblockNetwork
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/framework/util.go#L4393
which adds/removes iptables rules on the node.
The dind default rejects all connections made to the host (except, I think, to ports 22 and 10250).

That said, I am not certain that's the cause here. As mentioned above, I'm not sure it has a bearing.

Edited for clarity + assumptions.
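The blocking behavior described above can be sketched roughly as follows. The chain layout and helper names here are assumptions for illustration, not the actual origin dind configuration or the real UnblockNetwork implementation (`IPT=echo` gives a dry run):

```shell
# Sketch of a dind-style node firewall: allow SSH (22) and the kubelet port
# (10250), reject all other inbound TCP — which is why router ports like
# 80/443/1936 would be unreachable unless something opens them up.
IPT=${IPT:-iptables}   # set IPT=echo to dry-run without touching the firewall

block_node() {
  $IPT -A INPUT -p tcp --dport 22 -j ACCEPT
  $IPT -A INPUT -p tcp --dport 10250 -j ACCEPT
  $IPT -A INPUT -p tcp -j REJECT --reject-with tcp-reset
}

unblock_from() {
  # Analogous to what an UnblockNetwork-style helper does: insert an ACCEPT
  # for a specific peer ahead of the blanket REJECT, temporarily.
  $IPT -I INPUT 1 -s "$1" -j ACCEPT
}
```

If another test invoked an unblock helper like this in parallel, it would transiently open the node up, which is consistent with the flaky (rather than deterministic) failures seen here.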

@smarterclayton (Contributor, Author) commented Feb 22, 2017 via email

@ramr (Contributor) commented Feb 24, 2017

Closing this issue out as the PR has merged.

@smarterclayton (Contributor, Author)

Still happening. @knobunc @ramr @rajatchopra, can someone dig through and understand why this is still failing?

@smarterclayton (Contributor, Author)

@smarterclayton (Contributor, Author)

I think this was due to the fact that we were pulling images for the router during this test. If the pull time exceeded 2 minutes, the test failed. Fixed in #13388
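One way to take image pulls off the test's clock (a sketch under assumptions — not necessarily how #13388 fixed it) is to pre-pull the router image before the wait loop starts, so a slow registry pull doesn't consume the healthz timeout:

```shell
# Pre-pull the router image and report how long the pull took, so slow pulls
# show up explicitly in the logs instead of as a healthz timeout.
# DOCKER is overridable for dry runs; the image name is from the test logs.
DOCKER=${DOCKER:-docker}

prepull() {
  image=$1
  start=$(date +%s)
  $DOCKER pull "$image" || return 1
  echo "pre-pull of $image took $(( $(date +%s) - start ))s"
}

# prepull openshift/origin-haproxy-router
```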
