
Error: image cache/mysql:pullthrough not found #8399

Closed
mfojtik opened this issue Apr 7, 2016 · 23 comments
Labels: area/tests, component/image, kind/bug, kind/test-flake, priority/P2

mfojtik commented Apr 7, 2016

Potential flake. Can be seen here: https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5534/console

Error: image cache/mysql:pullthrough not found
!!! Error in hack/../hack/../test/end-to-end/core.sh:129
    'docker pull ${DOCKER_REGISTRY}/cache/mysql:pullthrough' exited with status 1
Call stack:
    1: hack/../hack/../test/end-to-end/core.sh:129 main(...)
Exiting with status 1
mfojtik added kind/bug, priority/P2, area/tests, component/image labels Apr 7, 2016

soltysh commented Apr 7, 2016

@miminar any idea why docker is falling back to the v1 registry API? This is what I see in the logs:

time="2016-04-07T13:33:03.823468206Z" level=info msg="response completed" go.version=go1.6 http.request.host="172.30.16.220:5000" http.request.id=942210a3-4315-4106-9471-1bf66e4cc6af http.request.method=GET http.request.remoteaddr="172.18.6.120:60242" http.request.uri="/v2/cache/mysql/blobs/sha256:efd26ecc95486998b41b3fe167236e3fb3e109c66dd1a51ab5161e40b06cf486" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" http.response.contenttype="application/octet-stream" http.response.duration=1.921677219s http.response.status=200 http.response.written=51342828 instance.id=f79ccee9-263a-4bae-aeb8-77e72c26188f 
172.18.6.120 - - [07/Apr/2016:13:33:01 +0000] "GET /v2/cache/mysql/blobs/sha256:efd26ecc95486998b41b3fe167236e3fb3e109c66dd1a51ab5161e40b06cf486 HTTP/1.1" 200 51342828 "" "docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64"
time="2016-04-07T13:33:03.925080703Z" level=info msg="response completed" go.version=go1.6 http.request.host="172.30.16.220:5000" http.request.id=2fecfaf9-1878-4588-9710-f3ec32ee52ba http.request.method=GET http.request.remoteaddr="172.18.6.120:60243" http.request.uri="/v2/cache/mysql/blobs/sha256:87c7daf4abfabeef18ae19cd661840282d149989c039cdd2f426535e60e0e703" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" http.response.contenttype="application/octet-stream" http.response.duration=1.976474849s http.response.status=200 http.response.written=63951640 instance.id=f79ccee9-263a-4bae-aeb8-77e72c26188f 
172.18.6.120 - - [07/Apr/2016:13:33:01 +0000] "GET /v2/cache/mysql/blobs/sha256:87c7daf4abfabeef18ae19cd661840282d149989c039cdd2f426535e60e0e703 HTTP/1.1" 200 63951640 "" "docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64"
172.17.0.1 - - [07/Apr/2016:13:33:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
172.17.0.1 - - [07/Apr/2016:13:33:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
172.18.6.120 - - [07/Apr/2016:13:33:13 +0000] "GET /v1/repositories/cache/mysql/images HTTP/1.1" 404 19 "" "docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64"

It's downloading the image just fine, but then it suddenly reaches for the v1 registry. Any ideas?

soltysh commented Apr 7, 2016

I have the full docker registry log, if needed.

@smarterclayton

If this is a flake that will affect users in production, it's P0 (until we prove otherwise).

soltysh commented Apr 12, 2016

After several rounds of investigation, the problem I've managed to identify is this error:

Error getting ImageStreamTag "pullthrough": imagestreamtags "mysql:pullthrough" not found

#8461 wrapped the image-import and docker pull commands in os::cmd::expect_success, which should either fix the problem or give us more information. I'm especially hoping to get stdout and stderr from the image-import command, which might be failing for some unknown reason and leading to the failing docker pull.
For now I advise lowering the priority of this bug, since it's not a permanent issue but rather an intermittent one.

liggitt commented Apr 13, 2016

soltysh commented Apr 13, 2016

This looks like a different error:

time="2016-04-13T04:42:29.034858911Z" level=error msg="response completed with error" err.code="BLOB_UNKNOWN" err.detail=sha256:d347a60f2547ec23a95f6a82a7d86bfc3175111f4f730daf0b09e4d3adc875f3 err.message="blob unknown to registry" go.version=go1.6 http.request.host="172.30.25.55:5000" http.request.id=f5b9eb63-a4c3-4c22-af71-1f74d953f831 http.request.method=HEAD http.request.remoteaddr="172.18.15.86:42491" http.request.uri="/v2/cache/mysql/blobs/sha256:d347a60f2547ec23a95f6a82a7d86bfc3175111f4f730daf0b09e4d3adc875f3" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" http.response.contenttype="application/json; charset=utf-8" http.response.duration=595.411786ms http.response.status=404 http.response.written=157 instance.id=be1a0e30-a5ce-41bd-8e98-682ed6469885 vars.digest="sha256:d347a60f2547ec23a95f6a82a7d86bfc3175111f4f730daf0b09e4d3adc875f3" vars.name="cache/mysql" 

Looking into it.

miminar commented Apr 15, 2016

soltysh commented Apr 18, 2016

@miminar good catch. Now that we have the import output, this is what we got:

import failed: you may not have access to the Docker image "mysql:latest

We're currently being hit by this problem in two places:

  1. at import time
  2. at pull time

To nail the problem down we'd have to log the entire traffic between us and the Docker registry. It looks like the Docker registry is sometimes blocking us and requiring auth (?), because I can't find any other reasonable explanation, especially since the image we're trying to get is public, and Docker-provided at that. @smarterclayton @pweil- @miminar thoughts?
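
For illustration only, logging the full exchange could be done with a wrapping http.RoundTripper along these lines (a rough sketch, not anything that exists in origin; the type name is made up):

// dumpTransport wraps another RoundTripper and logs every request and
// response exchanged with the remote registry, headers and bodies included.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
)

type dumpTransport struct {
	next http.RoundTripper
}

func (t *dumpTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if b, err := httputil.DumpRequestOut(req, true); err == nil {
		log.Printf("registry request:\n%s", b)
	}
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	// Dumping the bodies of large blob responses would be enormous; in practice
	// this would probably be limited to error responses.
	if b, err := httputil.DumpResponse(resp, true); err == nil {
		log.Printf("registry response:\n%s", b)
	}
	return resp, nil
}

func main() {
	client := &http.Client{Transport: &dumpTransport{next: http.DefaultTransport}}
	resp, err := client.Get("https://registry-1.docker.io/v2/")
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
}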

soltysh commented Apr 18, 2016

For no. 1 we could modify our tests so that they retry when an unauthorized error is detected; I have no idea how to address no. 2 without code modifications yet.

smarterclayton commented Apr 18, 2016 via email

Can we get more tracing in those cases? Possibly the entire response body and headers so we can compare?


soltysh commented Apr 18, 2016

That's what I was thinking about, but I'm not sure it's possible post-mortem. I'll dig into it.

smarterclayton added the kind/test-flake label Apr 18, 2016

smarterclayton commented Apr 18, 2016

This may be related; I saw this against our registry:

STEP: dockercfg with svrAddr 172.30.165.70:5000 user serviceaccount pass eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJleHRlbmRlZC10ZXN0LWZvcmNlcHVsbC1oYmtjeC00djh2aSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJidWlsZGVyLXRva2VuLXRzbzlqIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImJ1aWxkZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIwNDA4NWFlOS0wNWE4LTExZTYtODMxNS0wZWE5NmMxYTA0ZjMiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6ZXh0ZW5kZWQtdGVzdC1mb3JjZXB1bGwtaGJrY3gtNHY4dmk6YnVpbGRlciJ9.tY0PwtEFs9d2CyNioUeq9op6twd-9_mxZK2xvaRFTTOrY0FhKTPrm6NwqXM1gvgTIM24jzowR6SWAEWgstDJYjRNEX-rirBG7EBqkIzskYQ5RWaN_pe71JHCZluJm_KGgf2QKxlrDZQN-AxcJGLge35-Yiw3wkZKmvJMB56jJzzPSDtg6o8jciPvjKSoD6J0zhpzR5FcYMwqyqzF2RevMJs4QnRvijvmHickXxR3OjXiP3nLUWF-dVsWb64ZcRgH2bnMotSCQlj2EcsFPlhmlxurfF9N91OK5aiLxhGXjxsyXLe8cQz-2rlAY007JWbrl0UEGtXGY6ep6KegN5KwPQ email serviceaccount@example.org 
STEP: Collecting events from namespace "extended-test-forcepull-hbkcx-4v8vi".
Apr 18 16:59:49.453: INFO: At 2016-04-18 16:56:57 -0400 EDT - event for forcepull-bldr-1-build: {default-scheduler } Scheduled: Successfully assigned forcepull-bldr-1-build to 172.18.2.57
Apr 18 16:59:49.453: INFO: At 2016-04-18 16:57:00 -0400 EDT - event for forcepull-bldr-1-build: {kubelet 172.18.2.57} Pulled: Container image "openshift/origin-docker-builder:11af88f" already present on machine
Apr 18 16:59:49.453: INFO: At 2016-04-18 16:57:01 -0400 EDT - event for forcepull-bldr-1-build: {kubelet 172.18.2.57} Created: Created container with docker id 81a3ea79eb24
Apr 18 16:59:49.453: INFO: At 2016-04-18 16:57:02 -0400 EDT - event for forcepull-bldr-1-build: {kubelet 172.18.2.57} Started: Started container with docker id 81a3ea79eb24
Apr 18 16:59:49.472: INFO: POD                      NODE         PHASE      GRACE  CONDITIONS
Apr 18 16:59:49.472: INFO: docker-registry-1-0dmbo  172.18.2.57  Running           [{Ready True 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:39:51 -0400 EDT  }]
Apr 18 16:59:49.472: INFO: router-1-deploy          172.18.2.57  Failed            []
Apr 18 16:59:49.472: INFO: router-2-rdhzw           172.18.2.57  Running           [{Ready True 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:40:20 -0400 EDT  }]
Apr 18 16:59:49.472: INFO: scale-up-igonv           172.18.2.57  Running    30s    [{Ready True 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:59:16 -0400 EDT  }]
Apr 18 16:59:49.472: INFO: forcepull-bldr-1-build   172.18.2.57  Failed            [{Ready False 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:59:47 -0400 EDT ContainersNotReady containers with unready status: [docker-build]}]
Apr 18 16:59:49.472: INFO: src-1-build              172.18.2.57  Succeeded         [{Ready False 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:58:23 -0400 EDT PodCompleted }]
Apr 18 16:59:49.472: INFO: src-2-build              172.18.2.57  Succeeded         [{Ready False 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:58:51 -0400 EDT PodCompleted }]
Apr 18 16:59:49.472: INFO: src-3-build              172.18.2.57  Succeeded         [{Ready False 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:59:16 -0400 EDT PodCompleted }]
Apr 18 16:59:49.472: INFO: src-4-build              172.18.2.57  Succeeded         [{Ready False 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:59:32 -0400 EDT PodCompleted }]
Apr 18 16:59:49.472: INFO: src-5-build              172.18.2.57  Succeeded         [{Ready False 0001-01-01 00:00:00 +0000 UTC 2016-04-18 16:59:47 -0400 EDT PodCompleted }]
Apr 18 16:59:49.472: INFO: 
Apr 18 16:59:49.480: INFO: 
Logging node info for node 172.18.2.57
Apr 18 16:59:49.488: INFO: Node Info: &{{ } {172.18.2.57   /api/v1/nodes/172.18.2.57 a22526c7-05a5-11e6-8315-0ea96c1a04f3 12989 0 2016-04-18 16:39:22 -0400 EDT <nil> <nil> map[kubernetes.io/hostname:172.18.2.57] map[]} { 172.18.2.57  false} {map[cpu:{2.000 DecimalSI} memory:{7934193664.000 BinarySI} pods:{110.000 DecimalSI}] map[pods:{110.000 DecimalSI} cpu:{2.000 DecimalSI} memory:{7934193664.000 BinarySI}]  [{OutOfDisk False 2016-04-18 16:59:44 -0400 EDT 2016-04-18 16:39:22 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {Ready True 2016-04-18 16:59:44 -0400 EDT 2016-04-18 16:39:33 -0400 EDT KubeletReady kubelet is posting ready status}] [{LegacyHostIP 172.18.2.57} {InternalIP 172.18.2.57}] {{10250}} {f9370ed252a14f73b014c1301a9b6d1b EC22CE7C-CCE4-AF2E-3176-6F71E6F9D19E 0f561e5b-3aac-4c55-828c-2164576e221d 3.10.0-229.7.2.el7.x86_64 Red Hat Enterprise Linux Server 7.2 (Maipo) docker://1.9.1 v1.2.0-36-g4a3f9c5 v1.2.0-36-g4a3f9c5} [{[<none>:<none>] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-hom8f-shared/src:4] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-hom8f-shared/src:3] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-hom8f-shared/src:2] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-hom8f-shared/src:1] 100} {[172.30.165.70:5000/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder:latest] 443310209} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/sized:foo] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/new:refused] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/other:refused] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/other:second] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/other:third] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/sized:second] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/sized:first] 100} {[172.30.165.70:5000/extended-test-imagequota-admission-r6349-1dzdb/sized:refused] 100} {[openshift/deployment-example:v2] 7585584} {[openshift/deployment-example:v1] 7585584} {[openshift/node:11af88f openshift/node:latest] 636119156} {[openshift/origin-f5-router:11af88f openshift/origin-f5-router:latest] 543735769} {[openshift/origin-sti-builder:11af88f openshift/origin-sti-builder:latest] 543735769} {[openshift/origin-gitserver:11af88f openshift/origin-gitserver:latest] 581438641} {[openshift/origin-docker-builder:11af88f openshift/origin-docker-builder:latest] 543735769} {[openshift/origin-recycler:11af88f openshift/origin-recycler:latest] 34975664} {[openshift/origin-deployer:11af88f openshift/origin-deployer:latest] 543735769} {[openshift/origin-docker-registry:11af88f openshift/origin-docker-registry:latest] 341697503} {[openshift/origin-keepalived-ipfailover:11af88f openshift/origin-keepalived-ipfailover:latest] 437550234} {[openshift/origin-haproxy-router:11af88f openshift/origin-haproxy-router:latest] 431806667} {[openshift/origin:11af88f openshift/origin:latest] 543735769} {[<none>:<none>] 7585584} {[<none>:<none>] 7585584} {[openshift/hello-openshift:latest openshift/hello-openshift:11af88f openshift/hello-openshift:c3ada66] 7584592} {[openshift/origin-custom-docker-builder:11af88f openshift/origin-custom-docker-builder:c3ada66 openshift/origin-custom-docker-builder:latest] 402543189} {[openshift/node:c3ada66] 636119245} {[openshift/origin-f5-router:c3ada66] 
543735769} {[openshift/origin-sti-builder:c3ada66] 543735769} {[openshift/origin-gitserver:c3ada66] 581438641} {[openshift/origin-docker-builder:c3ada66] 543735769} {[openshift/origin-recycler:c3ada66] 34975664} {[openshift/origin-deployer:c3ada66] 543735769} {[openshift/origin-egress-router:latest openshift/origin-egress-router:11af88f openshift/origin-egress-router:c3ada66] 298249834} {[openshift/origin-docker-registry:c3ada66] 341697503} {[openshift/origin-keepalived-ipfailover:c3ada66] 437550234} {[openshift/origin-haproxy-router:c3ada66] 431806667} {[openshift/origin:c3ada66] 543735769} {[openshift/openvswitch:11af88f openshift/openvswitch:c3ada66 openshift/openvswitch:latest] 228138943} {[openshift/origin-pod:11af88f openshift/origin-pod:c3ada66 openshift/origin-pod:latest] 1590312} {[rhel7.2:latest] 256294412} {[rhel7:latest rhel7.1:latest] 753998457} {[docker.io/openshift/origin-base:latest] 279579166} {[openshift/origin-release:latest] 711006224} {[openshift/origin-haproxy-router-base:latest] 298871742} {[openshift/origin-base:latest] 279586407} {[docker.io/openshift/hello-openshift:latest] 7584592} {[docker.io/openshift/origin-haproxy-router:latest] 431807192} {[docker.io/openshift/origin-pod:latest] 1590312} {[] 539322259} {[docker.io/centos/ruby-22-centos7:latest] 443849262} {[docker.io/openshift/ruby-20-centos7:latest] 443310209} {[docker.io/nginx:latest] 182610738} {[docker.io/perl:latest] 653322790} {[docker.io/centos:centos7] 196677849} {[registry.access.redhat.com/rhel7.2:latest] 203268847} {[docker.io/busybox:latest] 1112820} {[docker.io/redis:latest] 177537508} {[gcr.io/google_containers/nettest:1.7] 24051275} {[gcr.io/google_containers/netexec:1.4] 7293444} {[gcr.io/google_containers/busybox:1.24] 1113554} {[gcr.io/google_containers/mounttest:0.6] 2084693} {[gcr.io/google_containers/hostexec:1.2] 13185747} {[gcr.io/google_containers/volume-ceph:0.1] 675657560} {[gcr.io/google_containers/mounttest-user:0.3] 1718853} {[registry.access.redhat.com/rhel7.1:latest] 158229901} {[gcr.io/google_containers/mounttest:0.5] 1718853} {[gcr.io/google_containers/pause:2.0] 350164} {[gcr.io/google_containers/porter:cd5cb5791ebaa8641955f0e8c2a9bed669b1eaab] 5010921} {[gcr.io/google_containers/portforwardtester:1.0] 2296329} {[gcr.io/google_containers/volume-iscsi:0.1] 244433376} {[docker.io/openshift/deployment-example:v1] 5769840} {[gcr.io/google_containers/eptest:0.1] 2970692} {[gcr.io/google_containers/serve_hostname:1.1] 4522409} {[gcr.io/google_containers/update-demo:nautilus] 4555533} {[gcr.io/google_containers/nginx:1.7.9] 91641000} {[gcr.io/google_containers/test-webserver:e2e] 4534272} {[gcr.io/google_containers/busybox:latest] 2429728} {[gcr.io/google_containers/liveness:e2e] 4387474}]}}
Apr 18 16:59:49.488: INFO: 
Logging kubelet events for node 172.18.2.57
Apr 18 16:59:49.492: INFO: 
Logging pods the kubelet thinks is on node 172.18.2.57
Apr 18 16:59:49.510: INFO: docker-registry-1-0dmbo started at <nil> (0 container statuses recorded)
Apr 18 16:59:49.510: INFO: scale-up-igonv started at <nil> (0 container statuses recorded)
Apr 18 16:59:49.510: INFO: router-2-rdhzw started at <nil> (0 container statuses recorded)
Apr 18 16:59:49.967: INFO: ERROR kubelet_docker_errors{operation_type="inspect_container"} => 51 @[0]
Apr 18 16:59:49.967: INFO: ERROR kubelet_docker_errors{operation_type="inspect_image"} => 27 @[0]
Apr 18 16:59:49.967: INFO: ERROR kubelet_docker_errors{operation_type="logs"} => 1 @[0]
Apr 18 16:59:49.967: INFO: ERROR kubelet_docker_errors{operation_type="start_container"} => 7 @[0]
Apr 18 16:59:49.967: INFO: ERROR kubelet_docker_errors{operation_type="stop_container"} => 122 @[0]
Apr 18 16:59:49.967: INFO: 
Latency metrics for node 172.18.2.57
Apr 18 16:59:49.967: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:52.893417s}
Apr 18 16:59:49.967: INFO: {Operation:update Method:pod_worker_latency_microseconds Quantile:0.99 Latency:35.010402s}
Apr 18 16:59:49.967: INFO: {Operation:update Method:pod_worker_latency_microseconds Quantile:0.9 Latency:31.860651s}
Apr 18 16:59:49.967: INFO: {Operation:stop_container Method:docker_operations_latency_microseconds Quantile:0.99 Latency:30.620567s}
Apr 18 16:59:49.967: INFO: {Operation:create Method:pod_worker_latency_microseconds Quantile:0.99 Latency:29.907848s}
Apr 18 16:59:49.967: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.9 Latency:27.24821s}
Apr 18 16:59:49.967: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.99 Latency:24.684774s}
Apr 18 16:59:49.967: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:23.465448s}
Apr 18 16:59:49.967: INFO: {Operation:SyncPod Method:container_manager_latency_microseconds Quantile:0.99 Latency:18.697897s}
Apr 18 16:59:49.967: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.9 Latency:15.649023s}
Apr 18 16:59:49.967: INFO: Unknown output type: . Skipping.
Apr 18 16:59:49.967: INFO: Waiting up to 1m0s for all nodes to be ready
STEP: Destroying namespace "extended-test-forcepull-hbkcx-4v8vi" for this suite.


• Failure in Spec Setup (BeforeEach) [208.877 seconds]
[LocalNode][builds] forcePull should affect pulling builder images
/data/src/github.com/openshift/origin/test/extended/builds/forcepull.go:166
  ForcePull test context   [BeforeEach]
  /data/src/github.com/openshift/origin/test/extended/builds/forcepull.go:164
    ForcePull test case execution
    /data/src/github.com/openshift/origin/test/extended/builds/forcepull.go:162

    Expected error:
        <*errors.errorString | 0xc208d39d10>: {
            s: "Error: image extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder:latest not found",
        }
        Error: image extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder:latest not found
    not to have occurred

    /data/src/github.com/openshift/origin/test/extended/builds/forcepull.go:108
------------------------------

Which correlates with this in the registry log:

time="2016-04-18T20:59:49.343979912Z" level=debug msg="authorizing request" go.version=go1.6 http.request.host="172.30.165.70:5000" http.request.id=136f9565-33ec-484c-9625-7c5cd64ba3e5 http.request.method=GET http.request.remoteaddr="172.18.2.57:44975" http.request.uri="/v2/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder/manifests/latest" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" instance.id=347e5f58-43de-4ef6-af03-9ff2612525c2 vars.name="extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder" vars.reference=latest 
time="2016-04-18T20:59:49.344662006Z" level=debug msg="Origin auth: checking for access to repository:extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder:pull" go.version=go1.6 http.request.host="172.30.165.70:5000" http.request.id=136f9565-33ec-484c-9625-7c5cd64ba3e5 http.request.method=GET http.request.remoteaddr="172.18.2.57:44975" http.request.uri="/v2/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder/manifests/latest" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" instance.id=347e5f58-43de-4ef6-af03-9ff2612525c2 vars.name="extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder" vars.reference=latest 
time="2016-04-18T20:59:49.362894642Z" level=debug msg=GetImageManifest go.version=go1.6 http.request.host="172.30.165.70:5000" http.request.id=136f9565-33ec-484c-9625-7c5cd64ba3e5 http.request.method=GET http.request.remoteaddr="172.18.2.57:44975" http.request.uri="/v2/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder/manifests/latest" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" instance.id=347e5f58-43de-4ef6-af03-9ff2612525c2 vars.name="extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder" vars.reference=latest 
172.18.2.57 - - [18/Apr/2016:20:59:49 +0000] "GET /v2/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder/manifests/latest HTTP/1.1" 404 339 "" "docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64"
time="2016-04-18T20:59:49.393081944Z" level=error msg="Error getting ImageStreamTag \"latest\": imagestreams \"forcepull-extended-test-builder\" not found" go.version=go1.6 http.request.host="172.30.165.70:5000" http.request.id=136f9565-33ec-484c-9625-7c5cd64ba3e5 http.request.method=GET http.request.remoteaddr="172.18.2.57:44975" http.request.uri="/v2/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder/manifests/latest" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" instance.id=347e5f58-43de-4ef6-af03-9ff2612525c2 vars.name="extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder" vars.reference=latest 
time="2016-04-18T20:59:49.393259611Z" level=error msg="response completed with error" err.code="MANIFEST_UNKNOWN" err.detail="imagestreams \"forcepull-extended-test-builder\" not found" err.message="manifest unknown" go.version=go1.6 http.request.host="172.30.165.70:5000" http.request.id=136f9565-33ec-484c-9625-7c5cd64ba3e5 http.request.method=GET http.request.remoteaddr="172.18.2.57:44975" http.request.uri="/v2/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder/manifests/latest" http.request.useragent="docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64" http.response.contenttype="application/json; charset=utf-8" http.response.duration=51.276079ms http.response.status=404 http.response.written=339 instance.id=347e5f58-43de-4ef6-af03-9ff2612525c2 vars.name="extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder" vars.reference=latest 
172.18.2.57 - - [18/Apr/2016:20:59:49 +0000] "GET /v1/repositories/extended-test-forcepull-hbkcx-4v8vi/forcepull-extended-test-builder/images HTTP/1.1" 404 19 "" "docker/1.9.1 go/go1.4.2 kernel/3.10.0-229.7.2.el7.x86_64 os/linux arch/amd64"

@smarterclayton

This kind of looks like a race condition.

@smarterclayton

I think this is fixed in #8558; Docker Hub occasionally returns 401 on import. We need to clean the fix up and make it even more targeted (and discuss whether to preserve this for DTR, which I bet has the same problem).
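
A rough sketch of that retry-on-401 idea (illustrative only, not the actual #8558 change; the helper name and the string-based error matching are made up):

// A sketch of retrying only when the remote registry answers with 401.
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errUnauthorized stands in for the "unauthorized: authentication required"
// error the remote registry intermittently returns.
var errUnauthorized = errors.New("unauthorized: authentication required")

// retryOnUnauthorized re-runs fn a few times, but only when the failure looks
// like the transient 401 from the hub; any other error is returned immediately.
func retryOnUnauthorized(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if !strings.Contains(err.Error(), "unauthorized") {
			return err // not the transient case, fail fast
		}
		time.Sleep(delay)
	}
	return err
}

func main() {
	calls := 0
	err := retryOnUnauthorized(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 2 {
			return errUnauthorized // simulate the flaky first import attempt
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}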

soltysh commented Apr 27, 2016

I'm reopening while we debug the failure in https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_integration/301/consoleFull

soltysh reopened this Apr 27, 2016

soltysh commented Apr 27, 2016

@smarterclayton after careful investigation, the only two operations we do not retry are store.Open and io.CopyN in pullthroughBlobStore. The logs clearly show that io.CopyN is the culprit in the output you pointed me to, see:

time="2016-04-25T15:36:57.066474262Z" level=info msg="Copying 217 bytes of type \"application/octet-stream\" for \"sha256:7e224052b3b1fc91443295519f43edf302997496c13dd970c122f189485e0178\"" 
time="2016-04-25T15:36:57.090276777Z" level=error msg="Failed copying content from remote store \"sha256:7e224052b3b1fc91443295519f43edf302997496c13dd970c122f189485e0178\": unauthorized: authentication required"

The question is whether we should retry, since apparently this is hitting the issue the great @liggitt identified, about hitting a clock-skewed server.
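
For the record, a retry around that copy could look roughly like this (illustrative only; remoteStore, copyWithRetry and fakeStore are made-up stand-ins for the real blob store and pullthrough code, and the transfer restarts from scratch rather than resuming):

// A sketch of re-opening the remote reader and restarting the copy when the
// stream dies with an "unauthorized" error.
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// remoteStore is a hypothetical, simplified view of the remote blob store.
type remoteStore interface {
	Open(dgst string) (io.ReadCloser, error)
}

// copyWithRetry restarts the whole transfer when it fails with an
// "unauthorized" error, re-opening the reader so auth is renegotiated.
// The writer must be resettable because the copy starts over from byte zero.
func copyWithRetry(w *bytes.Buffer, store remoteStore, dgst string, size int64, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		w.Reset()
		var r io.ReadCloser
		if r, err = store.Open(dgst); err == nil {
			_, err = io.CopyN(w, r, size)
			r.Close()
		}
		if err == nil {
			return nil
		}
		if !strings.Contains(err.Error(), "unauthorized") {
			return err // only the transient auth failure is retried
		}
	}
	return err
}

// fakeStore rejects the first attempt the way the hub intermittently does.
type fakeStore struct{ fails int }

func (s *fakeStore) Open(dgst string) (io.ReadCloser, error) {
	if s.fails > 0 {
		s.fails--
		return nil, errors.New("unauthorized: authentication required")
	}
	return io.NopCloser(bytes.NewReader(make([]byte, 217))), nil
}

func main() {
	var buf bytes.Buffer
	err := copyWithRetry(&buf, &fakeStore{fails: 1}, "sha256:example", 217, 3)
	fmt.Println("copied", buf.Len(), "bytes, err:", err)
}

In the real code the destination is presumably the response already being streamed to docker, so restarting from byte zero wouldn't be possible mid-response, which may be part of the discomfort with retrying reads here.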

@smarterclayton

What backs the input stream that is returned by open?

On Apr 27, 2016, at 6:14 AM, Maciej Szulik notifications@github.com wrote:

@smarterclayton after careful investigation the only two operations we do not retry are store.Open

remoteReader, err := store.Open(ctx, dgst)

and io.CopyN

if _, err := io.CopyN(w, remoteReader, desc.Size); err != nil {

from pullthroughBlobStore.


soltysh commented Apr 27, 2016

Per this comment I'm proposing to wait a bit and see what happens and how quickly their fix actually goes live. Additionally, from talking to @smarterclayton on IRC, he's not very comfortable with us retrying reading bytes.

miminar mentioned this issue Apr 28, 2016

soltysh commented Apr 28, 2016

The same problem

time="2016-04-28T08:01:16.159905939Z" level=error msg="Failed copying content from remote store \"sha256:b216856605521bf39a8784690d432d80d9a6ee20323d6c6d9972715f825676f7\": unauthorized: authentication required"

appeared in https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_integration/450/ (the linked PR).

@smarterclayton

I think this is resolved.

soltysh commented Jun 1, 2016

I wouldn't be that sure, even though the issue has been silent for the past month... I'd love for that to be the case, though.

@smarterclayton

I added retries in a few other places. I think this is fixed upstream, because I haven't seen retries in the logs when I've been looking at other issues.

soltysh added a commit to soltysh/origin that referenced this issue Oct 28, 2016