This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Unable to fetch tags from ECR #3015

Closed
pbn4 opened this issue Apr 21, 2020 · 17 comments · Fixed by #3485

Comments

@pbn4

pbn4 commented Apr 21, 2020

Describe the bug

I have Flux running inside EKS on Fargate. One workload has automated updates set up, with AWS ECR as the image repository. The roles for the Fargate pods grant them permission to access ECR.

None of the issues and solutions I found in this repository solve my problem, hence this new thread.

To Reproduce

Set up an EKS cluster with Fargate and Flux, then observe the Flux logs.

Expected behavior

Flux should update the workload with the new image.

Logs

ts=2020-04-21T11:11:15.679624161Z caller=warming.go:180 component=warmer canonical_name=xxxx.dkr.ecr.eu-west-1.amazonaws.com/repo auth={map[]} err="requesting tags: Get https://xxxx.dkr.ecr.eu-west-1.amazonaws.com/v2/repo/tags/list: no basic auth credentials"

I experimented with the aws-cli Docker image inside the flux namespace. I was trying to list the configured accounts; for example, configure list yields an empty configuration for this newly created pod, and I'm not sure whether that is correct. How can I check whether an arbitrary pod has access to the URL Flux is trying to fetch (I assume Flux supplies authentication somehow)?

Additional context

  • Flux version: 1.19
  • Kubernetes version: 1.15
  • Git provider: GitHub
  • Container registry provider: ECR
@pbn4 added the blocked-needs-validation and bug labels Apr 21, 2020
@pbn4
Author

pbn4 commented Apr 21, 2020

Some additional logs:

ts=2020-04-21T13:30:52.413063388Z caller=aws.go:124 component=aws error="fetching region for AWS" err="EC2MetadataRequestError: failed to get EC2 instance identity document\ncaused by: RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

I also granted all EC2 permissions to the EKS cluster (not the pods), and I believe this log line started appearing afterwards:

ts=2020-04-21T13:30:52.413157744Z caller=aws.go:236 component=aws warning="AWS auth implied by ECR image, but AWS API is not available. You can ignore this if you are providing credentials some other way (e.g., through imagePullSecrets)" image=xxx.dkr.ecr.eu-west-1.amazonaws.com/repo err="EC2MetadataRequestError: failed to get EC2 instance identity document\ncaused by: RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

@admssa

admssa commented Apr 27, 2020

ts=2020-04-27T18:16:34.152506667Z caller=warming.go:180 component=warmer canonical_name=... auth={map[]} err="requesting tags: Get https://.../tags/list: no basic auth credentials"

I'm having the same issue. (Access to the metadata endpoint from pods was blocked to prevent pods from using node roles.)
But Flux doesn't seem able to use the AWS web identity role and token (EKS IRSA).
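
As a quick way to confirm that IRSA is wired into a pod at all: EKS injects the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into pods whose service account is annotated with a role. The Go sketch below only checks for those two variables. The function name webIdentityConfigured is an assumption made for illustration; this is not how Flux itself detects credentials.

```go
package main

import (
	"fmt"
	"os"
)

// webIdentityConfigured (hypothetical helper) reports whether both
// environment variables used for IAM Roles for Service Accounts are set.
// getenv is injected so the check can be exercised without touching the
// real process environment.
func webIdentityConfigured(getenv func(string) string) bool {
	return getenv("AWS_ROLE_ARN") != "" &&
		getenv("AWS_WEB_IDENTITY_TOKEN_FILE") != ""
}

func main() {
	if webIdentityConfigured(os.Getenv) {
		fmt.Println("web identity variables present, role:", os.Getenv("AWS_ROLE_ARN"))
	} else {
		fmt.Println("web identity variables not set in this pod")
	}
}
```

The presence of these variables only shows that the pod was configured for IRSA; whether a given client actually uses them depends on its AWS SDK version and credential chain.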

@pbn4 What about remounting the new secret at each renewal interval in this solution: #2708 (comment)?

@pbn4
Author

pbn4 commented Apr 27, 2020

@admssa The solution in the mentioned comment worked for me. Yup, it seems that Flux is not able to use AssumeRoleWithWebIdentity on Fargate, although I'm no expert. I don't understand the question "What about remounting new secret each renewing interval in this solution"; could you rephrase it?

@admssa

admssa commented Apr 27, 2020

@pbn4 Never mind. I missed the fact that mounted Secrets are updated automatically :)

@renanqts

renanqts commented Apr 29, 2020

I also have this issue. I debugged it a bit, and I think the problem is here:

ec2 := ec2metadata.New(sess)

Starting at this line, Flux tries to get the instance metadata, but on Fargate it isn't available. As a result, the variable okToUseAWS never becomes true.

@renanqts

renanqts commented Apr 29, 2020

My tests confirm it: it works after forcing okToUseAWS=true.
We should think of another strategy for cases like this.

@evq
Contributor

evq commented Jun 9, 2020

Also ran into this. Was trying to use flux with iam roles for service accounts and locked down access to ec2 metadata / instance profile credentials.

I modified the preflight code such that okToUseAWS=true is set if config.Regions is present - which resolved the issue.

@jclynny

jclynny commented Jul 1, 2020

Is there any update on this? I'm hitting this as well with EKS trying to use the Node role.

@sureshamk

> Also ran into this. Was trying to use flux with iam roles for service accounts and locked down access to ec2 metadata / instance profile credentials.
>
> I modified the preflight code such that okToUseAWS=true is set if config.Regions is present - which resolved the issue.

It worked for me.

@jayvie

jayvie commented Oct 6, 2020

Can anyone help me solve this on my end? I am experiencing the same issue.
AWS EKS Fargate

Image untagged.

I tried exactly the same setup on AWS EKS with EC2, and Flux works as expected.

@evq how can I do this on my end as well?

> I modified the preflight code such that okToUseAWS=true is set if config.Regions is present - which resolved the issue.

@kingdonb
Member

This issue has been verified to affect not only AWS Fargate users, but a number of other cases as well. The fix I just merged will be published in Flux v1.23.2.

@kingdonb removed the blocked-needs-validation label Jul 27, 2021
@kingdonb added this to the 1.23.2 milestone Jul 27, 2021
@kingdonb reopened this Aug 5, 2021
@kingdonb modified the milestones: 1.23.2, 1.23.3 Aug 5, 2021
@kingdonb
Member

kingdonb commented Aug 5, 2021

Very sorry but an error was made in the 1.23.2 release preparation, and while the PR from #3485 was in master, it was not merged and included in the 1.23.x branch for 1.23.2. The CHANGELOG is in error about this.

I will have to make another release to include this change. It may take some time; meanwhile, this image from the official Flux Prerelease repo includes the change, for anyone who needs it:

fluxcd/flux-prerelease:master-32f9ab7d

I am very sorry about this, and will make a point to be more careful and attentive when I am executing releases in the future.

I've updated the release notes for v1.23.2 to reflect that an error was made.

@pierluigilenoci

@kingdonb any ETA to have this change released?

@kingdonb
Member

I will look into releasing it this week again if at all possible. I just got a clean security scan from Snyk for the first time in several weeks, so this will probably be Flux 1.24.0, a MINOR upgrade with Alpine 3.14.1 as the new base image. These changes should be bundled together so users with CVE scan requirements can upgrade readily without exceptions raised by security scans.

@kingdonb removed this from the 1.23.3 milestone Aug 17, 2021
@kingdonb
Member

@pierluigilenoci See #3537, all of the PRs for 1.24.0 are ready for review now.

I think it should be no problem to release 1.24.0 some time this week, but that depends on reviews from other maintainers, as I still cannot merge my own PRs without them.

I can cut a new prerelease image with all of those if you would like, or you can use one of the autogenerated flux-prerelease images for now if you are in need. Each time a new PR merges to master, it gets pushed automatically to fluxcd/flux-prerelease/tags on Docker Hub.

The latest pre-release image from this morning was:

fluxcd/flux-prerelease:master-c7a00046
DIGEST:sha256:f20d63cf37ce99acc797db1b42d1aca5b835535ffcd56aa4fd0fee365e5c6d4a

The fix for this issue, #3015, has already been merged in master and so has been included in the pre-release images for some time already. Please do give it a try if you can and let us know whether this fixes your issue.

It would be good to know for sure whether #3485 fixes your issue or a different one. If yours is a different issue, there is still time to get it fixed and included in 1.24.0. 🙇 🙏

@pierluigilenoci

Thank you @kingdonb,
I want to solve this problem but it's not that urgent.
I will wait for the release.

Thank you again 🙏🏻

@kingdonb
Member

This issue is resolved in the 1.24.0 release that was just published.

The chart release will be 1.11.0 and is forthcoming in this PR:
