Unable to access any AWS Resources from a pod with security group #1796

Closed
ghost opened this issue Dec 18, 2021 · 11 comments

Comments

@ghost

ghost commented Dec 18, 2021

What happened:
I created a cluster where a pod should read/write data from/to RDS and S3. To secure the connection, I added IRSA for S3 and RDS. As an additional layer of security, I created a security group for the pod so that it can talk to RDS. However, after doing this, while the pod can write to RDS and S3 without any issues, it can read only from RDS and not from S3. I exec'd into the pod to see what was happening. When I execute aws s3 ls and aws sts get-caller-identity, I get Connect timeout on endpoint URL: "https://sts.us-west-2.amazonaws.com/" as output.

To implement security groups for pods, I followed https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html. I understand that when a security group is applied to a pod, source NAT is disabled, so I created a VPC endpoint for S3 (Gateway Endpoint). I also created an outbound rule in the pod's security group to allow access to the managed prefix list for S3. I followed the instructions in Managing Amazon S3 access with VPC endpoints and S3 Access Points for this. This didn't help with the commands shown above.

I also created an Interface VPC Endpoint for STS but that didn't work either.
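
The "Connect timeout" reported above can be reproduced without the AWS CLI by testing whether a plain TCP connection to port 443 of the endpoint opens at all. The following is a minimal sketch, assuming Python 3 is available inside the pod; the helper name can_connect is hypothetical, and the S3 hostname is only an illustrative second target.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers timeouts, refused connections, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # The STS hostname is taken from the error message in this issue;
    # the S3 hostname is an assumed example for the same region.
    for host in ("sts.us-west-2.amazonaws.com", "s3.us-west-2.amazonaws.com"):
        status = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}: {status}")
```

A result of "NOT reachable" for STS only says that the TCP connection did not open. It does not distinguish between a missing route, a security group rule, and a DNS problem.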

I have referred to #1211 as well. I am already following the instructions mentioned in that issue, as DNS resolution is active for my cluster.

Can anyone please help?

Environment:

  • Kubernetes version (use kubectl version): Client - 1.21; Server - 1.21
  • CNI Version 1.9
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): Linux ip-172-31-23-63.us-west-2.compute.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
@ghost
Author

ghost commented Dec 23, 2021

@cgchinmay Any luck?

@cgchinmay
Contributor

@suryakiran1006 I haven't had a chance to try it out yet. I should be able to get you an update by next week.

@cgchinmay
Contributor

@suryakiran1006 I launched a node in a private subnet and then launched a pod with a security group enabled on that node. I was able to use aws s3 ls and aws sts get-caller-identity. For internet access, I had a NAT gateway in this private subnet's route table, as pointed out in the docs:

Source NAT is disabled for outbound traffic from pods with assigned security groups so that outbound security group rules are applied. To access the internet, pods with assigned security groups must be launched on nodes that are deployed in a private subnet configured with a NAT gateway or instance. Pods with assigned security groups deployed to public subnets are not able to access the internet.

Were you able to access S3 with just the steps above?

Let's debug only this part before going to Managing Amazon S3 access with VPC endpoints and S3 Access Points. I think that's a different use case; if not, we will update our docs.

@ghost
Author

ghost commented Dec 28, 2021

Yes, I read this point and was aware of it. Since I didn't know how to explicitly launch a node in a private subnet and attach a NAT gateway to it, I was looking for a workaround. I will try this approach now. I will close the issue once I get my cluster running, if that's OK with you.

@cgchinmay
Contributor

@suryakiran1006 you can launch a node in a private subnet from the AWS console [EKS-Clusters-Configuration-Compute]. You will see an option to add a nodegroup. While creating the nodegroup, select a private subnet from your VPC.
Once it is created, go to the private subnet and check its route table entries. You should see a NAT gateway entry already present. If not, you will have to either create one in a public subnet and reference it here, or reference an existing one.
The NAT gateway itself lives in a public subnet; you just reference it in the private subnet's route table entries.
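
The route table check described above can also be sketched in code. The function below classifies a subnet by where its 0.0.0.0/0 route points. It assumes the routes are dicts shaped like the Routes entries returned by the EC2 DescribeRouteTables API (keys DestinationCidrBlock, NatGatewayId, GatewayId). The function name classify_routes and all resource IDs in the example are hypothetical.

```python
from typing import Iterable, Mapping


def classify_routes(routes: Iterable[Mapping[str, str]]) -> str:
    """Classify a subnet's route table by the target of its default (0.0.0.0/0) route."""
    default = next(
        (r for r in routes if r.get("DestinationCidrBlock") == "0.0.0.0/0"),
        None,
    )
    if default is None:
        return "no-default-route"
    if default.get("NatGatewayId", "").startswith("nat-"):
        # Private subnet whose internet-bound traffic goes through a NAT gateway.
        return "private-with-nat"
    if default.get("GatewayId", "").startswith("igw-"):
        # Default route points straight at an internet gateway: a public subnet.
        return "public"
    return "other"


if __name__ == "__main__":
    # Hypothetical route entries; the IDs and CIDR are placeholders, not real resources.
    private_routes = [
        {"DestinationCidrBlock": "172.31.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
    ]
    print(classify_routes(private_routes))
```

For pods with security groups, "private-with-nat" is the setup the quoted documentation asks for. "public" matches the case the documentation says cannot reach the internet.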

I am not sure if there is a workaround to this. Will let you know if I find any. Thanks

@ghost
Author

ghost commented Jan 5, 2022

It's working now. Closing the issue. Thanks for your help

@ghost ghost closed this as completed Jan 5, 2022
@github-actions

github-actions bot commented Jan 5, 2022

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@Sandy7894

I am facing a similar issue now. I have deployed the pods in a private subnet with a NAT gateway attached, but I am still unable to access S3 and get this error: Connect timeout on endpoint URL: "https://sts.us-west-2.amazonaws.com/"

What could be the reason?

@jdn5126
Contributor

jdn5126 commented Sep 29, 2023

@Sandy7894 your best next step here is to open up an EKS support case through your AWS console, as it is hard to debug something like this without it

@Sandy7894

Hey @jdn5126, thanks. It is working now after creating an endpoint for STS.

@ahilmathew

I'm still having this issue. The pod security group has open ingress and egress. The nodes are on public subnets. I was still getting a timeout on STS. After adding a VPC endpoint, it started working.
However, I also want to connect to DynamoDB in another region, and that request is erroring out too:

send request failed\ncaused by: Post \"https://dynamodb.us-east-1.amazonaws.com/\": dial tcp 3.218.181.176:443: i/o timeout"}

I am not sure why it doesn't work. Any ideas or leads would be greatly appreciated. Thanks
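
One thing the error above does show is that dynamodb.us-east-1.amazonaws.com resolved to 3.218.181.176, which is a public address, so the request depends on internet access from the pod. The sketch below checks whether a hostname resolves to private or public addresses. It assumes that an interface VPC endpoint with private DNS enabled would make the name resolve to a private address inside the VPC. The helper names is_private_address and resolve_ipv4 are hypothetical.

```python
import ipaddress
import socket
from typing import List


def is_private_address(ip: str) -> bool:
    """Return True if `ip` falls in a private (non-internet-routable) range."""
    return ipaddress.ip_address(ip).is_private


def resolve_ipv4(host: str, port: int = 443) -> List[str]:
    """Resolve `host` to a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hostname taken from the error message in this comment.
    host = "dynamodb.us-east-1.amazonaws.com"
    for ip in resolve_ipv4(host):
        kind = "private" if is_private_address(ip) else "public"
        print(f"{host} -> {ip} ({kind})")
```

If every address is public, the traffic needs a working path to the internet. According to the documentation quoted earlier in this thread, pods with assigned security groups on public subnets do not have one.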

This issue was closed.