'timeout expired waiting for volumes to attach/mount for pod when cluster' when node-vm-size is Standard_B1s #166
@adam7 could you collect kubelet logs on that agent VM?
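The exact command from this thread was not preserved; a common way to collect kubelet logs on an agent node is via `journalctl` (a sketch, not necessarily the command the author meant — the node address is a placeholder):

```shell
# Assumed approach: SSH to the agent node, then dump the kubelet unit's logs.
ssh azureuser@<agent-node-ip>
sudo journalctl -u kubelet -o cat > kubelet.log
```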
Also, collect the pod & PVC info:
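The commands asked for here were lost from the thread; the usual ones for pod & PVC info look like this (names in angle brackets are placeholders):

```shell
kubectl describe pod <pod-name>   # pod events, including mount/attach failures
kubectl get pvc                   # PVC binding status
kubectl describe pvc <pvc-name>   # provisioning events for a specific PVC
```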
@andyzhangx I've just deleted the cluster so I can try with B1MS nodes. If I see the same behaviour I'll collect the logs; if not, I'll fire up another B1S cluster later and grab the logs from that.
@andyzhangx Trying to get the kubelet logs using this command fails with an error.
`kubectl describe po` gives me:

```
Name:     kasper-the-friendly-mariadb-7958b49759-dfpdn
Warning   Unhealthy  10m (x844 over 7h)  kubelet, aks-nodepool1-22287053-0  Readiness probe failed: mysqladmin: connect to server at 'localhost' failed
```
Hi @adam7, I have tried with a Standard_B1s node, and in my env the disk mount finally succeeded after the timeout:
It costs 78s to format a disk in my Standard_B1s env, while on a Standard_D1_v2 it costs only 9s; that's the reason why it fails on Standard_B1s.
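As a back-of-the-envelope check (the ~120s kubelet mount timeout comes from a later comment in this thread; the 60s of attach overhead is an assumed illustrative figure), the slow format alone is enough to push Standard_B1s past the timeout:

```python
# kubelet waits roughly 2 minutes for volumes to attach/mount before
# reporting "timeout expired waiting for volumes to attach/mount".
MOUNT_TIMEOUT_S = 120  # approximate timeout, per the discussion in this thread

def fits_in_timeout(attach_s, format_s):
    """Return True if attach + format finishes inside the mount timeout."""
    return attach_s + format_s <= MOUNT_TIMEOUT_S

# Formatting took ~9s on Standard_D1_v2 but ~78s on Standard_B1s.
# With an assumed ~60s of attach overhead, only the D1_v2 fits:
print(fits_in_timeout(60, 9))    # True
print(fits_in_timeout(60, 78))   # False
```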
Thanks for checking @andyzhangx, that confirms what I suspected. What is the minimum VM size to use if I want to use a persistent volume with my cluster?
I don't know, I usually use
@andyzhangx any suggestion how I might find out, other than testing every VM size? Thanks
I think
@slack I don't think this should be closed unless the minimum requirement for an AKS node is Standard_DS2_v2.
I’m running into this same issue in Canada Central. It doesn’t matter what VM size I use; I always get a mount timeout. Any recommendations?
@devkws please provide the following info:
For the small VM size issue, please run
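The specific commands requested here were lost from the thread; the usual way to gather this kind of debugging info is something like the following (resource group and cluster name are placeholders):

```shell
kubectl version --short          # Kubernetes client and server versions
kubectl get nodes -o wide        # node names and kubelet versions
kubectl get pvc                  # PVC status
az aks show -g <resource-group> -n <cluster-name> \
  --query agentPoolProfiles      # node pool VM sizes
```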
Edit: I completely missed that this is the AKS repo and not ACS... please feel free to disregard this message. We are also constantly running into this, sometimes needing to redeploy the same pod multiple times hoping it will work. Sometimes it works after 15 minutes, sometimes never.
k8s version:

kubelet log:

Maybe these lines are relevant?
@mrdfuse just loop me in if you have such an azure disk issue. The time cost of an Azure Disk PVC mount on a standard node size (e.g. Standard_D2_V2) is around 1 minute, and there is a PR using a cache fix that reduces the mount time to around 30s. I pasted the useful logs below. A volume mount taking more than 2 min would time out, and mounting two azure disks could take a little more than 2 min, so there could be a timeout error, but it retried and finally succeeded. We have fixed this issue by improving the mount time from 1 min to 30s; details can be found here.
And in your comments, there are azure disk mounts costing more than 15 minutes? What's the kubelet version? I also fixed such an issue which only exists in v1.9.0 - v1.9.6; details are here:
While the logs may show the disks mounted successfully in the end, Kubernetes never saw that and kept complaining about timeouts. I retried with a new pod, and then it succeeded after trying for a few hours (!). I created an issue in the acs project for this as well, since I was mistaken here.
@mrdfuse just run `kubectl get nodes` to check the kubelet version:

```
NAME    STATUS    ROLES    AGE    VERSION
```
@mrdfuse So you tried a new pod mounting the same azuredisk PVC? Did any azuredisk move from one node to another? An azuredisk PVC can only be attached to one agent node.
New pod mount, new PVC (actually a new helm chart release). No moving disks.
Chiming in here to say the same is true for me. It's a bit frustrating to see this: you get excited to save a bit of $$$ on a proof-of-concept cluster, only to have to trash the cluster and bring up another with a significantly larger cost (2x).
As I could see in the azure portal, there is a default VM size when creating an AKS cluster. Moreover, I think we should remove all VM sizes with 1 CPU core from the AKS support list.
Issue
I'm trying to deploy mariadb to AKS using helm, but I consistently hit an 'Unable to mount volumes for pod ... timeout expired waiting for volumes to attach/mount for pod' error which stops the deployment dead.
This works without issue if the node-vm-size is set as the default, Standard_D1_v2, when I create the cluster.
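For reference, a setup along these lines reproduces the report (a sketch only: the resource group, cluster name, and chart flags are assumptions, not taken from the thread):

```shell
# Create an AKS cluster on the small burstable size that hit the timeout.
az aks create -g myResourceGroup -n myAKSCluster \
  --node-count 1 --node-vm-size Standard_B1s

# Install mariadb with a persistent volume, which triggers the disk
# attach/format path that times out on Standard_B1s.
helm install stable/mariadb --set persistence.enabled=true
```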