Intermittent Kubespray failure due to SSH error #247

Open
LukeRepko opened this issue Apr 24, 2024 · 0 comments

Describe the bug
During a Kubespray run, the following task fails intermittently. This could be considered an upstream bug, since Kubespray does not modify MaxSessions on the first Kubernetes control plane node. But should it?

Seen while running cluster.yml to add a few additional compute nodes:

TASK [kubernetes/kubeadm : Create kubeadm token for joining nodes with 24h expiration (default)] **********************************************************************************************************************************************
fatal: [compute016 -> kubernetes01]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"kubernetes01\". Make sure this host can be reached over ssh: mux_client_request_session: session request failed: Session open refused by peer\r\nkex_exchange_identification: Connection closed by remote host\r\nConnection closed by 172.24.9.61 port 22\r\n", "unreachable": true}

Observed on the kubernetes01 node in question:

Apr 24 22:55:58 kubernetes01 sshd[1208808]: error: no more sessions
Apr 24 22:55:58 kubernetes01 sshd[1208808]: error: no more sessions
Apr 24 22:55:58 kubernetes01 sshd[1208808]: error: no more sessions
Apr 24 22:55:58 kubernetes01 sshd[1208808]: error: no more sessions
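
To gauge how often this happens, the sshd log lines above can be tallied per sshd process. This is a minimal sketch, assuming syslog-style lines in the same format as the excerpt above; the function name `count_refused_sessions` is hypothetical and only for illustration.

```python
import re
import sys
from collections import Counter

# Matches syslog-style sshd lines such as:
#   Apr 24 22:55:58 kubernetes01 sshd[1208808]: error: no more sessions
PATTERN = re.compile(r"sshd\[(\d+)\]: error: no more sessions")


def count_refused_sessions(lines):
    """Return a Counter mapping sshd PID -> number of 'no more sessions' errors."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from stdin and prints the busiest sshd processes first.
    for pid, n in count_refused_sessions(sys.stdin).most_common():
        print(f"sshd[{pid}]: {n} refused session(s)")
```

All four errors in the excerpt come from the same sshd PID, which would be consistent with many sessions being multiplexed over one SSH connection and hitting the per-connection MaxSessions limit, though that is an interpretation rather than something the logs prove.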

The issue was resolved by increasing MaxSessions and MaxStartups in /etc/ssh/sshd_config on the kubernetes01 node.

Related Stack Exchange reference: https://unix.stackexchange.com/a/22987
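
Before changing anything, it can help to check which limits are currently in effect on the control plane node. The sketch below reads sshd_config text and reports the global MaxSessions and MaxStartups values, falling back to the OpenSSH defaults (10 and 10:30:100) when a keyword is unset. The function name `effective_limits` is hypothetical, and the parsing is simplified: it does not follow Include files and it stops at the first Match block.

```python
# OpenSSH defaults used when the keyword is not set in sshd_config.
DEFAULTS = {"maxsessions": "10", "maxstartups": "10:30:100"}


def effective_limits(config_text):
    """Return the global MaxSessions/MaxStartups values from sshd_config text.

    Simplifications (assumptions of this sketch): Include files are not
    followed, and parsing stops at the first Match block.
    """
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()  # keywords are case-insensitive
        if key == "match":
            break  # everything after this is conditional, not global
        if key in DEFAULTS and key not in found:
            found[key] = value  # sshd uses the first value it obtains
    return {k: found.get(k, default) for k, default in DEFAULTS.items()}


if __name__ == "__main__":
    with open("/etc/ssh/sshd_config") as fh:
        for name, value in effective_limits(fh.read()).items():
            print(f"{name} = {value}")
```

Running `sshd -T` as root on the node prints the configuration sshd actually resolves and is the more authoritative check; the script is only a quick way to inspect a config file without invoking sshd.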

To Reproduce
Steps to reproduce the behavior:
Run Kubespray against a large group of nodes (more than 10).

Expected behavior
Cluster member nodes connect to the Kubernetes control plane node over SSH without failures.

Server (please complete the following information):

  • OS: Ubuntu
  • Version: 22.04
  • openssh-server version: 1:8.9p1-3ubuntu0.6
