machine-config-server should not listen in the local port range #166

squeed · 2018-11-12T13:30:16Z

The machine-config-operator seems to listen on port 49500 (with hostNetwork: true). This port falls inside the default ip_local_port_range, so it can collide with the source port of an active outgoing TCP session:

[root@test1-master-0 core]# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768    60999

It should serve on a port lower than 32768.

For example, I managed to collide with a persistent connection from the apiserver to etcd:

[root@test1-master-0 core]# nc -l -t -p 49500
Ncat: bind to 0.0.0.0:49500: Address already in use. QUITTING.
[root@test1-master-0 core]# ss -np | grep 49500
tcp    ESTAB      0      0      192.168.126.11:49500              192.168.126.11:2379                users:(("hypershift",pid=10044,fd=60))
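
For illustration, a minimal Python sketch of the range check. It assumes the `sysctl` output format shown above; the helper names are hypothetical and not part of the machine-config-operator code.

```python
from typing import Tuple


def parse_local_port_range(sysctl_line: str) -> Tuple[int, int]:
    """Parse a line such as 'net.ipv4.ip_local_port_range = 32768    60999'."""
    # Everything after '=' is two whitespace-separated integers: low and high.
    _, _, value = sysctl_line.partition("=")
    low, high = (int(field) for field in value.split())
    return low, high


def in_local_port_range(port: int, sysctl_line: str) -> bool:
    """Return True if the kernel may hand out `port` as an ephemeral source port."""
    low, high = parse_local_port_range(sysctl_line)
    return low <= port <= high


line = "net.ipv4.ip_local_port_range = 32768    60999"
print(in_local_port_range(49500, line))  # True: 49500 is inside 32768-60999
```

Any port for which this check returns True can be taken by an unrelated outgoing connection before the server binds to it.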

abhinavdahiya · 2018-11-12T19:40:45Z

/cc @crawford

crawford · 2018-11-12T20:00:27Z

@squeed Do you have a specific range that we should use? Does OpenShift define a particular range that we can use for internal services? If not, should we define one?

cgwalters · 2018-12-07T20:43:27Z

To clarify, this port is required to serve Ignition configs, and Ignition runs in the initramfs before a node has joined the cluster and can use cluster networking, etc.

That said, is there any reason we couldn't just pick a free port dynamically on startup?

crawford · 2018-12-08T22:52:46Z

> That said, is there any reason we couldn't just pick a free port dynamically on startup?

All of the machines in the cluster would have to know what port number they should connect to. If the port were chosen dynamically when the MCS started, how would new machines know where to connect?

ashcrow · 2018-12-11T14:26:45Z

Service discovery through etcd might be an option, but it would be more complicated than a static, agreed upon port.

squeed · 2018-12-11T15:09:58Z

You just need to change the port. It cannot be in the local port range. Just pick a new number below 32768.

ashcrow · 2018-12-11T16:26:59Z

32623 doesn't seem to be in use officially or unofficially AFAICT.

cgwalters · 2018-12-20T16:16:44Z

I was glancing at this for my own edification. It seems that when we change this, we need to make a coordinated change to the installer:

https://github.com/openshift/installer/blob/ac006ae671a645553d58c8a29c676968dfa3d85f/pkg/asset/ignition/machine/node.go#L24

wking · 2019-01-22T20:53:04Z

For folks blindly searching issues, the current behavior results in logs like:

F0122 18:58:33.952823       1 api.go:59] Machine Config Server exited with error: listen tcp :49500: bind: address already in use

leading to e2e errors like:

fail [github.com/openshift/origin/test/extended/operators/cluster.go:109]: Expected
    <[]string | len:2, cap:2>: [
        "Pod openshift-machine-config-operator/machine-config-server-7mhkb is not healthy: container machine-config-server has restarted more than 5 times",
        "Pod openshift-machine-config-operator/machine-config-server-ntrdk is not healthy: container machine-config-server has restarted more than 5 times",
    ]
to be empty

...

failed: (2m3s) 2019-01-22T19:11:29 "[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]"

Out of band, @crawford said:

> That error is usually the result of the process dying and the kernel not releasing those resources fast enough. You can get around that with SO_REUSEPORT.

squeed · 2019-01-23T10:05:10Z

That can indeed happen, but that's not what happened here. When I filed this bug, there was a clear port conflict with an outgoing connection from the apiserver process to etcd. No amount of waiting would fix the issue.

The port needs to be moved, or this random failure will continue to happen.
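
A rough way to reproduce this class of conflict on loopback, as a sketch only: the three sockets are stand-ins for etcd, the apiserver and the MCS, the function name is made up, and the behavior described is what I would expect on Linux. The listener sets SO_REUSEADDR (and SO_REUSEPORT where the platform defines it) to show that those options do not help while the other connection is still established.

```python
import errno
import socket


def listener_blocked_by_outgoing_connection() -> bool:
    """Return True if binding a listener to the source port of a live
    outgoing connection fails with EADDRINUSE."""
    # Stand-in for etcd: a server on a kernel-chosen loopback port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    # Stand-in for the apiserver: an outgoing connection whose source port
    # the kernel picks from ip_local_port_range.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    source_port = client.getsockname()[1]

    # Stand-in for the MCS: try to listen on that same port on all addresses.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    try:
        listener.bind(("", source_port))
        return False  # bind succeeded, no conflict observed
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    finally:
        for sock in (listener, client, server):
            sock.close()


print(listener_blocked_by_outgoing_connection())
```

As long as the client connection stays open, retrying the bind gives the same result, which is why only moving the port out of the ephemeral range avoids the failure.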

kikisdeliveryservice · 2019-02-01T01:00:37Z

This issue seems to have come up again. I'm seeing it in the MCS logs from the payload promo gate:

I0131 23:33:57.210794       1 start.go:37] Version: 3.11.0-530-g71ace53d-dirty
I0131 23:33:57.211871       1 api.go:51] launching server
I0131 23:33:57.212117       1 api.go:51] launching server
F0131 23:33:57.212096       1 api.go:59] Machine Config Server exited with error: listen tcp :49500: bind: address already in use

https://storage.cloud.google.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/3736/artifacts/release-e2e-aws/pods/openshift-machine-config-operator_machine-config-server-khq9r_machine-config-server_previous.log.gz?_ga=2.58549930.-1062251045.1532122709

From the other logs:

Jan 31 23:25:24.675: INFO: Some pods in error: openshift-machine-config-operator/machine-config-server-khq9r
Jan 31 23:25:29.688: INFO: Some pods in error: openshift-machine-config-operator/machine-config-server-khq9r
Jan 31 23:25:29.942: INFO: Some pods in error: openshift-machine-config-operator/machine-config-server-khq9r

https://gubernator.k8s.io/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/3736

kikisdeliveryservice · 2019-02-01T01:01:23Z

Happy to make the changes here and in the installer, if someone can let me know which port was settled on?
cc: @cgwalters @ashcrow

ashcrow · 2019-02-01T14:07:10Z

There wasn't disagreement on 32623. Unless someone had a reason to avoid the port it's a fair change.

abhinavdahiya · 2019-02-01T14:26:34Z

The default node port range for Kubernetes NodePort services is 30000-32767.
Ref: https://kubernetes.io/docs/concepts/services-networking/service/#nodeport

Not sure if that will cause any problems? @squeed

jlebon · 2019-02-01T16:59:11Z

Hmm yeah, staying outside the default range makes sense to me given that client apps could hardcode a nodePort that matches whatever we choose there. (And it doesn't seem like the installer has a knob to change the range easily, so that's good.)

ashcrow · 2019-02-01T19:32:05Z

22623?

kikisdeliveryservice · 2019-02-01T19:34:16Z

any objections to 22623?

/assign

ashcrow · 2019-02-01T21:11:55Z

Seems like none 😸

crawford · 2019-02-01T21:36:26Z

22623 is fine.

Transition machine-config-server ports from 49500/49501 -> 22623/22624 to avoid conflict with local port and node port ranges. Listeners added for legacy ports until installer transitions to using the new ports. Closes: openshift#166
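
To make the reasoning behind the new ports concrete, here is a small Python sketch that checks the candidates from this thread against both default ranges. The ranges are the defaults quoted above (a cluster may configure different ones), and the function name is purely illustrative.

```python
# Default net.ipv4.ip_local_port_range is 32768-60999 (inclusive).
LOCAL_PORT_RANGE = range(32768, 61000)
# Default Kubernetes NodePort range is 30000-32767 (inclusive).
NODE_PORT_RANGE = range(30000, 32768)


def port_conflicts(port: int) -> list:
    """Return the names of the default ranges that `port` falls into."""
    conflicts = []
    if port in LOCAL_PORT_RANGE:
        conflicts.append("local port range")
    if port in NODE_PORT_RANGE:
        conflicts.append("node port range")
    return conflicts


for port in (49500, 49501, 32623, 22623, 22624):
    print(port, port_conflicts(port) or "ok")
```

With the default ranges, 49500 and 49501 land in the local port range, 32623 lands in the node port range, and 22623 and 22624 are clear of both.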

cgwalters mentioned this issue Dec 20, 2018

Multiple machine-config server restarts after 'http: TLS handshake error from 10.0.29.128:17205: EOF' #233

Closed

openshift-ci-robot assigned kikisdeliveryservice Feb 1, 2019

kikisdeliveryservice mentioned this issue Feb 1, 2019

MCS: change machine-config-server ports #368

Merged

kikisdeliveryservice mentioned this issue Feb 5, 2019

change machine-config-server port openshift/installer#1180

Merged

openshift-merge-robot closed this as completed in #368 Feb 7, 2019

This was referenced Feb 13, 2019

CI Failures: machine-config-server is not ready #414

Closed

MCS: remove legacy servers & ports #423

Merged
