Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE-2962: configure ovs should use node-ip-hint set by nodeip-configuration service #3233

Merged

Conversation

tsorya
Copy link
Contributor

@tsorya tsorya commented Jul 6, 2022

RFE-2962: configure ovs should use node-ip-hint set by
nodeip-configuration service to set bridge on interface that matches
this ip

- What I did

- How to verify it

- Description for the changelog

@tsorya
Copy link
Contributor Author

tsorya commented Jul 6, 2022

@cybertron this will use file that will be created in openshift/baremetal-runtimecfg#185
WDYT?

Copy link
Member

@cybertron cybertron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, modulo the changes I requested in openshift/baremetal-runtimecfg#185

One suggestion inline, but that could be done as a followup too.

templates/common/_base/files/configure-ovs-network.yaml Outdated Show resolved Hide resolved
@tsorya tsorya force-pushed the igal/ovs_config_node_ip_file branch 2 times, most recently from 8979e09 to 9387835 Compare September 9, 2022 10:47
Copy link
Member

@cybertron cybertron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

We should probably retest this once the runtimecfg patch merges, but it's working for me locally.

templates/common/_base/files/configure-ovs-network.yaml Outdated Show resolved Hide resolved
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2022
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 12, 2022
@tsorya
Copy link
Contributor Author

tsorya commented Sep 13, 2022

/retest

1 similar comment
@tsorya
Copy link
Contributor Author

tsorya commented Sep 15, 2022

/retest

@tsorya
Copy link
Contributor Author

tsorya commented Sep 18, 2022

@cybertron @rbbratta Can we push this one?

@tsorya
Copy link
Contributor Author

tsorya commented Sep 18, 2022

/test ?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 18, 2022

@tsorya: The following commands are available to trigger required jobs:

  • /test 4.12-upgrade-from-stable-4.11-images
  • /test cluster-bootimages
  • /test e2e-agnostic-upgrade
  • /test e2e-aws
  • /test e2e-gcp-op
  • /test images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
  • /test bootstrap-unit
  • /test e2e-aws-disruptive
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-proxy
  • /test e2e-aws-serial
  • /test e2e-aws-single-node
  • /test e2e-aws-upgrade
  • /test e2e-aws-upgrade-single-node
  • /test e2e-aws-workers-rhel8
  • /test e2e-azure
  • /test e2e-azure-ovn-upgrade
  • /test e2e-azure-upgrade
  • /test e2e-gcp-op-single-node
  • /test e2e-gcp-single-node
  • /test e2e-gcp-upgrade
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack
  • /test e2e-openstack-parallel
  • /test e2e-ovirt
  • /test e2e-ovirt-upgrade
  • /test e2e-ovn-step-registry
  • /test e2e-vsphere
  • /test e2e-vsphere-upgrade
  • /test e2e-vsphere-upi
  • /test okd-e2e-aws
  • /test okd-e2e-gcp-op
  • /test okd-e2e-upgrade
  • /test okd-e2e-vsphere
  • /test okd-images

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-aws
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-openstack
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tsorya
Copy link
Contributor Author

tsorya commented Sep 18, 2022

/test e2e-aws-single-node

@tsorya
Copy link
Contributor Author

tsorya commented Sep 18, 2022

/test e2e-metal-ipi-ovn-ipv6

@cybertron
Copy link
Member

This looks reasonable to me. I'm running a deployment locally just so I can poke around and verify that it does the right thing, but given that ci is passing I expect it will be fine.

if [[ -z "${ip_hint}" ]]; then
return
fi
while true; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should try for a bounded amount of time instead of forever. An unexpected issue could get us stuck here.

templates/common/_base/files/configure-ovs-network.yaml Outdated Show resolved Hide resolved
templates/common/_base/files/configure-ovs-network.yaml Outdated Show resolved Hide resolved
templates/common/_base/files/configure-ovs-network.yaml Outdated Show resolved Hide resolved
templates/common/_base/files/configure-ovs-network.yaml Outdated Show resolved Hide resolved
nodeip-configuration service to set bridge on interface that matches
this ip
@tsorya tsorya force-pushed the igal/ovs_config_node_ip_file branch 2 times, most recently from 7bae977 to a6aa1a0 Compare September 27, 2022 18:41
Comment on lines 506 to 507
# Retry for 5 minutes
retries=300
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need to be 5 minutes? Isn't the tentative status on an IPv6 address expected to only last a few seconds? I would not expect more than 10 bind attempts throughout a time window of ~20s.

Otherwise, if there is some other issue we are going to be sitting there 5 mins masking an unrelated problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have a minute here.
In anycase if this will fail we will be stuck on installer while waiting for masters to be up, so waiting for a minute will not actually change anything

echo "No ipv6 ip to bind was found"
break
fi
random_port=$(shuf -i 30000-65000 -n 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default ephemeral port range is

❯ cat /proc/sys/net/ipv4/ip_local_port_range                             
32768	60999

IANA defines the range for temporary services as
49152–65535

We probably should do fine with a much more smaller sub-range of ports between 49152 and 60999.

I am fine with randomizing a port, but honestly I did not expect any collision using the default netcat port 31337. Better be safe than sorry I guess.

@tsorya tsorya force-pushed the igal/ovs_config_node_ip_file branch 2 times, most recently from f16a43e to 236296f Compare September 28, 2022 16:11
Comment on lines 560 to 563
# Finds the default interface.
# Reading $ip_hint_file and searching for interface that matches this ip.
# In case $ip_hint_file was not set use flow below.
# If the default interface is br-ex, use that and return.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I propose this instead

# Accepts parameters $iface, $iface_default_hint_file, $ip_hint_file
# Finds the nodeip interface from the interface that matches the ip address in $ip_hint_file.
# Otherwise fallbacks to a previously used interface or to the default interface.ç
# Never use the interface that is provided inside extra_bridge_file for br-ex1.
# Never use br-ex1.
# Read $ip_hint_file and return the interface that matches this ip.  Otherwise:
# If the default interface is br-ex, use that and return.
# If the default interface is not br-ex:
# Check if there is a valid hint inside iface_default_hint_file. If so, use that hint.
# If there is no valid hint, use the default interface that we found during the step
# earlier. Write the default interface to the hint file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this I realized that this might lead to a change in the interface from what was previously used, which is what iface_default_hint_file is trying to prevent.

This was done because of bug https://bugzilla.redhat.com/show_bug.cgi?id=2057160 but I fail to see any actual problem documented in the bug.

@rbbratta @andreaskaris are you aware of what actual problems did changing the interface used actually bring up, if any?

If we are not sure about this, would it be a problem to honor iface_default_hint_file above ip_hint_file? Or are we actively seeking support to move to another interface after an upgrade (or after a reboot)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support moving IMO. We need to be aligned with nodeip service this is actual idea of it. Though it should always be the same interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be coupled with nodeip service another way if it will set ip that is not matching interface it will cause very strange failures, those that we are trying to fix with this PR actually. Ovs should not decide from now, it should take whatever nodeip decide to be right by setting node-ip file and ovs just takes interface that matches it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that but I am not sure right now if this caused any problems in ovn-kubernetes, I guess not.
Had you have the chance to actually try out what happens when we actually move from another interface?
I don't think we have any actual verification of this.

Copy link
Contributor Author

@tsorya tsorya Sep 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interface was changed as expected but i didn't actually look on what is going on with cluster itself to say the truth. Though chance if getting another interface for the same IP is really low

Copy link
Contributor

@rbbratta rbbratta Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcaamano in 2057160 the different interfaces were on different subnets, so I assume they didn't have connectivity on the other subnets.

Feb 22 17:54:43 master-0 configure-ovs.sh[2310]:     inet 192.168.111.20/24 brd 192.168.111.255 scope global dynamic noprefixroute br-ex
Feb 22 17:54:43 master-0 configure-ovs.sh[2310]: 192.168.111.0/24 dev br-ex proto kernel scope link src 192.168.111.20 metric 49
Feb 22 18:33:11 master-0 configure-ovs.sh[2775]:     inet 192.168.111.20/24 brd 192.168.111.255 scope global dynamic noprefixroute br-ex
Feb 22 18:33:11 master-0 configure-ovs.sh[2775]: 192.168.111.0/24 dev br-ex proto kernel scope link src 192.168.111.20 metric 49
Feb 22 18:33:15 master-0 configure-ovs.sh[2775]:     inet 192.168.221.20/24 brd 192.168.221.255 scope global dynamic noprefixroute br-ex
Feb 22 18:33:15 master-0 configure-ovs.sh[2775]: 192.168.221.0/24 dev br-ex proto kernel scope link src 192.168.221.20 metric 49

The issue would also be that moving interfaces changes IP. Changing IP breaks Etcd Certs on masters.

EtcdCertSignerControllerDegraded: [x509: certificate is valid for 10.30.1.4, not 2101::4, x509: certificate is valid for ::1, 10.30.1.4, 127.0.0.1, ::1, not 2101::4] here the certificate is valid for IPV4

I don't think we ran into a case where interface changed but IP didn't.

2057160 is tricky in that it is IPI OSP with additionalNetworkIDs. OSP additionalNetworkIDs is the only case of multi-homed IPI that I know of.

We might be able to test with a bonding scenario, if both slaves have the same static IP DHCP mapping and bond0 fails to activate somehow on boot, NM should fall back to just using one of the slaves, so IP might stay the same but interface change on reboot from bond0 to ens192 or some such.

templates/common/_base/files/configure-ovs-network.yaml Outdated Show resolved Hide resolved
tsorya and others added 2 commits September 28, 2022 19:58
In order to ensure all services on a node are using the same IP, we
need to have nodeip-configuration run first and allow other services
to use the results to help them figure out which IP and interface
they should use.

Previously, ovs-configuration was a prereq for network-online and
nodeip-configuration waited for network-online. Additionally,
ovs-configuration waited for NetworkManager-wait-online to ensure
NetworkManager was fully up before it tried to do its configuration.
To change the order of these two services, I removed the
network-online dependency from nodeip-configuration and replaced it
with the NetworkManager-wait-online that was previously part of the
ovs-configuration deps. A dependency was also added between
ovs-configuration and nodeip-configuration to ensure that they run
in the correct order.

This way, at the end of ovs-configuration we are able to reach the
network-online target just as before, but now we've also run
nodeip-configuration at that point so we know all of the services
should have the correct IP. Essentially nodeip-configuration was
moved earlier in the boot process, but everything else has been
left alone.
@jcaamano
Copy link
Contributor

/test e2e-aws-ovn-workers-rhel8
/test e2e-azure-ovn-upgrade
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-ovn-step-registry
/test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade

@jcaamano
Copy link
Contributor

@cybertron one last PTAL?

@cybertron
Copy link
Member

This one looks good in my local dual stack test.

IIRC, the bug related to the previous hint file had to do with the fact that the route priority was being used to select the interface, and when the bridge got destroyed it changed that priority in undesirable ways. Having the ip selection happen before the bridge gets torn down should fix that too. The only place it should matter anymore is for platforms that don't use nodeip-configuration, and I think those by definition don't have the multiple interface problem (except vsphere upi, which is a gap that needs to be addressed separately).

I opened the previous bug because it seemed wrong that the bridge moved to a completely different subnet after reboot, but TBH I don't think anything actually broke as a result. That said, I didn't test the cluster extensively in that state either.

In any case, I think this change also fixes that previous bug so it should be fine if we prioritize this new hint file over the old one.

@tsorya
Copy link
Contributor Author

tsorya commented Sep 28, 2022

/retest

@jcaamano
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 29, 2022
@jcaamano
Copy link
Contributor

/hold cancel
on behalf of @yuqi-zhang

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 29, 2022
Copy link
Contributor

@flaper87 flaper87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 29, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, flaper87, jcaamano, tsorya, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 29, 2022

@tsorya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-single-node 4bd0b8259b64afe64284ac24c59432032ab2059b link false /test e2e-aws-single-node
ci/prow/e2e-azure-ovn-upgrade eb969e7 link false /test e2e-azure-ovn-upgrade
ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade eb969e7 link false /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn-workers-rhel8 eb969e7 link false /test e2e-aws-ovn-workers-rhel8
ci/prow/e2e-openstack eb969e7 link false /test e2e-openstack

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 6562be8 into openshift:master Sep 29, 2022
cybertron added a commit to cybertron/machine-config-operator that referenced this pull request Oct 3, 2022
In openshift#3233 we changed the order of dependencies for ovs-configuration
and nodeip-configuration. However, because nodeip-configuration doesn't
run everywhere, this means ovs-configuration lost its depenedency on
NetworkManager-wait-online in some cases. Since the first patch
merged, some upgrade jobs have been having issues that seem to be
related to that change. It's not entirely clear why, but since this
was an unintended change to the dependencies it seems like a strong
candidate.

To fix the problem, we can just have the NetworkManager-wait-online
dep in both services. When nodeip-configuration runs it will be
redundant, but at least ovs-configuration will always wait for it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants