-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Egress IP fixes #1567
Egress IP fixes #1567
Conversation
/retest |
1 similar comment
/retest |
85abf63
to
e0c3ac3
Compare
e0c3ac3
to
9933a1e
Compare
@danwinship : I've addressed your comments. PTAL |
35d490d
to
ebb10d1
Compare
I have update the PR with the correct egress IP implementation for shared gateway mode. PR: #1616 will now bring in the necessary fix in OVN. So we should be good to go. |
/hold I actually need another fix which I am currently implementing. Then this should be ready. |
ebb10d1
to
9a063c3
Compare
/hold cancel Added two commits fixing corner cases in the egress IP logic. |
9a063c3
to
10ad87e
Compare
@danwinship , addressed all comments. PTAL, and let me know if you'd prefer I squash some commits or not. I've kept them separated for easier review / tracking of what has been said here. |
changes look good other than the race-detector problem which I commented on above. Please squash the fixups back into their original commits in the next round |
cbabf14
to
ce80147
Compare
Updated and squashed. PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just a few more comments. poke me when you've updated it.
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Keep `getPodIPs` true to its functional name `getPodIPs` was doing additional operations not related to "getting the pod IPs". Move that to `addPodEgressIP` Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
Rename nodeLogicalRouterIP with IP version for egress IP tests Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Fix variable spelling mistake Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
The initial implementation of egress IP in shared gateway mode was dependent on having the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1861294 resolved. This is now the case and the fix is in the latest Fedora 31 release. This commit thus switches to the correct shared gateway mode implementation. Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
Tag egress IP related OVN transactions with external_id This patch deals with avoiding inconsistent state when a pod matches multiple egress IPs. Before this patch: the lr-policy-add / lr-nat-add commands failed once the second egress IP setup happened. This wasn't a problem in itself, but if that egress IP was later deleted: it then proceeded to delete the egress setup for the first egress IP. It goes like this: 1) EgressIP-1 is created matching pod X, setup happens correctly 2) EgressIP-2 is created matching pod X, setup fails because that exact same setup (nat and policy rules) already exists 3) EgressIP-2 is deleted, deleting EgressIP-1's setup. 4) OVN state is now screwed up and pod egress functionality will no longer work even though EgressIP-1 still exists. To avoid this we have the possibility to tag the policy and nat objects with the external_id, which in this case is the EgressIP.Name. As such we will only try deleting whatever setup we find for that particular object. The OVN API for doing this is not as straightforward though, and we use an extra OVN transaction for the delete, but we win in correctness. Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Expanding on the docstring for policyAlreadyExistsMsg Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Simplify logical router policy filter for egress IP Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Check if nat/lr-policy exists before creating it As to avoid duplicate objects in the OVN database, check that the NAT / logical router policy object which is about to be created does not already exist Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
The policy command in local gateway mode (specifying pkt_mark) was incorrect. It is not a column in the logical_router_policy table, it's an option. Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
ce80147
to
8121453
Compare
@danwinship , just pushed again with a version addressing your latest comments. |
Hmm, last build failure was caused by:
Seems like we have flakes (issues?) with the hybrid overlay code. |
The egress IP code was modifying the API server object directly. Leading to data races when running with the race detector set to "on". Instead copy the object, modify it and update the API server object with the copied one. Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Remove unecessary locking in egress tests. Egress tests were locking eIPAllocatorMutex as to avoid the data races, which have now been fixed by the top level commit. Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Encapsulate all assignEgressIPs operations in lock Signed-off-by: Alexander Constantinescu <aconstan@redhat.com> Rename utility functions in egress*_test.go to reflect its new behaviour
8121453
to
585654f
Compare
OCPBUGS-8741: [release-4.13] Handle Completed pods deletion
- What this PR does and why is it needed
This PR fixes a couple of issues with the egress IP feature that merged yesterday with PR: #1484. There is no real rush to get this in today and it can wait until next week. I didn't have time to do this yesterday with all the things going on.
Problems solved:
When deleting the pod's egress setup (upon a pod delete) we were retrieving the IP from the OVN database and subsequently racing with
deleteLogicalPort
, in case we lost: we never cleaned up. I thus changed that to only look at thepod.Status
instead, this resolves the race condition. Since a pod IP can't change we thus never really need to look in the OVN DB for that. This also simplified the code as I didn't need to pull inovnNBClient
into the data containers intended at managing egress IP assignments.I never properly removed any watch handlers from the namespace / pod handlers used for egress IP assignment. This lead to very strange corner cases which have now been fixed. I've also added tests taking into account those cases.