Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-27741: Reject Invalid Route Paths with Rewrite-Target Annotation #1791

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gcs278
Copy link

@gcs278 gcs278 commented Sep 5, 2024

Reject routes with the rewrite-target annotation set and a spec.path that would cause HAProxy startup failure. This solution maintains compatability for the rewrite-target annotation while only rejecting routes with problematic configuration. Additionally, add warnings to indicate to users that their route configuration with rewrite-target may produce unexpected results.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 5, 2024
Copy link
Contributor

openshift-ci bot commented Sep 5, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 5, 2024
@openshift-ci-robot
Copy link

@gcs278: This pull request references Jira Issue OCPBUGS-27741, which is invalid:

  • expected the bug to target either version "4.18." or "openshift-4.18.", but it targets "4.17.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Reject routes with the rewrite-target annotation set and a spec.path that would cause HAProxy startup failure. This solution maintains compatability for the rewrite-target annotation while only rejecting routes with problematic configuration. Additionally, add warnings to indicate to users that their route configuration with rewrite-target may produce unexpected results.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented Sep 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gcs278
Once this PR has been reviewed and has the lgtm label, please assign candita for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gcs278 gcs278 force-pushed the OCPBUGS-27741-reject-invalid-path-with-rewrite-target branch from f0754de to 707a5f2 Compare September 6, 2024 21:53
@@ -212,6 +214,42 @@ func validateRoute(ctx context.Context, route *routev1.Route, checkHostname bool
result = append(result, errs...)
}

if errs := validatePathWithRewriteTargetAnnotation(route, specPath.Child("path")); len(errs) != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears you only have an instant view of the object here, and don't have the prior view of it, is that correct?

Typically in webhooks you can compare state with the old state. If you can do that, you can make this validation ratchet by allowing the persistence of existing bad configuration, and only validate when there is actually a change to the path/annotation.

That would be safer here, as, presently, in theory, someone could upgrade and this would then prevent any writes to the object until the validation issue has been resolved.

If the validation here fails, I understand that HAProxy will be in a crash loop right? So in theory, the cluster would not be upgradeable anyway, and so the argument is that this doesn't need to ratchet?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Let me clarify this: what happens with this current non-ratcheting validation when someone updates with a newly invalid route?

I think the answer is: nothing...until the user changes any field on the invalid route. It's not going to cause a K8S cluster melt-down right? Will ALL updates, including status updates made by the openshift-router be rejected too? (that might be okay, but something for me think about...)

On the other hand, ratcheting validation allows the user to still update other fields route while our spec.path still be invalid (but we won't reject yet), correct?

I think if i understand how non-ratcheting validation compares to ratcheting (e.g. non-ratcheting rejects ALL updates to a route), yes, I think we should be okay since the validation I'm adding results in a HAProxy crash loop, which means the user is already broken.

I'd like to confirm if this is a valid use case for doing the sledgehammer of non-ratcheting validation, since, yes, HAProxy will be crashlooping in any of these situations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answer is: nothing...until the user changes any field on the invalid route. It's not going to cause a K8S cluster melt-down right? Will ALL updates, including status updates made by the openshift-router be rejected too? (that might be okay, but something for me think about...)

Test this, but, I believe it will prevent ALL writes, including status.

On the other hand, ratcheting validation allows the user to still update other fields route while our spec.path still be invalid (but we won't reject yet), correct?

Correct, they would only be presented with errors when they change something specific to this validation, the spec path or the annotation in this particular case.

I think if i understand how non-ratcheting validation compares to ratcheting (e.g. non-ratcheting rejects ALL updates to a route), yes, I think we should be okay since the validation I'm adding results in a HAProxy crash loop, which means the user is already broken.

How would this manifest in the cluster? Would it cause an upgradeable condition to go false on the cluster operator? If it does, then the user must fix the issue before they upgrade into a version containing this validation, right?

Copy link
Author

@gcs278 gcs278 Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test this, but, I believe it will prevent ALL writes, including status.

Hmm looks like if I try to edit the status via oc edit route... it fails, but the router status update appears to be successful despite the non-ratcheting validation.

How would this manifest in the cluster? Would it cause an upgradeable condition to go false on the cluster operator? If it does, then the user must fix the issue before they upgrade into a version containing this validation, right?

No upgradeable condition currently. The problem with doing upgradeable, is that we want to backport. In theory, I think we could do upgradeable=false on z streams, but previously the upgrade team didn't like the idea of a single route preventing an entire cluster from upgrading (route admin vs. cluster admin personas), so we used Admin-Acks so the cluster admin could override, which only works for minor level upgrades. Upgradeable=false is also going to be more painful to implement.

The real argument against upgradeable=false, is that users that are already in this state (have invalid config), are already broken. Upgradeable doesn't feel useful when routers are already broken.

So, users have a broken HAProxy if they enter this invalid state we are adding validation for. Ratcheting validation would allow them to modify other config fields, but what's the point if they are totally broken? Arguably they should fix the invalid state ASAP. However, the only argument for ratcheting validation is: it is somewhat of a safety mechanism in case we miss an edge case where a newly invalid state WAS actually valid (didn't cause failure). However, it's still ratcheting validation, so eventually if they recreate the route, or try to change spec.path or the rewrite-target annotation, the validation will catch up with them. And this scenario would be a bug we should fix anyways.

So, I don't see any strong reasons to do ratcheting validation. @JoelSpeed Do you agree, or have any thoughts otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm looks like if I try to edit the status via oc edit route... it fails, but the router status update appears to be successful despite the non-ratcheting validation.

Did you use the --subresource status flag when trying oc edit? You'll need to if the object has a status subresource

The real argument against upgradeable=false, is that users that are already in this state (have invalid config), are already broken. Upgradeable doesn't feel useful when routers are already broken.

I wasn't suggesting to implement an upgradeable false, sorry, I meant, if the routers are already broken, how does the cluster operator look? Does an invalid config today (causing restarting HAProxy) already cause an upgradeable false or some other note in the cluster that would prevent the user upgrading into this while already broken?

Reject routes with the rewrite-target annotation set and a spec.path
that would cause HAProxy startup failure. This solution maintains
compatability for the rewrite-target annotation while only rejecting
routes with problematic configuration. Additionally, add warnings to
indicate to users that their route configuration with rewrite-target
may produce unexpected results.
@gcs278 gcs278 force-pushed the OCPBUGS-27741-reject-invalid-path-with-rewrite-target branch from 707a5f2 to 5de0e83 Compare September 17, 2024 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants