Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.14.0 Scheduled backup fails if namespace doesn't exist #7928

Closed
phoenix-bjoern opened this issue Jun 25, 2024 · 18 comments · Fixed by #7965
Closed

v1.14.0 Scheduled backup fails if namespace doesn't exist #7928

phoenix-bjoern opened this issue Jun 25, 2024 · 18 comments · Fixed by #7965

Comments

@phoenix-bjoern
Copy link

phoenix-bjoern commented Jun 25, 2024

What steps did you take and what happened:
Create a schedule and include a non-existing namespace:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-application-backup
  namespace: velero
  annotations:
  labels:
    app.kubernetes.io/name: velero
    app.kubernetes.io/instance: velero
spec:
  useOwnerReferencesInBackup: false
  schedule: "0 0 * * *"
  template:
    csiSnapshotTimeout: 14400s
    hooks: {}
    includeClusterResources: false
    includedNamespaces:
    - my-namespace
    - a-non-existing-namespace
    itemOperationTimeout: 0s
    metadata: {}
    ttl: 720h

What did you expect to happen:

Velero before v1.14.0 executed the backup without complaint. Velero v1.14.0 stops the scheduled backup instantly and sets the status FailedValidation instead of executing the backup and e.g. setting the PartiallyFailed status. This new behavior is dangerous (or even critical?) as scheduled backups become fragile and stop working as soon as a defined namespace doesn't exist anymore.

@phoenix-bjoern
Copy link
Author

It seems the behavior was changed in #7431 by @ywk253100. While I agree that there should be some failure reporting if "--include-namespaces" references namespaces which do not exist, I don't agree that scheduled backups should be skipped entirely.

This behavior is dangerous as the backup could be skipped without being recognized. IMHO the backup should be executed and the status should become PartiallyFailed instead. This will backup existing items and report that some items (like the namespaces which have been declared to be backed up) were missing.

@phoenix-bjoern phoenix-bjoern changed the title v1.14.0 Scheduled backups fails if namespace doesn't exist v1.14.0 Scheduled backup fails if namespace doesn't exist Jun 25, 2024
@kaovilai
Copy link
Contributor

Ignoring this being in a schedule, has velero ever claimed that includedNamespaces can include namespaces that do not exists?

If we ever implement wildcard/regex for namespaces to include #1874 then perhaps you could define that regex which I would expect would not fail a backup if no namespace exists.

@blackpiglet
Copy link
Contributor

@kaovilai
Velero doesn't support the wildcard or regular expression in the namespace filter parameters.

@phoenix-bjoern
There are some issues similar to this topic, and some requests fail the backup on nonexisting namespaces, and some requests are identical to yours.

@kaovilai
Copy link
Contributor

@kaovilai
Velero doesn't support the wildcard or regular expression in the namespace filter parameters.

Right, referring to potential future implementation. 🤞

@blackpiglet blackpiglet added the Icebox We see the value, but it is not slated for the next couple releases. label Jul 1, 2024
@blackpiglet
Copy link
Contributor

I prefer not to change this logic.

The reason is marking backup as PartiallyFail on a non-existing namespace has been the existing behavior for quite some time.
Modifying it is a breaking change, which is not an obvious improvement.

This PR introduces the behavior of marking backup as PartiallyFail on a non-existing namespace.
Before this PR, the backup was marked as Fail immediately.

Until PR #6320 accidentally broke the check, then #7431 restored that behavior.

@phoenix-bjoern
Copy link
Author

phoenix-bjoern commented Jul 1, 2024

@blackpiglet Not sure if I can follow. PartiallyFail would be actually fine for me. But in v1.14.0 backups fail immediately with status FailedValidation, so no namespace is backed up. That means that users can easily break the configured backup procedure, which is not good. PartiallyFail gives operators time to fix either the backup config or re-add the missing namespace without compromising safety. Not taking a backup is the worst option.

@kaovilai
Copy link
Contributor

kaovilai commented Jul 1, 2024

I prefer not to change this logic.

Same same.

My suggestion is that if we ever support wildcard/regex, that it be a completely new field in the backup spec just for regex/wildcard support. Then the behavior to support this issue won't be breaking change.

You'd have existing field for exact name, and new field for regex/wildcard.

@phoenix-bjoern
Copy link
Author

phoenix-bjoern commented Jul 1, 2024

@kaovilai this issue is not about namespaces and regex/wildcards. It is about plain namespaces in schedules, which might be missing for whatever reason, and Velero v1.14.0 now doesn’t perform any backup. IMHO this is a critical issue and should be changed to a status „PartiallyFailed“ instead of „FailedValidation“ to ensure all other namespaces get backed up as intended by the schedule definition.

@kaovilai
Copy link
Contributor

kaovilai commented Jul 1, 2024

I get the point about this not being regex.. I am simply suggesting that

"if namespace don't exists, don't error" be part of the regex implementation in a new includedRegexNamespaces.. meaning if your regex is "namespaceExactName", and it doesn't match a current list of namespaces, it won't error.

@kaovilai
Copy link
Contributor

kaovilai commented Jul 1, 2024

current available fields behavior is unlikely to change per #7928 (comment)

@sseago
Copy link
Collaborator

sseago commented Jul 1, 2024

@kaovilai hmm. I think PartiallyFailed is the longstanding Velero behavior. 1.12 introduced a regression that resulted in Completed rather than PartiallyFailed, but unfortunately the 1.14 fix resulted in FailedValidation backups rather than PartiallyFailed. Based on what @blackpiglet said above ("The reason is marking backup as PartiallyFail on a non-existing namespace has been the existing behavior for quite some time.
Modifying it is a breaking change, which is not an obvious improvement.") it would appear that we should restore the PartialFailure change and not fail validation

@blackpiglet
Copy link
Contributor

blackpiglet commented Jul 2, 2024

@phoenix-bjoern
Ah, I see. Sorry for the misunderstood.

@kaovilai @sseago
My understanding is the agreement is we should mark the Backup with a non-existing namespace name in the namespace filters as PartiallyFailed instead of FailedValidation.

If that's correct, I will create a PR to fix it.

@phoenix-bjoern
Copy link
Author

@blackpiglet CORRECT! Looking forward to your PR and happy to test it ❤️

@sseago
Copy link
Collaborator

sseago commented Jul 2, 2024

@blackpiglet I think we're all in agreement now. Go ahead and create the PR.

@blackpiglet blackpiglet removed the Icebox We see the value, but it is not slated for the next couple releases. label Jul 5, 2024
@phoenix-bjoern
Copy link
Author

@blackpiglet Do you plan to release this change as a new Velero version or should this change be considered as a bugfix for the v1.14 release? If it is a bugfix for v1.14 could you please merge the changes also to the release-1.14 branch?

@blackpiglet
Copy link
Contributor

OK. I will create a cherry-pick PR to the release-1.14

@phoenix-bjoern
Copy link
Author

@blackpiglet Confirming the solution works, thank you!
Lookup forward to the v1.14.1 bugfix release :-)

@phoenix-bjoern
Copy link
Author

@blackpiglet Thank you for the fix. Closing the ticket as the issue is resolved. Looking forward for the v1.14.1 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants