Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

want a way for background tasks to await saga completion #6569

Closed
hawkw opened this issue Sep 13, 2024 · 1 comment · Fixed by #6574
Closed

want a way for background tasks to await saga completion #6569

hawkw opened this issue Sep 13, 2024 · 1 comment · Fixed by #6574
Assignees

Comments

@hawkw
Copy link
Member

hawkw commented Sep 13, 2024

Nexus API handlers (and other sagas) that start sagas have the ability to wait for the completion of the saga by awaiting the RunningSaga future returned by RunnableSaga::start. Background tasks, on the other hand, cannot currently do this, as their only interface to the saga executor is an Arc<dyn StartSaga>, which provides only the following saga_start method:

impl StartSaga for SagaExecutor {
fn saga_start(&self, dag: SagaDag) -> BoxFuture<'_, Result<(), Error>> {
async move {
let runnable_saga = self.saga_prepare(dag).await?;
// start() returns a future that can be used to wait for the saga to
// complete. We don't need that here. (Cancelling this has no
// effect on the running saga.)
let _ = runnable_saga.start().await?;
Ok(())
}
.boxed()
}
}

Note that the RunningSaga future is thrown away, meaning that the saga has been started, but there's no way to wait for its completion.

It would be useful for background tasks to be able to wait for the saga's completion. Not being able to do means that background tasks don't really have the ability to enforce meaningful concurrency limits on the sagas they start --- they can only limit the concurrency of starting sagas, not the number of sagas that are concurrently running. It also means that the approach for avoiding Nexus replicas all operating on instances/VMMs/disks in the same order described by @gjcolombo in #6565 (comment) is much harder to implement in cases where the operation that changes the state of a database record so that it is no longer returned by a query requires a saga completing, since the background task cannot easily wait for the sagas it starts to finish before querying the database again.

See also 0582e84

@hawkw hawkw self-assigned this Sep 13, 2024
@hawkw
Copy link
Member Author

hawkw commented Sep 13, 2024

I'd like to extend the StartSaga interface to permit it to return a saga completion future. Ideally, this would just be the RunningSaga type, but unfortunately, a number of tests currently rely on a NoopStartSaga type that doesn't actually start real sagas, so we'd need to abstract over that somehow...

hawkw added a commit that referenced this issue Sep 13, 2024
Nexus API handlers (and other sagas) that start sagas have the ability
to wait for the completion of the saga by `await`ing the `RunningSaga`
future returned by `RunnableSaga::start`. Background tasks, on the other
hand, cannot currently do this, as their only interface to the saga
executor is an `Arc<dyn StartSaga>`, which provides only the
[`saga_start` method][1]. This method throws away the `RunningSaga`
returned by `RunnableSaga::start`, so the completion of the saga cannot
be awaited. In some cases, it's desirable for background tasks to be
able to await a saga running to completion. I described some motivations
for this in #6569.

This commit adds a new `StartSaga::saga_run` method to the `StartSaga`
trait, which starts a saga *and* waits for it to finish. Since many
tests use a `NoopStartSaga` type which doesn't actually start sagas,
this interface still throws away most of the saga *outputs* provided by
`StoppedSaga`, to make it easier for the noop test implementation to
implement this method. If that's an issue in the future, we can revisit
the interface, and maybe make `NoopStartSaga` return fake saga results
or something.

I've updated the `instance_watcher` and `instance_updater` background
tasks to use the `saga_run` method, because they were written with the
intent to spawn tasks that run sagas to completion --- this is necessary
for how the number of concurrent update sagas is *supposed* to be
limited by `instance_watcher`. I left the region-replacement tasks using
`StartSaga::saga_start`, because --- as far as I can tell --- the "fire
and forget" behavior is desirable there. Perhaps @jmpesp can confirm
this?

Closes #6569

[1] https://github.com/oxidecomputer/omicron/blob/8be99b0c0dd18495d4a98187145961eafdb17d8f/nexus/src/app/saga.rs#L96-L108
hawkw added a commit that referenced this issue Sep 13, 2024
Nexus API handlers (and other sagas) that start sagas have the ability
to wait for the completion of the saga by `await`ing the `RunningSaga`
future returned by `RunnableSaga::start`. Background tasks, on the other
hand, cannot currently do this, as their only interface to the saga
executor is an `Arc<dyn StartSaga>`, which provides only the
[`saga_start` method][1]. This method throws away the `RunningSaga`
returned by `RunnableSaga::start`, so the completion of the saga cannot
be awaited. In some cases, it's desirable for background tasks to be
able to await a saga running to completion. I described some motivations
for this in #6569.

This commit adds a new `StartSaga::saga_run` method to the `StartSaga`
trait, which starts a saga *and* waits for it to finish. Since many
tests use a `NoopStartSaga` type which doesn't actually start sagas,
this interface still throws away most of the saga *outputs* provided by
`StoppedSaga`, to make it easier for the noop test implementation to
implement this method. If that's an issue in the future, we can revisit
the interface, and maybe make `NoopStartSaga` return fake saga results
or something.

I've updated the `instance_watcher` and `instance_updater` background
tasks to use the `saga_run` method, because they were written with the
intent to spawn tasks that run sagas to completion --- this is necessary
for how the number of concurrent update sagas is *supposed* to be
limited by `instance_watcher`. I left the region-replacement tasks using
`StartSaga::saga_start`, because --- as far as I can tell --- the "fire
and forget" behavior is desirable there. Perhaps @jmpesp can confirm
this?

Closes #6569

[1]: https://github.com/oxidecomputer/omicron/blob/8be99b0c0dd18495d4a98187145961eafdb17d8f/nexus/src/app/saga.rs#L96-L108
@hawkw hawkw closed this as completed in d2e3f34 Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant