want a way for background tasks to await saga completion #6569

hawkw · 2024-09-13T02:39:38Z

Nexus API handlers (and other sagas) that start sagas have the ability to wait for the completion of the saga by awaiting the RunningSaga future returned by RunnableSaga::start. Background tasks, on the other hand, cannot currently do this, as their only interface to the saga executor is an Arc<dyn StartSaga>, which provides only the following saga_start method:

omicron/nexus/src/app/saga.rs

Lines 96 to 108 in 8be99b0

    
           impl StartSaga for SagaExecutor { 
        
               fn saga_start(&self, dag: SagaDag) -> BoxFuture<'_, Result<(), Error>> { 
        
                   async move { 
        
                       let runnable_saga = self.saga_prepare(dag).await?; 
        
                       // start() returns a future that can be used to wait for the saga to 
        
                       // complete.  We don't need that here.  (Cancelling this has no 
        
                       // effect on the running saga.) 
        
                       let _ = runnable_saga.start().await?; 
        
                       Ok(()) 
        
                   } 
        
                   .boxed() 
        
               } 
        
           }

Note that the RunningSaga future is thrown away, meaning that the saga has been started, but there's no way to wait for its completion.

It would be useful for background tasks to be able to wait for the saga's completion. Not being able to do means that background tasks don't really have the ability to enforce meaningful concurrency limits on the sagas they start --- they can only limit the concurrency of starting sagas, not the number of sagas that are concurrently running. It also means that the approach for avoiding Nexus replicas all operating on instances/VMMs/disks in the same order described by @gjcolombo in #6565 (comment) is much harder to implement in cases where the operation that changes the state of a database record so that it is no longer returned by a query requires a saga completing, since the background task cannot easily wait for the sagas it starts to finish before querying the database again.

See also 0582e84

The text was updated successfully, but these errors were encountered:

hawkw · 2024-09-13T02:40:20Z

I'd like to extend the StartSaga interface to permit it to return a saga completion future. Ideally, this would just be the RunningSaga type, but unfortunately, a number of tests currently rely on a NoopStartSaga type that doesn't actually start real sagas, so we'd need to abstract over that somehow...

@jmpesp

Nexus API handlers (and other sagas) that start sagas have the ability to wait for the completion of the saga by `await`ing the `RunningSaga` future returned by `RunnableSaga::start`. Background tasks, on the other hand, cannot currently do this, as their only interface to the saga executor is an `Arc<dyn StartSaga>`, which provides only the [`saga_start` method][1]. This method throws away the `RunningSaga` returned by `RunnableSaga::start`, so the completion of the saga cannot be awaited. In some cases, it's desirable for background tasks to be able to await a saga running to completion. I described some motivations for this in #6569. This commit adds a new `StartSaga::saga_run` method to the `StartSaga` trait, which starts a saga *and* waits for it to finish. Since many tests use a `NoopStartSaga` type which doesn't actually start sagas, this interface still throws away most of the saga *outputs* provided by `StoppedSaga`, to make it easier for the noop test implementation to implement this method. If that's an issue in the future, we can revisit the interface, and maybe make `NoopStartSaga` return fake saga results or something. I've updated the `instance_watcher` and `instance_updater` background tasks to use the `saga_run` method, because they were written with the intent to spawn tasks that run sagas to completion --- this is necessary for how the number of concurrent update sagas is *supposed* to be limited by `instance_watcher`. I left the region-replacement tasks using `StartSaga::saga_start`, because --- as far as I can tell --- the "fire and forget" behavior is desirable there. Perhaps @jmpesp can confirm this? Closes #6569 [1] https://github.com/oxidecomputer/omicron/blob/8be99b0c0dd18495d4a98187145961eafdb17d8f/nexus/src/app/saga.rs#L96-L108

@jmpesp

Nexus API handlers (and other sagas) that start sagas have the ability to wait for the completion of the saga by `await`ing the `RunningSaga` future returned by `RunnableSaga::start`. Background tasks, on the other hand, cannot currently do this, as their only interface to the saga executor is an `Arc<dyn StartSaga>`, which provides only the [`saga_start` method][1]. This method throws away the `RunningSaga` returned by `RunnableSaga::start`, so the completion of the saga cannot be awaited. In some cases, it's desirable for background tasks to be able to await a saga running to completion. I described some motivations for this in #6569. This commit adds a new `StartSaga::saga_run` method to the `StartSaga` trait, which starts a saga *and* waits for it to finish. Since many tests use a `NoopStartSaga` type which doesn't actually start sagas, this interface still throws away most of the saga *outputs* provided by `StoppedSaga`, to make it easier for the noop test implementation to implement this method. If that's an issue in the future, we can revisit the interface, and maybe make `NoopStartSaga` return fake saga results or something. I've updated the `instance_watcher` and `instance_updater` background tasks to use the `saga_run` method, because they were written with the intent to spawn tasks that run sagas to completion --- this is necessary for how the number of concurrent update sagas is *supposed* to be limited by `instance_watcher`. I left the region-replacement tasks using `StartSaga::saga_start`, because --- as far as I can tell --- the "fire and forget" behavior is desirable there. Perhaps @jmpesp can confirm this? Closes #6569 [1]: https://github.com/oxidecomputer/omicron/blob/8be99b0c0dd18495d4a98187145961eafdb17d8f/nexus/src/app/saga.rs#L96-L108

hawkw self-assigned this Sep 13, 2024

hawkw mentioned this issue Sep 13, 2024

[nexus] Add StartSaga::saga_run to await completion #6574

Merged

hawkw closed this as completed in #6574 Sep 16, 2024

hawkw closed this as completed in d2e3f34 Sep 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

want a way for background tasks to await saga completion #6569

want a way for background tasks to await saga completion #6569

hawkw commented Sep 13, 2024

hawkw commented Sep 13, 2024

want a way for background tasks to await saga completion #6569

want a way for background tasks to await saga completion #6569

Comments

hawkw commented Sep 13, 2024

hawkw commented Sep 13, 2024