
Allow external workflows to make changes #7563

Closed
qqmyers opened this issue Jan 28, 2021 · 3 comments · Fixed by #7568
Labels
GDCC: DANS related to GDCC work for DANS

Comments

@qqmyers
Member

qqmyers commented Jan 28, 2021

Workflows may need to read or write information in the dataset during their execution. For external workflows, i.e. ones that interact with Dataverse via the API, this is currently blocked by the lock placed for the workflow itself (see the sketch after the lists below). Use cases include:
PrePublication:

  • Adding metadata from an external service (DANS data migration),
  • Extracting geospatial information and writing a csv file to the dataset (UCLA geo integration),
  • Future spam or virus check analysis that would need to get file contents

PostPublication:

  • The archiver workflows set a field in the datasetversion to indicate the archivalcopylocation. If this workflow were made external, it would have to make that change via the API.

New triggers (TBD):

  • External ingest apps that read files after upload and produce metadata.
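
For concreteness, here is a minimal sketch of the kind of call an external workflow would make, and that the workflow lock currently blocks. It uses the native editMetadata endpoint; the server URL, PID, JSON body, and api key are placeholders, and the exact error returned depends on how the Command's lock check reports the conflict:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WorkflowClientExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // An external pre-publication workflow trying to add metadata while
        // the dataset holds a Workflow lock. The apiKey belongs to the user
        // who triggered publication.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://demo.dataverse.org/api/datasets/:persistentId/editMetadata"
                        + "?persistentId=doi:10.5072/FK2/EXAMPLE"))
                .header("X-Dataverse-key", "<users-api-key>")
                .PUT(HttpRequest.BodyPublishers.ofString(
                        "{\"fields\": [{\"typeName\": \"subtitle\", \"value\": \"Added by workflow\"}]}"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Today this fails: the Command sees the dataset is locked for the
        // workflow itself, even though the caller *is* that workflow acting
        // on behalf of the publishing user.
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```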

Prepublication is the most critical case: for PostPublication and potential triggers like 'UploadFinished', there is no internal Dataverse processing that has to wait, so workflows could immediately respond with a success message and then process as they wish after the lock is released. For pre-publication, responding would trigger FinalizePublication, and the workflow could no longer make changes that would go into the published version.

The mechanism for lock enforcement I see in API calls is that locks are only checked within the commands themselves, and different lock types are checked separately.
I can see two options for a fix: give either the user who started the workflow, or the holder of a token (such as the workflowID), an exemption from the workflow lock. (I think all the use cases would work if other lock types are still enforced, and I think many would need the protection of those locks, e.g. Ingest, EditInProgress, etc.)

  • Commands already know which user is calling them, so it would be easy to add an exception in the current lock checks: allow the call to run if there is a workflow lock and you are the user who started the workflow. I think this would be OK in practice, since any action that would cause conflicts will also trigger additional non-workflow locks, and if the UI stays as is (does not exempt a workflow lock), the only potential for conflicting calls would be API calls from the same user. In theory, something like running the DCT to change variable metadata while a triggered pre-publication workflow is still in progress could result in uncoordinated changes, but real conflicts should (?) still be stopped by other locks rather than causing an exception/server error.

  • An alternate approach that would be stricter, but more work, would be to add a way to pass the lockID (short and potentially easy to guess) or workflowID (better: long and unguessable on short timescales) to Commands, so that the workflow lock check can verify that the API call originates from the workflow itself. In either case, a way to pass the token to the Command is needed, probably by adding it to the DataverseRequest used to start the Command (DataverseRequest has access to the HttpRequest and should be able to read a param or header; see the sketch below). To use the workflowID, the relationship between the Workflow and its DatasetLock would also need to be kept.
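
A minimal sketch of that token-passing idea, assuming a new header whose name is hypothetical here. The real DataverseRequest wraps the HttpServletRequest; the field and accessor below are stand-ins, not the actual implementation:

```java
import javax.servlet.http.HttpServletRequest;

public class DataverseRequestSketch {
    // Hypothetical header carrying the workflow's unguessable ID.
    private static final String WORKFLOW_HEADER = "X-Dataverse-invocationID";

    private final String workflowInvocationId; // null for ordinary API calls

    public DataverseRequestSketch(HttpServletRequest httpRequest) {
        // Accept the ID as a header or, as a fallback, a query parameter.
        String fromHeader = httpRequest.getHeader(WORKFLOW_HEADER);
        this.workflowInvocationId = (fromHeader != null)
                ? fromHeader
                : httpRequest.getParameter("invocationID");
    }

    public String getWorkflowInvocationId() {
        return workflowInvocationId;
    }
}
```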

Unless there are other ideas or a strong opinion, I'm inclined to just go for the latter: it's not too much more work, and it may avoid future problems if long-running apps associated with the user's apikey are also running (external request-access tools, QDR's annotation tool (in development), etc.).

@scolapasta
Contributor

scolapasta commented Jan 28, 2021

@qqmyers I lean towards the latter as well.

A variant of the latter is to use the notion of a more robust API token system (e.g. #7244) and give the workflow one of these newer one-time or short-lived tokens that would have permission to make changes.

@qqmyers
Member Author

qqmyers commented Jan 28, 2021

With some further discussion, I'm even more in favor of the 'latter' option over the 'former'. I like #7244 as a longer-term option as well. For security, I think we need the workflowID (versus the guessable lockId). I also think it should work to send only the workflow ID to the workflow implementation, not the apiKey, and then essentially use the workflowID both as a shorter-lived api key and as proof that the caller may override the indicated workflow lock. So the proposal is:

  1. Modify dataverse to send the workflow ID in place of the apikey to workflows
  2. Have the workflow send the workflowID in a header when making API calls
  3. Modify dataverse to validate the workflowID and identify the associated user (analogous to how api keys are used)
  4. Proceed as is done currently to check that the user has the permissions to take the action for the given api call
  5. When the underlying Command is executed and it checks, before starting, whether the relevant Dataset is locked, allow a call using that workflow ID to proceed, ignoring the workflow lock. (And conversely, if someone uses an apiKey to authenticate, do not ignore the lock.) See the sketch after this list.
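
A sketch of the lock check in steps 3-5, under stated assumptions: DatasetLock and its Reason enum do exist in Dataverse, but the enum subset, the dataset-to-invocation mapping, and the method names below are illustrative stand-ins for however the Workflow/DatasetLock relationship ends up being persisted:

```java
import java.util.Map;
import java.util.Objects;

public class WorkflowLockCheckSketch {

    // Illustrative subset of DatasetLock.Reason.
    enum LockReason { Workflow, Ingest, EditInProgress }

    // Hypothetical persisted mapping: dataset id -> invocation ID of the
    // workflow currently holding that dataset's Workflow lock.
    private final Map<Long, String> pendingInvocations;

    public WorkflowLockCheckSketch(Map<Long, String> pendingInvocations) {
        this.pendingInvocations = pendingInvocations;
    }

    /** True if the given lock should block a request carrying invocationId (may be null). */
    public boolean blocks(LockReason reason, long datasetId, String invocationId) {
        if (reason != LockReason.Workflow) {
            return true;  // Ingest, EditInProgress, etc. stay enforced
        }
        if (invocationId == null) {
            return true;  // plain apiKey authentication: the workflow lock still applies
        }
        // Step 5: only the workflow that owns this dataset's lock is exempt.
        return !Objects.equals(invocationId, pendingInvocations.get(datasetId));
    }
}
```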

I think this both solves the current issue and tightens up security: the workflow does not get a long-lived key for the user. Instead, when it completes and sends a success/failure response, that response invalidates the key that gave it access.

@qqmyers qqmyers added the GDCC: DANS related to GDCC work for DANS label Jan 28, 2021
@janvanmansum
Contributor

janvanmansum commented Jan 29, 2021

@qqmyers: the solution in your summary above is exactly what we are looking for.
