Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow workflows to make changes #7568

Merged
merged 50 commits into from
Feb 27, 2021

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Jan 31, 2021

What this PR does / why we need it: Prior to this PR, external workflows (e.g. ones using the http s/r step) could not make changes to the dataset. This PR adds a mechanism to use a workflow invocationID as an alternative to a user api key and, when doing so, to allow an API call on behalf of the user to ignore the workflow lock corresponding to the specified workflow invocation. In this sense, the invocation ID becomes the equivalent of a short-term user API key (only valid until the workflow completes). When an API call authenticates the user this way, it's permissions are still limited to those of the user and any action taken is still constrained by any other locks on the dataset.

Which issue(s) this PR closes:

Closes #7563
Closes #7569

Special notes for your reviewer: I added a header to send the workflow invocation ID (and a query param), both analogs of the header/param for the api key. It should be getting checked on any call that checks for the user (via the findUserOrDie() or findAuthenticatedUserOrDie() calls). As discussed in the issue, this is an incremental security improvement - services implementing workflows don't need to have a long-lived api key to work on behalf of users, but it isn't as secure as doing something like presigned-urls which could limit which api calls the workflow could make. Nominally, one could consider this two PRs: enabling use of the invocationID to authenticate to the api, and additionally allowing auth via this method to ignore the lock created by this workflow.

Suggestions on how to test this: Run a pre-publication workflow with a pause step, adding a "parameters" object to the json with "authorized":"true". Get the invocation id from the log. Then, try some API call, such as the one to edit the dataset metadata. (Alternately, setup an http/authext step and a client that would get the invocationID and other info from Dataverse). Using the users API key, it should fail noting the workflow lock. Then use the invocationID and the 'X-dataverse-invocationID' header or ?invocationID= param key and the same call should succeed. One could also verify that the invocationID works for anything else the user has permissions for (e.g. creating/changing a different dataset). One could also add an additional lock to the dataset (any reason code, but if it is for a workflow, it can't have the same invocationID)) and verify that authenticating via invocationID is stopped by the new lock when trying to make changes to the locked dataset.

To test w.r.t. #7569, a workflow with an http s/r step has to be created. Verifying that the lock exists and is visible in the api prior to the initial 200/OK response would verify the fix. It may be easiest to just as DANS to check to see if this works as they have an http s/r step configured.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

and ignore workflow lock if it's yours

Also includes fixes that may be separated for PRs:
Add Finalize lock before invoking FInalize command at end of workflow
module/classpath conflicts showing up in eclipse

ToDo: add flyway script
@qqmyers qqmyers added the GDCC: DANS related to GDCC work for DANS label Jan 31, 2021
Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense overall. But this needs to be documented, in workflows.rst, and maybe in native-api.rst too. I was told a few times that all the API functionality must be documented, and that we shouldn't have any unadvertised API calls or parameters, even if it's experimental. And this is something we clearly want other installations to use.
(I'm assuming this is the policy - @scolapasta?)
As for #7569, I haven't tested it myself, but, looking at the code, I'm a little confused about this part:

I had to flush the lock to get it's ID ... which presumably will fix this issue as well.

It just looks like it was already being flushed, as of v5.3, or v4.18.1 that they appear to be running at DANS:

datasetLock.setDataset(ctxt.getDataset());
em.persist(datasetLock);
em.flush();

vs. the corresponding fragment in your branch:

datasetLock.setDataset(ctxt.getDataset());
em.persist(datasetLock);
//flush creates the id
em.flush();
ctxt.setLock(datasetLock);

so the only difference is the ctxt.setLock() line - but that's only setting it in the workflow context... which shouldn't have an affect on whether the lock gets saved in the database (?)
I may still have a question about the security implications of this PR, but let me try and figure it out.
Sorry for the delay with this; I actually knew next to nothing about workflows. I want to understand how they work, since I may need to use a workflow to clean up the spam branch. So I've been educating myself in real time.

@qqmyers
Copy link
Member Author

qqmyers commented Feb 10, 2021

@landreev - thanks! I tend to write documentation last as review can change the code.
W.r.t. the flush - you're right, which explains why this didn't fix the issue. Given that the locking occurs in a method that RequiresNew transaction, I don't understand why calls to the API to get locks don't see the new lock until the overall workflow thread finishes (e.g. after it starts the workflow, runs a send/receive step and that step has replied with a 200 response, and the pendingworkflowinvocation is stored to the db and the thread ends.). I thought RequiresNew would result in changes when the lock method ends...

@landreev
Copy link
Contributor

@qqmyers

which explains why this didn't fix the issue

OK, so you have confirmed, that the lock is really not in the database, until that OK is received? (as opposed to not being seen by the page; and/or other class instance that may be looking at a stale ejb copy of the dataset object?)

@qqmyers
Copy link
Member Author

qqmyers commented Feb 10, 2021

~Yes. DANS created an external test (python/flask) that serves as an http s/r step. When that step is called, it calls the /api/datasets/{id}/locks api (no type parameter so should get all lock types) before sending it's OK/200 response back to dataverse. Seems like a good test, but - checking the code, that call does a findOrDie on the Dataset, so retrieving from the database, but it is looking for the list of locks on the Dataset, versus querying the DatasetLock table. Does that give a clue?

@qqmyers
Copy link
Member Author

qqmyers commented Feb 24, 2021

@landreev, @djbrooke - I've modified this PR to include a new http/authext step which is the only one allowed to use the invocationId as a key, so http/sr - based workflows won't be affected. Also added documentation. (Although I've been testing in concert with other changes recently in #7564, this PR should be ~independent : the one cross-connect with 7564 is in the AuthorizedExternalStep class and documentation for it where I've included the message parameter you can send back. In this PR, it exists but does nothing, and with the PR for #7564, it will get picked up and displayed.)

@djbrooke
Copy link
Contributor

Thanks @qqmyers - the docs look good from my perspective.

@landreev can you take another look when you have a chance? Thx.

@djbrooke djbrooke assigned landreev and unassigned qqmyers Feb 26, 2021
Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, I like the idea of a dedicated new class of external workflows for this.
I'm approving the PR and moving it along.
A couple of quick questions about the PR description metadata:

@qqmyers
Copy link
Member Author

qqmyers commented Feb 26, 2021

Thanks - I adjusted the test info to reference the http/authext step. (FWIW, it looks like testing with pause is still possible as it echos it's input params (in the json you start the wf with) into the pendingworkflowinvocation class so it can be configured to allow auth via invocationId if desired. (Also noted in the docs that it really is for testing since the invocationId is only in the log/db and not available elsewhere (and you need that to restart the wf).

As for #7569 - I think you're right that the original code also does em.flush() so this PR may not change it. That said, I'm not sure why it wasn't working and the other transaction-related changes could have fixed it, so I'd at least leave it in to test (or have DANS test) and then remove the issue from Closes if it doesn't solve that issue.

@kcondon kcondon self-assigned this Feb 26, 2021
@kcondon kcondon merged commit 105c071 into IQSS:develop Feb 27, 2021
@djbrooke djbrooke added this to the 5.4 milestone Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GDCC: DANS related to GDCC work for DANS
Projects
None yet
5 participants