Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8889 file-level PIDs configuration in individual collections #9614

Merged
merged 21 commits into from
Jun 27, 2023

Conversation

landreev
Copy link
Contributor

@landreev landreev commented May 24, 2023

What this PR does / why we need it:
See issue. Curation team feature request, allows enabling file PIDs in specific collections. Enabling/disabling is via API only.

Which issue(s) this PR closes:

Special notes for your reviewer:

The implementation is straightforward. I added a dedicated boolean flag to the Dataverse entity for this. I don't have any reservations about extending that table, especially since there should be relatively few entries in it.

Note that I copy-and-pasted/stole the new "collection attributes" API from Oliver's #9462, he's adding it in order to be able to configure version PIDs in collections, makes perfect sense to use it for this as well.

I added a simple integration test for the behavior.

One extra thing: there is a Flyway script in the PR, I numbered it 5.13.0.2, under the assumption that it is going to make it into 5.14... but it may need to be renamed as needed.

Suggestions on how to test this:

Straightforward. Have file PIDs disabled instance-wide. Enable them for a specific collection via the API: /api/dataverses/${ALIAS}/attribute/filePIDsEnabled?value=true. Create a dataset with files in the collection, publish, confirm that the files have got PIDs.
(Or the other way around - have it enabled for the instance, then disable it for a specific collection).
Important: needs to be tested on an instance using a real Datacite account. Because "FAKE" does not create file-level DOIs.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@pdurbin pdurbin self-assigned this May 26, 2023
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test this but the code looks good and excited about the main functionality AND the new ability to modify collections via API for alias, name, description, and affiliation. A nice bonus feature.

I suggested some tiny stuff (typo fixes, maybe more tests) so I'll assign this to @landreev to take a look.

DownloadFilesIT.downloadAllFilesRestricted was failing when I got to this PR: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-9614/5/testReport/edu.harvard.iq.dataverse.api/DownloadFilesIT/downloadAllFilesRestricted/

I must have bumped something accidentally triggering a new test run. 😅 I guess we'll see how it ends: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-9614/6/

We should be mindful of the SQL script name, of course. It seems likely we can get this into 5.13 but we'll see.

The following attributes are supported:

* ``alias`` Collection alias
* ``name`` Name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be useful for @Kris-LIBIS.

In the containerization meetings he has expressed the need to change the name of a collection after installation via API.

@@ -2091,6 +2091,12 @@ static Response setDataverseLogo(String dataverseAlias, String pathToImageFile,
.multiPart("file", new File(pathToImageFile))
.put("/api/dataverses/" + dataverseAlias + "/logo");
}

static Response setCollectionAttribute(String dataverseAlias, String attribute, String value, String apiToken) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited to have this method and functionality. In future test cases, we could use it in DataversesIT to test changing the name, description, and affiliation of collections.

We COULD do that now, in this PR, if we feel like it. It's new functionality that should probably be exercised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the description, I'm copy-and-pasting/stealing this API from Oliver's pr that's still in draft (#9462). This by itself is going to introduce at least a minor merge conflict with his branch. @poikilotherm are you ok with me expanding on it in this branch, like adding an integration test? - or do you want me to leave it to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I added a simple test that uses the API to change the alias then makes sure the change has taken effect)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray for the extra tests! 🎉

Copy link
Contributor

@poikilotherm poikilotherm May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please go ahead! I'm happy to resolve any merge conflicts that might appear.

doc/sphinx-guides/source/api/native-api.rst Outdated Show resolved Hide resolved
doc/sphinx-guides/source/installation/config.rst Outdated Show resolved Hide resolved
landreev and others added 2 commits May 31, 2023 10:40
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>

public Boolean getFilePIDsEnabled() {
return filePIDsEnabled;
Copy link
Contributor

@poikilotherm poikilotherm May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this code return a default if null? If this should be optional (so this is handled as business logic in service layer), maybe return Optional.ofNullable()?

At least the Javadoc should state what can be expected as return values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically wanted it to be nullable, and to be null by default; and I wanted the method above to return the actual value (or lack thereof) in the database table. Null value in my implementation has a meaning, of "not specifically configured" one way or another, indicating that the setting of the ancestral collection needs to be consulted instead (or the system-wide default, if none defined on the root collection).
So yes, that was a long way of saying "business logic is elsewhere". I will add a javadoc entry.

@landreev landreev mentioned this pull request May 31, 2023
27 tasks
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions
Copy link

📦 Pushed preview application image as

ghcr.io/gdcc/dataverse:8889-filepids-in-collections

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@landreev
Copy link
Contributor Author

landreev commented Jun 2, 2023

Is this PR ready now, or are you still waiting for me to add anything else? - I may have missed something.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@kcondon kcondon self-assigned this Jun 14, 2023
@kcondon
Copy link
Contributor

kcondon commented Jun 14, 2023

Issues found:

  1. Getting 405 when calling the api endpoint: API endpoint does not support this method.
    -This is a doc issue. Shows as a GET but it is a POST
  2. Need to confirm the current flyway script name is appropriate for this 5.14 release, per note in pr
    -Yes, this is fine as long as we're releasing in 5.14
  3. Question: what is the expectation if system setting and collection setting are the same, true/true, false/false, seems obvious but just asking.
    -Collection level setting always wins is the general rule.
  4. Question: how does an admin register file pids for datasets created before file pids was enabled? orig issues mentioned an api endpoint.

@landreev
Copy link
Contributor Author

As to 4. above, it appears that I missed that part. Rather, I just assumed that no new API was needed, that we'd be able to register the file PIDs as needed using the existing API once the registration was enabled... But the only APIs for file level PIDs we have is "register all (unregistered)" and "register an individual file (by id)". While registering all the files in a specific collection can be achieved with the above, it's not practical and we do need an API for registering all the unregistered files in a specific collection.

@landreev landreev self-assigned this Jun 20, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@landreev
Copy link
Contributor Author

There's now an API for minting PIDs for files within a specific collection. GET /api/admin/registerDataFiles/{collection_alias}, described in the "Managing Datasets and Dataverse Collections" guide, with the other file-level registration methods.

@landreev landreev removed their assignment Jun 22, 2023
@kcondon kcondon merged commit 8b4100d into develop Jun 27, 2023
@github-actions
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:8889-filepids-in-collections
ghcr.io/gdcc/configbaker:8889-filepids-in-collections

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@pdurbin pdurbin added this to the 5.14 milestone Jul 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Enable file PIDs at Dataverse Collection Level
4 participants