Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8889 file-level PIDs configuration in individual collections #9614

Merged
merged 21 commits into from
Jun 27, 2023
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
9e77561
File pids in individual collections (#8889)
landreev May 22, 2023
bc84ed5
change to the configuration logic. #8889
landreev May 23, 2023
4d9e9b1
Guide entry change for the instance-wide setting. #8889
landreev May 23, 2023
e757a77
more guide entries #8889
landreev May 23, 2023
0ec9c73
release note for #8889
landreev May 23, 2023
ff34b46
Added the handling for the file pids attribute to the json parser/pri…
landreev May 23, 2023
82e91f2
a simple restassured test for disabling file pid registration within …
landreev May 23, 2023
60c1031
a couple of extra fixes. #8889
landreev May 24, 2023
14ce446
cosmetic. #8889
landreev May 24, 2023
78d68e2
A bug in the restassured test. #8889
landreev May 24, 2023
2fd36ad
Merge branch 'develop' into 8889-filepids-in-collections
landreev May 24, 2023
ae8b438
A better test for the persistent id. #8889
landreev May 24, 2023
6d06e32
reverting the test check. #8889
landreev May 24, 2023
a556b69
renaming the flyway script 5.13.0.2, hopefully this will make it into…
landreev May 25, 2023
d2325fa
Update doc/sphinx-guides/source/installation/config.rst
landreev May 31, 2023
0704ae2
Update doc/sphinx-guides/source/api/native-api.rst
landreev May 31, 2023
e337871
A simple test for the new collection attributes api; added comments i…
landreev May 31, 2023
d902fed
Merge branch 'develop' into 8889-filepids-in-collections
landreev Jun 20, 2023
4f1afd3
Added an API for registering within a collection. More doc changes. #…
landreev Jun 21, 2023
0e89865
typo. #8889
landreev Jun 22, 2023
59cd7ab
typo. #8889
landreev Jun 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/release-notes/8889-filepids-in-collections.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
It is now possible to configure registering PIDs for files in individual collections.

For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the [:FilePIDsEnabled](https://guides.dataverse.org/en/latest/installation/config.html#filepidsenabled) section of the Configuration guide for details.
18 changes: 18 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -736,6 +736,24 @@ The fully expanded example above (without environment variables) looks like this

curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/guestbookResponses?guestbookId=1 -o myResponses.csv

.. _collection-attributes-api:

Change Collection Attributes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block::

curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/attribute/$ATTRIBUTE?value=$VALUE

The following attributes are supported:

* ``alias`` Collection alias
* ``name`` Name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be useful for @Kris-LIBIS.

In the containerization meetings he has expressed the need to change the name of a collection after installation via API.

* ``description`` Description
* ``affiliation`` Affiliation
* ``filePIDsEnabled`` ("true" or "false") Enables or disables registraion of file-level PIDs in datasets within the collection (overriding the instance-wide setting).
landreev marked this conversation as resolved.
Show resolved Hide resolved


Datasets
--------

Expand Down
4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/installation/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2756,14 +2756,16 @@ timestamps.
:FilePIDsEnabled
++++++++++++++++

Toggles publishing of file-based PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.

If you don't want to register file-based PIDs for your installation, set:

``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FilePIDsEnabled``

Note: File-level PID registration was added in Dataverse Software 4.9; it could not be disabled until Dataverse Software 4.9.3.
landreev marked this conversation as resolved.
Show resolved Hide resolved

It is possible to override the installation-wide setting for specific collections. For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.

.. _:IndependentHandleService:

:IndependentHandleService
Expand Down
11 changes: 10 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/Dataverse.java
Original file line number Diff line number Diff line change
Expand Up @@ -590,8 +590,17 @@ public void setCitationDatasetFieldTypes(List<DatasetFieldType> citationDatasetF
this.citationDatasetFieldTypes = citationDatasetFieldTypes;
}


@Column(nullable = true)
private Boolean filePIDsEnabled;

public Boolean getFilePIDsEnabled() {
return filePIDsEnabled;
Copy link
Contributor

@poikilotherm poikilotherm May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this code return a default if null? If this should be optional (so this is handled as business logic in service layer), maybe return Optional.ofNullable()?

At least the Javadoc should state what can be expected as return values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically wanted it to be nullable, and to be null by default; and I wanted the method above to return the actual value (or lack thereof) in the database table. Null value in my implementation has a meaning, of "not specifically configured" one way or another, indicating that the setting of the ancestral collection needs to be consulted instead (or the system-wide default, if none defined on the root collection).
So yes, that was a long way of saying "business logic is elsewhere". I will add a javadoc entry.

}

public void setFilePIDsEnabled(boolean filePIDsEnabled) {
this.filePIDsEnabled = filePIDsEnabled;
}

public List<DataverseFacet> getDataverseFacets() {
return getDataverseFacets(false);
}
Expand Down
67 changes: 66 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@

import edu.harvard.iq.dataverse.util.json.JSONLDUtil;
import edu.harvard.iq.dataverse.util.json.JsonParseException;
import edu.harvard.iq.dataverse.util.json.JsonPrinter;
import static edu.harvard.iq.dataverse.util.json.JsonPrinter.brief;
import java.io.StringReader;
import java.util.Collections;
Expand Down Expand Up @@ -129,6 +130,7 @@
import java.util.Optional;
import java.util.stream.Collectors;
import javax.servlet.http.HttpServletResponse;
import javax.validation.constraints.NotNull;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.StreamingOutput;
Expand Down Expand Up @@ -166,7 +168,7 @@ public class Dataverses extends AbstractApiBean {

@EJB
SwordServiceBean swordService;

@POST
@AuthRequired
public Response addRoot(@Context ContainerRequestContext crc, String body) {
Expand Down Expand Up @@ -590,6 +592,69 @@ public Response deleteDataverse(@Context ContainerRequestContext crc, @PathParam
}, getRequestUser(crc));
}

/**
* Endpoint to change attributes of a Dataverse collection.
*
* @apiNote Example curl command:
* <code>curl -X PUT -d "test" http://localhost:8080/api/dataverses/$ALIAS/attribute/alias</code>
* to change the alias of the collection named $ALIAS to "test".
*/
@PUT
@AuthRequired
@Path("{identifier}/attribute/{attribute}")
public Response updateAttribute(@Context ContainerRequestContext crc, @PathParam("identifier") String identifier,
@PathParam("attribute") String attribute, @QueryParam("value") String value) {
try {
Dataverse collection = findDataverseOrDie(identifier);
User user = getRequestUser(crc);
DataverseRequest dvRequest = createDataverseRequest(user);

// TODO: The cases below use hard coded strings, because we have no place for definitions of those!
// They are taken from util.json.JsonParser / util.json.JsonPrinter. This shall be changed.
// This also should be extended to more attributes, like the type, theme, contacts, some booleans, etc.
switch (attribute) {
case "alias":
collection.setAlias(value);
break;
case "name":
collection.setName(value);
break;
case "description":
collection.setDescription(value);
break;
case "affiliation":
collection.setAffiliation(value);
break;
/* commenting out the code from the draft pr #9462:
case "versionPidsConduct":
CollectionConduct conduct = CollectionConduct.findBy(value);
if (conduct == null) {
return badRequest("'" + value + "' is not one of [" +
String.join(",", CollectionConduct.asList()) + "]");
}
collection.setDatasetVersionPidConduct(conduct);
break;
*/
case "filePIDsEnabled":
collection.setFilePIDsEnabled(parseBooleanOrDie(value));
break;
default:
return badRequest("'" + attribute + "' is not a supported attribute");
}

// Off to persistence layer
execCommand(new UpdateDataverseCommand(collection, null, null, dvRequest, null));

// Also return modified collection to user
return ok("Update successful", JsonPrinter.json(collection));

// TODO: This is an anti-pattern, necessary due to this bean being an EJB, causing very noisy and unnecessary
// logging by the EJB container for bubbling exceptions. (It would be handled by the error handlers.)
} catch (WrappedResponse e) {
return e.getResponse();
}
}

@DELETE
@AuthRequired
@Path("{linkingDataverseId}/deleteLink/{linkedDataverseId}")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -643,7 +643,7 @@ private boolean runAddReplacePhase1(Dataset owner,
df.setRootDataFileId(fileToReplace.getRootDataFileId());
}
// Reuse any file PID during a replace operation (if File PIDs are in use)
if (systemConfig.isFilePIDsEnabled()) {
if (systemConfig.isFilePIDsEnabledForCollection(owner.getOwner())) {
df.setGlobalId(fileToReplace.getGlobalId());
df.setGlobalIdCreateTime(fileToReplace.getGlobalIdCreateTime());
// Should be true or fileToReplace wouldn't have an identifier (since it's not
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -373,7 +373,7 @@ private void publicizeExternalIdentifier(Dataset dataset, CommandContext ctxt) t
String currentGlobalIdProtocol = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Protocol, "");
String currentGlobalAuthority = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Authority, "");
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabled();
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabledForCollection(getDataset().getOwner());
// We will skip trying to register the global identifiers for datafiles
// if "dependent" file-level identifiers are requested, AND the naming
// protocol, or the authority of the dataset global id is different from
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ public PublishDatasetResult execute(CommandContext ctxt) throws CommandException
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean registerGlobalIdsForFiles =
(currentGlobalIdProtocol.equals(theDataset.getProtocol()) || dataFilePIDFormat.equals("INDEPENDENT"))
&& ctxt.systemConfig().isFilePIDsEnabled();
&& ctxt.systemConfig().isFilePIDsEnabledForCollection(theDataset.getOwner());

if ( registerGlobalIdsForFiles ){
registerGlobalIdsForFiles = currentGlobalAuthority.equals( theDataset.getAuthority() );
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ protected void executeImpl(CommandContext ctxt) throws CommandException {
// didn't need updating.
String currentGlobalIdProtocol = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Protocol, "");
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabled();
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabledForCollection(target.getOwner());
// We will skip trying to update the global identifiers for datafiles if they
// aren't being used.
// If they are, we need to assure that there's an existing PID or, as when
Expand Down
27 changes: 24 additions & 3 deletions src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import com.ocpsoft.pretty.PrettyContext;
import edu.harvard.iq.dataverse.DataFile;
import edu.harvard.iq.dataverse.Dataverse;
import edu.harvard.iq.dataverse.DataverseServiceBean;
import edu.harvard.iq.dataverse.DvObjectContainer;
import edu.harvard.iq.dataverse.authorization.AuthenticationServiceBean;
Expand Down Expand Up @@ -995,9 +996,29 @@ public boolean isAllowCustomTerms() {
return settingsService.isTrueForKey(SettingsServiceBean.Key.AllowCustomTermsOfUse, safeDefaultIfKeyNotFound);
}

public boolean isFilePIDsEnabled() {
boolean safeDefaultIfKeyNotFound = true;
return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, safeDefaultIfKeyNotFound);
public boolean isFilePIDsEnabledForCollection(Dataverse collection) {
if (collection == null) {
return false;
}

Dataverse thisCollection = collection;

// If neither enabled nor disabled specifically for this collection,
// the parent collection setting is inhereted (recursively):
while (thisCollection.getFilePIDsEnabled() == null) {
if (thisCollection.getOwner() == null) {
// We've reached the root collection, and file PIDs registration
// hasn't been explicitly enabled, therefore we presume that it is
// subject to how the registration is configured for the
// entire instance:
return settingsService.isTrueForKey(SettingsServiceBean.Key.FilePIDsEnabled, true);
}
thisCollection = thisCollection.getOwner();
}

// If present, the setting of the first direct ancestor collection
// takes precedent:
return thisCollection.getFilePIDsEnabled();
}

public boolean isIndependentHandleService() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,10 @@ public Dataverse parseDataverse(JsonObject jobj) throws JsonParseException {
}
}
}

if (jobj.containsKey("filePIDsEnabled")) {
dv.setFilePIDsEnabled(jobj.getBoolean("filePIDsEnabled"));
}

/* We decided that subject is not user set, but gotten from the subject of the dataverse's
datasets - leavig this code in for now, in case we need to go back to it at some point
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -294,6 +294,9 @@ public static JsonObjectBuilder json(Dataverse dv, Boolean hideEmail) {
if(dv.getStorageDriverId() != null) {
bld.add("storageDriverLabel", DataAccess.getStorageDriverLabelFor(dv.getStorageDriverId()));
}
if (dv.getFilePIDsEnabled() != null) {
bld.add("filePIDsEnabled", dv.getFilePIDsEnabled());
}

return bld;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -394,7 +394,7 @@ private void workflowCompleted(Workflow wf, WorkflowContext ctxt) {
String dataFilePIDFormat = settings.getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean registerGlobalIdsForFiles =
(currentGlobalIdProtocol.equals(ctxt.getDataset().getProtocol()) || dataFilePIDFormat.equals("INDEPENDENT"))
&& systemConfig.isFilePIDsEnabled();
&& systemConfig.isFilePIDsEnabledForCollection(ctxt.getDataset().getOwner());
if ( registerGlobalIdsForFiles ){
registerGlobalIdsForFiles = currentGlobalAuthority.equals( ctxt.getDataset().getAuthority() );
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
ALTER TABLE dataverse ADD COLUMN IF NOT EXISTS filePIDsEnabled bool;
73 changes: 73 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/FilesIT.java
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
import static edu.harvard.iq.dataverse.api.AccessIT.apiToken;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.StringUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;
import java.io.File;
import java.io.IOException;
Expand Down Expand Up @@ -2018,4 +2019,76 @@ public void testDeleteFile() {
.body("data.files[0]", equalTo(null))
.statusCode(OK.getStatusCode());
}

// The following specifically tests file-level PIDs configuration in
// individual collections (#8889/#9614)
@Test
public void testFilePIDsBehavior() {
// Create user
String apiToken = createUserGetToken();

// Create Dataverse
String collectionAlias = createDataverseGetAlias(apiToken);

// Create Initial Dataset with 1 file:
Integer datasetId = createDatasetGetId(collectionAlias, apiToken);
String pathToFile = "scripts/search/data/replace_test/003.txt";
Response addResponse = UtilIT.uploadFileViaNative(datasetId.toString(), pathToFile, apiToken);

addResponse.then().assertThat()
.body("data.files[0].dataFile.contentType", equalTo("text/plain"))
.body("data.files[0].label", equalTo("003.txt"))
.statusCode(OK.getStatusCode());

Long origFileId = JsonPath.from(addResponse.body().asString()).getLong("data.files[0].dataFile.id");

// -------------------------
// Publish dataverse and dataset
// -------------------------
msg("Publish dataverse and dataset");
Response publishCollectionResp = UtilIT.publishDataverseViaSword(collectionAlias, apiToken);
publishCollectionResp.then().assertThat()
.statusCode(OK.getStatusCode());

Response publishDatasetResp = UtilIT.publishDatasetViaNativeApi(datasetId, "major", apiToken);
publishDatasetResp.then().assertThat()
.statusCode(OK.getStatusCode());

// The file in this dataset should have been assigned a PID when it was published:
Response fileInfoResponse = UtilIT.getFileData(origFileId.toString(), apiToken);
fileInfoResponse.then().assertThat().statusCode(OK.getStatusCode());
String fileInfoResponseString = fileInfoResponse.body().asString();
msg(fileInfoResponseString);

String origFilePersistentId = JsonPath.from(fileInfoResponseString).getString("data.dataFile.persistentId");
assertNotNull("The file did not get a persistent identifier assigned (check that file PIDs are enabled instance-wide!)", origFilePersistentId);

// Now change the file PIDs registration configuration for the collection:

Response changeAttributeResp = UtilIT.setCollectionAttribute(collectionAlias, "filePIDsEnabled", "false", apiToken);

// ... And do the whole thing with creating another dataset with a file:

datasetId = createDatasetGetId(collectionAlias, apiToken);
addResponse = UtilIT.uploadFileViaNative(datasetId.toString(), pathToFile, apiToken);
addResponse.then().assertThat().statusCode(OK.getStatusCode());
Long newFileId = JsonPath.from(addResponse.body().asString()).getLong("data.files[0].dataFile.id");

// And publish this dataset:
msg("Publish second dataset");

publishDatasetResp = UtilIT.publishDatasetViaNativeApi(datasetId, "major", apiToken);
publishDatasetResp.then().assertThat()
.statusCode(OK.getStatusCode());

// And confirm that the file didn't get a PID:

fileInfoResponse = UtilIT.getFileData(newFileId.toString(), apiToken);
fileInfoResponse.then().assertThat().statusCode(OK.getStatusCode());
fileInfoResponseString = fileInfoResponse.body().asString();
msg(fileInfoResponseString);

org.junit.Assert.assertEquals("The file was NOT supposed to be issued a PID", "", JsonPath.from(fileInfoResponseString).getString("data.dataFile.persistentId"));

}
}
6 changes: 6 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
Original file line number Diff line number Diff line change
Expand Up @@ -2091,6 +2091,12 @@ static Response setDataverseLogo(String dataverseAlias, String pathToImageFile,
.multiPart("file", new File(pathToImageFile))
.put("/api/dataverses/" + dataverseAlias + "/logo");
}

static Response setCollectionAttribute(String dataverseAlias, String attribute, String value, String apiToken) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited to have this method and functionality. In future test cases, we could use it in DataversesIT to test changing the name, description, and affiliation of collections.

We COULD do that now, in this PR, if we feel like it. It's new functionality that should probably be exercised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the description, I'm copy-and-pasting/stealing this API from Oliver's pr that's still in draft (#9462). This by itself is going to introduce at least a minor merge conflict with his branch. @poikilotherm are you ok with me expanding on it in this branch, like adding an integration test? - or do you want me to leave it to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I added a simple test that uses the API to change the alias then makes sure the change has taken effect)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray for the extra tests! 🎉

Copy link
Contributor

@poikilotherm poikilotherm May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please go ahead! I'm happy to resolve any merge conflicts that might appear.

return given()
.header(API_TOKEN_HTTP_HEADER, apiToken)
.put("/api/dataverses/" + dataverseAlias + "/attribute/" + attribute + "?value=" + value);
}

/**
* Deprecated as the apiToken is not used by the call.
Expand Down