Merge pull request #1 from andreyodum/5850_bagit_export2_local_fs
Local BagIt Archive Command Added
donsizemore authored Jun 19, 2019
2 parents 430b4ce + 923e5f3 commit 79ed5fb
Showing 2 changed files with 112 additions and 6 deletions.
28 changes: 22 additions & 6 deletions doc/sphinx-guides/source/installation/config.rst
@@ -658,16 +658,16 @@ For Google Analytics, the example script at :download:`analytics-code.html </_st

Once this script is running, you can look in the Google Analytics console (Realtime/Events or Behavior/Events) and view events by type and/or the Dataset or File the event involves.

BagIt Export
-------------

Dataverse can be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ bags, to `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_ or to any folder on the local file system.

This integration occurs through customization of an internal Dataverse archiver workflow that can be configured as a PostPublication workflow. An admin API call exists that can manually submit previously published Datasets, and prior versions, to a configured archive such as Chronopolis. The workflow leverages new functionality in Dataverse to create a `JSON-LD <http://www.openarchives.org/ore/0.9/jsonld>`_ serialized `OAI-ORE <https://www.openarchives.org/ore/>`_ map file, which is also available as a metadata export format in the Dataverse web interface.
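For example, the ORE map for a published dataset can be retrieved via the metadata export API (a sketch: the ``OAI_ORE`` exporter name and the DOI shown here are assumptions to adapt to your installation):

``curl 'http://localhost:8080/api/datasets/export?exporter=OAI_ORE&persistentId=doi:10.5072/FK2/EXAMPLE'``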

At present, the DPNSubmitToArchiveCommand and LocalSubmitToArchiveCommand are the only implementations extending the AbstractSubmitToArchiveCommand and using the configurable mechanisms discussed below.
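As an illustration of what such an implementation involves, here is a minimal, hypothetical archiver command modeled on the LocalSubmitToArchiveCommand added in this commit. It assumes only the constructor and performArchiveSubmission signatures visible in the new file below, and simply logs instead of transferring anything:

.. code-block:: java

    package edu.harvard.iq.dataverse.engine.command.impl;

    import java.util.Map;
    import java.util.logging.Logger;

    import edu.harvard.iq.dataverse.DatasetVersion;
    import edu.harvard.iq.dataverse.authorization.Permission;
    import edu.harvard.iq.dataverse.authorization.users.ApiToken;
    import edu.harvard.iq.dataverse.engine.command.Command;
    import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
    import edu.harvard.iq.dataverse.engine.command.RequiredPermissions;
    import edu.harvard.iq.dataverse.workflow.step.WorkflowStepResult;

    // Hypothetical example, not shipped with Dataverse.
    @RequiredPermissions(Permission.PublishDataset)
    public class LoggingSubmitToArchiveCommand extends AbstractSubmitToArchiveCommand implements Command<DatasetVersion> {

        private static final Logger logger = Logger.getLogger(LoggingSubmitToArchiveCommand.class.getName());

        public LoggingSubmitToArchiveCommand(DataverseRequest aRequest, DatasetVersion version) {
            super(aRequest, version);
        }

        @Override
        public WorkflowStepResult performArchiveSubmission(DatasetVersion dv, ApiToken token, Map<String, String> requestedSettings) {
            // requestedSettings carries the values of the settings listed in :ArchiverSettings.
            logger.info("Would archive " + dv.getDataset().getGlobalId().asString() + " v" + dv.getFriendlyVersionNumber());
            return WorkflowStepResult.OK;
        }
    }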

***DuraCloud Configuration***

Also note that while the current Chronopolis implementation generates the Bag and submits it to the archive's DuraCloud interface, the steps of making a 'snapshot' of the space containing the Bag and verifying its successful submission are actions a curator must take in the DuraCloud interface.

@@ -695,6 +695,22 @@ Archivers may require glassfish settings as well. For the Chronopolis archiver,

``./asadmin create-jvm-options '-Dduracloud.password=YOUR_PASSWORD_HERE'``

***Local Path Configuration***

\:ArchiverClassName - the fully qualified class name to be used for archiving. For example:

``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand"``
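You can read the setting back to confirm it took effect (the admin settings API supports GET as well as PUT):

``curl http://localhost:8080/api/admin/settings/:ArchiverClassName``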

\:BagItLocalPath - the local directory in which generated Bags should be stored. It should be writable by the application server. For example:

``curl -X PUT -d /home/path/to/storage http://localhost:8080/api/admin/settings/:BagItLocalPath``

\:ArchiverSettings - a comma-separated list of the settings the archiver class should be able to access. These can include existing Dataverse settings as well as dynamically defined ones specific to the class, such as \:BagItLocalPath. For example:

``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":BagItLocalPath"``

\:BagItLocalPath must be listed in \:ArchiverSettings so that the LocalSubmitToArchiveCommand can read the path configured above.

**API Call**

Once this configuration is complete, you, as a user with the *PublishDataset* permission, should be able to use the API call to manually submit a DatasetVersion for processing:
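A hypothetical invocation might look like the following (the ``submitDataVersionToArchive`` endpoint name, the database id, and the version number are assumptions; see the remainder of this section for the exact call):

``curl -X POST -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/admin/submitDataVersionToArchive/{id}/{version}``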
90 changes: 90 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/engine/command/impl/LocalSubmitToArchiveCommand.java
@@ -0,0 +1,90 @@
package edu.harvard.iq.dataverse.engine.command.impl;

import edu.harvard.iq.dataverse.DOIDataCiteRegisterService;
import edu.harvard.iq.dataverse.DataCitation;
import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetVersion;
import edu.harvard.iq.dataverse.DatasetLock.Reason;
import edu.harvard.iq.dataverse.authorization.Permission;
import edu.harvard.iq.dataverse.authorization.users.ApiToken;
import edu.harvard.iq.dataverse.engine.command.Command;
import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
import edu.harvard.iq.dataverse.engine.command.RequiredPermissions;
import edu.harvard.iq.dataverse.util.bagit.BagGenerator;
import edu.harvard.iq.dataverse.util.bagit.OREMap;
import edu.harvard.iq.dataverse.workflow.step.Failure;
import edu.harvard.iq.dataverse.workflow.step.WorkflowStepResult;

import java.io.File;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.logging.Logger;

import org.apache.commons.io.FileUtils;

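/**
 * Archiver command that writes an RDA-conformant zipped BagIt bag for a
 * DatasetVersion to a local folder given by the :BagItLocalPath setting.
 */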
@RequiredPermissions(Permission.PublishDataset)
public class LocalSubmitToArchiveCommand extends AbstractSubmitToArchiveCommand implements Command<DatasetVersion> {

private static final Logger logger = Logger.getLogger(LocalSubmitToArchiveCommand.class.getName());

public LocalSubmitToArchiveCommand(DataverseRequest aRequest, DatasetVersion version) {
super(aRequest, version);
}

@Override
public WorkflowStepResult performArchiveSubmission(DatasetVersion dv, ApiToken token, Map<String, String> requestedSettings) {
logger.fine("In LocalCloudSubmitToArchive...");
String localPath = requestedSettings.get(":BagItLocalPath");

try {

Dataset dataset = dv.getDataset();


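// Don't export until the dataset's PID registration has completed (no pidRegister lock).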
if (dataset.getLockFor(Reason.pidRegister) == null) {

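// Derive a filesystem-safe name from the dataset's global id, e.g. doi:10.5072/FK2/ABC123 -> doi-10-5072-fk2-abc123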
String spaceName = dataset.getGlobalId().asString().replace(':', '-').replace('/', '-')
.replace('.', '-').toLowerCase();

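// Generate the DataCite XML metadata to be stored alongside the Bag.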
DataCitation dc = new DataCitation(dv);
Map<String, String> metadata = dc.getDataCiteMetadata();
String dataciteXml = DOIDataCiteRegisterService.getMetadataFromDvObject(
dv.getDataset().getGlobalId().asString(), metadata, dv.getDataset());


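// Write the DataCite XML to the configured local path, then generate the zipped Bag there, passing the user's API token to the BagGenerator.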
FileUtils.writeStringToFile(new File(localPath + "/" + spaceName + "-datacite.v" + dv.getFriendlyVersionNumber() + ".xml"), dataciteXml, StandardCharsets.UTF_8);
BagGenerator bagger = new BagGenerator(new OREMap(dv, false), dataciteXml);
bagger.setAuthenticationKey(token.getTokenString());
bagger.generateBag(new FileOutputStream(localPath+"/"+spaceName + "v" + dv.getFriendlyVersionNumber() + ".zip"));


logger.fine("Localhost Submission step: Content Transferred");
dv.setArchivalCopyLocation("file://" + localPath + "/" + spaceName + "v" + dv.getFriendlyVersionNumber() + ".zip");

} else {
logger.warning("Localhost Submision Workflow aborted: Dataset locked for pidRegister");
return new Failure("Dataset locked");
}
} catch (Exception e) {
logger.warning("Local archive submission failed: " + e.getLocalizedMessage());
return new Failure("Local archive submission failed: " + e.getLocalizedMessage());
}
return WorkflowStepResult.OK;
}

}
