Commit

Merge pull request IQSS#4302 from IQSS/3700-export-schema.org
implement export of schema.org JSON-LD IQSS#3700
kcondon authored Nov 29, 2017
2 parents a881f36 + 3cc02d0 commit d785c5c
Showing 7 changed files with 395 additions and 11 deletions.
9 changes: 7 additions & 2 deletions doc/sphinx-guides/source/admin/metadataexport.rst
@@ -7,7 +7,12 @@ Metadata Export
Automatic Exports
-----------------

Unlike in DVN v3, publishing a dataset in Dataverse 4 automaticalliy starts a metadata export job, that will run in the background, asynchronously. Once completed, it will make the dataset metadata exported and cached in all the supported formats (Dublin Core, Data Documentation Initiative (DDI), and native JSON). There is no need to run the export manually.
Publishing a dataset automatically starts a metadata export job that runs in the background, asynchronously. Once the job completes, the dataset metadata is exported and cached in all the supported formats:

- Dublin Core
- Data Documentation Initiative (DDI)
- Schema.org JSON-LD
- native JSON (Dataverse-specific)

A scheduled timer job that runs nightly will attempt to export any published datasets that, for whatever reason, haven't been exported yet. This timer is activated automatically when the application is deployed or restarted, so there is no need to start or configure it manually. (See the "Application Timers" section of this guide for more information.)
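The "nightly timer" idea above can be illustrated in plain Java. Dataverse itself uses EJB application timers; this stdlib-only sketch (hypothetical class and method names) only shows how the delay until the next nightly run could be computed:

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Sketch of the "run nightly" scheduling idea from the docs above.
// Dataverse uses EJB application timers; this only demonstrates the
// delay computation one might feed to a scheduler.
class NightlyDelay {
    // Returns the duration from 'now' until the next occurrence of 'runAt'.
    static Duration untilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = LocalDateTime.of(now.toLocalDate(), runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's run time already passed; schedule tomorrow
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(LocalDate.of(2017, 11, 29), LocalTime.of(23, 30));
        System.out.println(untilNextRun(now, LocalTime.MIDNIGHT)); // PT30M
    }
}
```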

@@ -28,4 +33,4 @@ Note, that creating, modifying, or re-exporting an OAI set will also attempt to
Export Failures
---------------

An export batch job, whether started via the API or by the application timer, leaves a detailed log in your configured logs directory, the same location where your main Glassfish server.log is found. The name of the log file is ``export_[timestamp].log``, for example *export_2016-08-23T03-35-23.log*. The log records the number of datasets processed successfully and the number for which metadata export failed, with some information on the failures detected. Please attach this log file if you need to contact Dataverse support about metadata export problems.
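The timestamp embedded in the log file name above can be recovered programmatically. A minimal sketch; the exact ``export_[timestamp].log`` pattern is inferred from the example given in the docs:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: parse the timestamp out of an export log file name such as
// export_2016-08-23T03-35-23.log (pattern inferred from the docs above).
class ExportLogName {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss");

    static LocalDateTime timestampOf(String fileName) {
        // Strip the "export_" prefix and ".log" suffix, then parse what remains.
        String stamp = fileName.substring("export_".length(), fileName.length() - ".log".length());
        return LocalDateTime.parse(stamp, FMT);
    }

    public static void main(String[] args) {
        System.out.println(timestampOf("export_2016-08-23T03-35-23.log")); // 2016-08-23T03:35:23
    }
}
```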
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -152,7 +152,7 @@ Delete the dataset whose id is passed::

GET http://$SERVER/api/datasets/export?exporter=ddi&persistentId=$persistentId

.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, and ``dataverse_json``.
.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org``, and ``dataverse_json``.
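The export request above can be assembled for any of the listed exporters. A stdlib-only sketch; the server base URL and DOI below are placeholders, not values from this PR:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build the export API URL shown in the docs above for a given
// exporter name and persistent identifier (placeholder values in main).
class ExportUrl {
    static String build(String server, String exporter, String persistentId) {
        try {
            return server + "/api/datasets/export?exporter="
                    + URLEncoder.encode(exporter, "UTF-8")
                    + "&persistentId=" + URLEncoder.encode(persistentId, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080", "schema.org", "doi:10.5072/FK2/EXAMPLE"));
    }
}
```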

|CORS| Lists all the file metadata, for the given dataset and version::

1 change: 1 addition & 0 deletions src/main/java/Bundle.properties
@@ -1107,6 +1107,7 @@ dataset.editBtn.itemLabel.deaccession=Deaccession Dataset
dataset.exportBtn=Export Metadata
dataset.exportBtn.itemLabel.ddi=DDI
dataset.exportBtn.itemLabel.dublinCore=Dublin Core
dataset.exportBtn.itemLabel.schemaDotOrg=Schema.org JSON-LD
dataset.exportBtn.itemLabel.json=JSON
metrics.title=Metrics
metrics.title.tip=View more metrics information
11 changes: 10 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -86,6 +86,7 @@
import edu.harvard.iq.dataverse.engine.command.impl.RestrictFileCommand;
import edu.harvard.iq.dataverse.engine.command.impl.ReturnDatasetToAuthorCommand;
import edu.harvard.iq.dataverse.engine.command.impl.SubmitDatasetForReviewCommand;
import edu.harvard.iq.dataverse.export.SchemaDotOrgExporter;
import java.util.Collections;

import javax.faces.event.AjaxBehaviorEvent;
@@ -4068,7 +4069,15 @@ public boolean isThisLatestReleasedVersion() {

    public String getJsonLd() {
        if (isThisLatestReleasedVersion()) {
            return workingVersion.getJsonLd();
            ExportService instance = ExportService.getInstance(settingsService);
            String jsonLd = instance.getExportAsString(dataset, SchemaDotOrgExporter.NAME);
            if (jsonLd != null) {
                logger.fine("Returning cached schema.org JSON-LD.");
                return jsonLd;
            } else {
                logger.fine("No cached schema.org JSON-LD available. Going to the database.");
                return workingVersion.getJsonLd();
            }
        }
        return "";
    }
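The new getJsonLd() above follows a cache-then-fallback pattern: prefer the cached export, and regenerate from the database only on a miss. A generic, stdlib-only sketch of the same idea; the class and method names here are hypothetical, not the real ExportService API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Generic sketch of the cache-then-fallback pattern used by getJsonLd():
// return the cached value if present, otherwise compute and cache it.
class CachedLookup {
    private final Map<String, String> cache = new HashMap<>();

    String getOrCompute(String key, Supplier<String> fallback) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // cached export available
        }
        String fresh = fallback.get(); // analogous to workingVersion.getJsonLd()
        cache.put(key, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        CachedLookup lookup = new CachedLookup();
        System.out.println(lookup.getOrCompute("doi:10.5072/FK2/X", () -> "{\"@type\":\"Dataset\"}"));
    }
}
```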
17 changes: 10 additions & 7 deletions src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java
@@ -1211,12 +1211,8 @@ public String getPublicationDateAsString() {
return r;
}

// TODO: Make this more performant by writing the output to the database or a file?
// Agree - now that this has grown into a somewhat complex chunk of formatted
// metadata - and not just a couple of values inserted into the page html -
// it feels like it would make more sense to treat it as another supported
// export format, that can be produced once and cached.
// The problem with that is that the export subsystem assumes there is only
// TODO: Consider moving this comment into the Exporter code.
// The export subsystem assumes there is only
// one metadata export in a given format per dataset (it uses the current
// released (published) version. This JSON fragment is generated for a
// specific released version - and we can have multiple released versions.
@@ -1244,6 +1240,9 @@ public String getJsonLd() {
            // We are aware of "givenName" and "familyName" but instead of a person it might be an organization such as "Gallup Organization".
            //author.add("@type", "Person");
            author.add("name", name);
            // We are aware that the following error is thrown by https://search.google.com/structured-data/testing-tool
            // "The property affiliation is not recognized by Google for an object of type Thing."
            // Someone at Google has said this is ok.
            if (!StringUtil.isEmpty(affiliation)) {
                author.add("affiliation", affiliation);
            }
@@ -1341,7 +1340,11 @@ public String getJsonLd() {
        if (TermsOfUseAndAccess.License.CC0.equals(terms.getLicense())) {
            license.add("text", "CC0").add("url", "https://creativecommons.org/publicdomain/zero/1.0/");
        } else {
            license.add("text", terms.getTermsOfUse());
            String termsOfUse = terms.getTermsOfUse();
            // Terms of use can be null if you create the dataset with JSON.
            if (termsOfUse != null) {
                license.add("text", termsOfUse);
            }
        }

        job.add("license", license);
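The null guard added above matters because terms of use can be absent on datasets created through the JSON API. A stdlib-only sketch of the same branching, using plain strings instead of the javax.json builder so it stands alone (class and method names are hypothetical):

```java
// Sketch of the null-guarded license logic above. The real code appends to a
// javax.json JsonObjectBuilder; here we return a JSON string for illustration.
class LicenseFragment {
    static String licenseJson(boolean cc0, String termsOfUse) {
        if (cc0) {
            return "{\"text\":\"CC0\",\"url\":\"https://creativecommons.org/publicdomain/zero/1.0/\"}";
        }
        if (termsOfUse != null) {
            return "{\"text\":\"" + termsOfUse + "\"}";
        }
        return "{}"; // no terms recorded; emit an empty license object
    }

    public static void main(String[] args) {
        System.out.println(licenseJson(false, null)); // {}
    }
}
```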
@@ -0,0 +1,86 @@
package edu.harvard.iq.dataverse.export;

import com.google.auto.service.AutoService;
import edu.harvard.iq.dataverse.DatasetVersion;
import edu.harvard.iq.dataverse.export.spi.Exporter;
import edu.harvard.iq.dataverse.util.BundleUtil;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.util.logging.Logger;
import javax.json.Json;
import javax.json.JsonObject;
import javax.json.JsonReader;

@AutoService(Exporter.class)
public class SchemaDotOrgExporter implements Exporter {

    private static final Logger logger = Logger.getLogger(SchemaDotOrgExporter.class.getCanonicalName());

    public static final String NAME = "schema.org";

    @Override
    public void exportDataset(DatasetVersion version, JsonObject json, OutputStream outputStream) throws ExportException {
        String jsonLdAsString = version.getJsonLd();
        StringReader stringReader = new StringReader(jsonLdAsString);
        JsonReader jsonReader = Json.createReader(stringReader);
        JsonObject jsonLdJsonObject = jsonReader.readObject();
        try {
            outputStream.write(jsonLdJsonObject.toString().getBytes("UTF8"));
        } catch (IOException ex) {
            logger.info("IOException calling outputStream.write: " + ex);
        }
        try {
            outputStream.flush();
        } catch (IOException ex) {
            logger.info("IOException calling outputStream.flush: " + ex);
        }
    }
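The write-and-flush step in exportDataset() above can be exercised against an in-memory stream. A stdlib-only sketch with hypothetical names; it also uses StandardCharsets.UTF_8 rather than the "UTF8" string literal, which is an equivalent but typo-proof way to name the charset:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of the write-and-flush step in exportDataset(), demonstrated
// against an in-memory stream instead of the export cache.
class JsonLdWriter {
    static void write(String jsonLd, OutputStream out) throws IOException {
        out.write(jsonLd.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    // Round-trip helper for the demo; ByteArrayOutputStream never actually throws.
    static String toUtf8String(String jsonLd) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            write(jsonLd, out);
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toUtf8String("{\"@context\":\"http://schema.org\"}"));
    }
}
```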

    @Override
    public String getProviderName() {
        return NAME;
    }

    @Override
    public String getDisplayName() {
        return BundleUtil.getStringFromBundle("dataset.exportBtn.itemLabel.schemaDotOrg");
    }

    @Override
    public Boolean isXMLFormat() {
        return false;
    }

    @Override
    public Boolean isHarvestable() {
        // Defer harvesting because the current effort was estimated as a "2": https://github.com/IQSS/dataverse/issues/3700
        return false;
    }

    @Override
    public Boolean isAvailableToUsers() {
        return true;
    }

    @Override
    public String getXMLNameSpace() throws ExportException {
        throw new ExportException(SchemaDotOrgExporter.class.getSimpleName() + ": not an XML format.");
    }

    @Override
    public String getXMLSchemaLocation() throws ExportException {
        throw new ExportException(SchemaDotOrgExporter.class.getSimpleName() + ": not an XML format.");
    }

    @Override
    public String getXMLSchemaVersion() throws ExportException {
        throw new ExportException(SchemaDotOrgExporter.class.getSimpleName() + ": not an XML format.");
    }

    @Override
    public void setParam(String name, Object value) {
        // this exporter doesn't need/doesn't currently take any parameters
    }

}
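The @AutoService(Exporter.class) annotation on the class above registers it for java.util.ServiceLoader discovery, which is how the export subsystem finds exporters by provider name at runtime. A simplified, hypothetical registry sketch of the same lookup-by-name idea (not the real ExportService API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: exporters registered under their provider name
// (e.g. "schema.org") and looked up later, mirroring how ExportService
// resolves the NAME constant of SchemaDotOrgExporter.
class ExporterRegistry {
    private final Map<String, String> displayNames = new HashMap<>();

    void register(String providerName, String displayName) {
        displayNames.put(providerName, displayName);
    }

    String displayNameFor(String providerName) {
        return displayNames.getOrDefault(providerName, "unknown exporter");
    }

    public static void main(String[] args) {
        ExporterRegistry registry = new ExporterRegistry();
        registry.register("schema.org", "Schema.org JSON-LD");
        System.out.println(registry.displayNameFor("schema.org")); // Schema.org JSON-LD
    }
}
```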