Query Dataverse for mandatory metadata fields via API #6978

richarda23 · 2020-06-11T10:43:15Z

RSpace ELN uses the Dataverse API to submit research data to Dataverse. It has a minimal UI for metadata fields such as title, subject, description, authors and contacts, and this has worked on various Dataverse installations so far.

One of our RSpace customers runs their own Dataverse installation (version 4.19). They have configured it to require additional metadata when a dataset is submitted. RSpace doesn't know these fields are mandatory, so the submission fails:

Deposit failed: ERROR 2020-06-09T10:58:26Z Processing failed. Couldn't update dataset edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException: Validation Failed: Producer Name is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Distributor Name is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Description Date is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Keyword Term is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Deposit Date is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]).

These are exactly the required properties, from the list sent by the Dataverse admin, that RSpace does not set. The admin's full list of required properties is:

Author - Name
Contact - Name
Contact - Email
Description - Text
Description - Date
Keyword - Term
Producer - Name
Distributor - Name (In our default templates, this is always the name of the (sub-)dataverse. I'm not sure how this should be handled when a dataset is created from RSpace.)
Deposit Date (In Dataverse, this is generated by the system.)
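Until such an API exists, a client can at least turn the deposit error into something it can show the user. A minimal sketch, assuming the server keeps phrasing each violation as "<Field label> is required." exactly as in the error above. This scrapes message text rather than using a stable API, so it could break in any release, and it yields display labels (e.g. "Producer Name"), not field typeNames.

```python
import re

# Shortened copy of the deposit error quoted above.
ERROR_TEXT = (
    "Validation Failed: Producer Name is required. "
    "(Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), "
    "Distributor Name is required. "
    "(Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), "
    "Keyword Term is required. "
    "(Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ])."
)

# Each violation starts right after "Validation Failed: " (the first one)
# or after the "), " that closes the previous violation.
_VIOLATION = re.compile(r"(?:Validation Failed: |\), )([^,()]+?) is required\.")


def missing_field_labels(error_text: str) -> list[str]:
    """Return the labels of the required fields named in a validation error."""
    return _VIOLATION.findall(error_text)


if __name__ == "__main__":
    print(missing_field_labels(ERROR_TEXT))
```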

If this list never changes, RSpace could develop a solution that reads a list of mandatory fields from a configuration file. But if it changes from time to time, it would be great to have an API method in Dataverse that returns the list of mandatory metadata fields. A client could then programmatically generate input fields for these properties so that the end user can make a valid submission.


richarda23 · 2020-06-12T08:35:06Z

Just to add: the customer has said that once they have defined the mandatory metadata for a dataverse or sub-dataverse, it seldom or never changes. So we (RSpace) could just use a static list lookup to handle this particular use case. But being able to retrieve the properties from an API call would be better, as it would always be up to date.
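A static lookup along those lines could be as small as the sketch below. Everything in it is hypothetical: the config layout, the collection aliases, the parent map, and the assumption that a sub-dataverse inherits its parent's extra requirements. The default set is the five citation fields Dataverse normally requires (title, author, datasetContact, dsDescription, subject); the extra typeNames for the customer's collection are an assumed mapping of the admin's list and should be checked against the installation's citation block.

```python
import json

# Hypothetical config file: required field typeNames per collection alias.
CONFIG_JSON = """
{
  "default": ["title", "author", "datasetContact", "dsDescription", "subject"],
  "collections": {
    "customer-root": ["keyword", "producer", "distributor"]
  }
}
"""

# Hypothetical collection tree: child alias -> parent alias.
PARENTS = {"customer-lab": "customer-root"}


def required_fields(config: dict, alias: str, parents: dict[str, str]) -> set[str]:
    """Default fields plus extras configured for the collection or any ancestor."""
    required = set(config["default"])
    current = alias
    while current is not None:
        required.update(config["collections"].get(current, []))
        current = parents.get(current)
    return required


def missing_fields(config: dict, alias: str, parents: dict[str, str],
                   metadata: dict) -> list[str]:
    """Required fields that are absent or empty in the metadata to be sent."""
    return sorted(f for f in required_fields(config, alias, parents)
                  if not metadata.get(f))


if __name__ == "__main__":
    config = json.loads(CONFIG_JSON)
    draft = {"title": "Assay results", "author": ["Doe, Jane"],
             "datasetContact": ["jane@example.org"],
             "dsDescription": ["Raw plate-reader data"], "subject": ["Chemistry"]}
    print(missing_fields(config, "customer-lab", PARENTS, draft))
```

The upside over the API approach is zero server support; the downside is the one raised in the next comment: the list goes stale whenever the admin changes the collection's configuration.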

djbrooke · 2020-06-12T13:01:51Z

Thanks @richarda23, this is a good idea and makes sense.

djbrooke · 2021-11-09T16:37:09Z

A question mostly for @pdurbin - would the solution implemented in #7942 allow for this? Or no?

pdurbin · 2021-11-09T16:53:03Z

Hmm, from pull request #7942 I believe metadata_fields=citation:* will only show you the citation fields that have been filled in. There are many more citation fields that are not shown. Also, it doesn't indicate whether fields are required or not. Good thought though. People might come up with creative uses for that new functionality. 😄

Something else to consider is that templates can require additional fields, but I don't know if the API respects this or not. It should if it doesn't. From the original report above, it sounds like templates may be in use and enforced via the API, because Producer Name is not one of the five fields that are usually required.

This issue is related in the sense that it would be nice if the API could return more information about what it needs or allows:

What are the allowed search fields for the Search API q parameter? #2558

There is an "admin" API that can give some detail on metadata fields but as noted at https://guides.dataverse.org/en/5.8/admin/metadatacustomization.html#exploring-metadata-blocks the output is ugly and it could stand to be cleaned up before it's ready for public consumption:

Here you can see that "title" is required:

$ curl -s http://localhost:8080/api/admin/datasetfield/title | jq .
{
  "status": "OK",
  "data": {
    "name": "title",
    "id": 1,
    "title": "Title",
    "metadataBlock": "citation",
    "fieldType": "TEXT",
    "allowsMultiples": false,
    "hasParent": false,
    "controlledVocabularyValues": [],
    "parentAllowsMultiples": "N/A (no parent)",
    "solrFieldSearchable": "title",
    "solrFieldFacetable": "title_s",
    "isRequired": true,
    "uri": "http://purl.org/dc/terms/title"
  }
}
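A client could reduce that output to a form-input description. A minimal sketch that parses exactly the response shown above; it only relies on keys visible in that output and passes controlledVocabularyValues through without assuming its element format. Two caveats: the /api/admin endpoints are commonly blocked from outside localhost on production installations, and isRequired here looks like the installation-wide flag, so it may not reflect collection-level requirements or templates.

```python
import json

# Body returned by GET /api/admin/datasetfield/title, as shown above.
RESPONSE_BODY = """
{
  "status": "OK",
  "data": {
    "name": "title",
    "id": 1,
    "title": "Title",
    "metadataBlock": "citation",
    "fieldType": "TEXT",
    "allowsMultiples": false,
    "hasParent": false,
    "controlledVocabularyValues": [],
    "parentAllowsMultiples": "N/A (no parent)",
    "solrFieldSearchable": "title",
    "solrFieldFacetable": "title_s",
    "isRequired": true,
    "uri": "http://purl.org/dc/terms/title"
  }
}
"""


def form_input_spec(body: str) -> dict:
    """Keep only what a client needs to render one input for this field."""
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    data = payload["data"]
    return {
        "name": data["name"],           # typeName to use in dataset JSON
        "label": data["title"],         # human-readable label
        "block": data["metadataBlock"],
        "type": data["fieldType"],
        "required": data["isRequired"],
        "repeatable": data["allowsMultiples"],
        "choices": data["controlledVocabularyValues"],  # passed through as-is
    }


if __name__ == "__main__":
    print(form_input_spec(RESPONSE_BODY))
```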

djbrooke · 2021-11-09T16:59:42Z

Got it - thanks @pdurbin. I was hopeful :)

poikilotherm · 2021-11-09T23:31:50Z

This issue is also relevant to @hermes-hmc, as we might want to validate metadata before depositing instead of relying on trial and error.

I'm going to add the HERMES label to make it easier to track what might be in scope for our project.

Kris-LIBIS · 2021-11-19T14:56:43Z

At KU Leuven we are interested in this as well, for future integrations with our other systems. One additional piece of information that would be required to generate valid submissions is the set of allowed values for fields with a controlled vocabulary. External vocabularies may make that much more complex, but I hope they would be supported too.

philippconzett · 2021-11-19T15:21:47Z

In a future version of Dataverse where issue 6885 has hopefully been solved, recommended metadata fields should also show up in the integrated system. I guess in some (or many) cases the list of metadata fields can become quite long, as we want our depositors to provide as much metadata as possible. I'm wondering whether users in most (all?) cases would have to navigate to the actual dataset draft in Dataverse anyway to add additional metadata. So the question is how an integration involving metadata registration can be designed to make the researcher's work as easy as possible.

philippconzett · 2021-11-20T14:09:54Z

I have just had a discussion with @shlake about Dataverse integrations with tools like OSF and RSpace.
The more I think about these kinds of integrations, the more I think the integration needs to go the other way round, that is from Dataverse to OSF, RSpace, etc. A user would create a dataset in Dataverse, where they can use a dataset/metadata template. When uploading files, they would be able to access tools like OSF and RSpace to select the files they want to upload to the dataset, in the same way they can (if the integration is activated) select and upload files from Dropbox etc. I know that Harvard/Dataverse want integrations to work the other way round (from OSF/RSpace to Dataverse), but I think for file upload integrations, it's most convenient for the user to start by creating a dataset WITHIN Dataverse.

Kris-LIBIS · 2021-11-20T17:51:45Z

The university and their researchers have made it very clear that they expect to be able to create the datasets from the institutional repository (Symplectic Elements). We hope to go live in January next year and that will be without that feature. But we will have to implement that somehow in the next year or so. Then there is also a request to migrate datasets from iRODS to Dataverse.

Agreed, we may be able to work around it to some extent, but solving this GitHub issue would surely make it easier to get our integration scenarios working.

richarda23 · 2021-11-22T18:00:02Z

From the RSpace perspective, the idea of researchers being able to make deposits from an ELN is solely to lower the barrier to getting data, files and associated metadata (like ORCID IDs and author information) into a repository, and to be able to do so from familiar software.
We certainly don't intend to replicate the full rich editing experience of every repository we integrate with. If an institution requires a large number of compulsory data fields, could that be implemented as a requirement for publishing the dataset, rather than merely adding content to it?
E.g.

Initial submission/creation from RSpace requires minimal data (enough to satisfy Dataverse's database schema or API validation)
Further metadata/data is added in the Dataverse UI
Publishing requires compulsory fields to have valid values.

I don't think the counter-proposal of pulling from an ELN into Dataverse is an either/or scenario; both would work, depending on what the user prefers. But 'pull from ELN' requires Dataverse to develop UI to browse and configure exports for each and every ELN or data source it wants to support.

A 2-step procedure would make it easy for researchers to get started making a deposit in draft form, yet still require full verification in order to publish.
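The two-step idea amounts to two gates over the same metadata: a small set needed to save a draft and a larger set needed to publish, with the publish gate reporting what is still missing. A minimal sketch under that reading; both field sets are illustrative only and are not what Dataverse actually enforces.

```python
from dataclasses import dataclass, field

# Illustrative field sets only; these are not what Dataverse enforces.
DRAFT_REQUIRED = frozenset({"title"})
PUBLISH_REQUIRED = frozenset({"title", "author", "datasetContact", "dsDescription",
                              "subject", "keyword", "producer", "distributor"})


@dataclass
class CheckResult:
    can_save_draft: bool
    can_publish: bool
    missing_for_publish: list[str] = field(default_factory=list)


def check(metadata: dict) -> CheckResult:
    """Step 1 gate (save a draft) and step 2 gate (publish) in one pass."""
    present = {name for name, value in metadata.items() if value}
    return CheckResult(
        can_save_draft=DRAFT_REQUIRED <= present,
        can_publish=PUBLISH_REQUIRED <= present,
        missing_for_publish=sorted(PUBLISH_REQUIRED - present),
    )


if __name__ == "__main__":
    print(check({"title": "Assay results", "author": ["Doe, Jane"]}))
```

Returning both a yes/no answer and the list of missing fields lets the depositing app confirm the draft was accepted while telling the user what remains before publication.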

pdurbin · 2021-11-22T18:13:30Z

@richarda23 interesting thought. Please see also this issue:

Edit Dataset: When create draft from files or terms tab, warn after saving that metadata fields are required and fail on publish. #2451

We already have the concept of an "N/A" value that has to be replaced with a real value before the dataset is published via the GUI. To see this in action:

Create a dataset via SWORD without a subject (the subject will be "N/A" in the database, see "N/A" at https://guides.dataverse.org/en/5.8/api/sword.html )
Edit the dataset metadata in the GUI and try to save. You will be prompted to pick a subject from the controlled vocabulary before you can save.
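An external app could apply the same idea before offering to publish: treat any field still holding the "N/A" placeholder as not yet filled in. A minimal sketch over a hypothetical flat name-to-value dict (not Dataverse's native JSON); it assumes "N/A" is never a legitimate value for the fields being checked.

```python
# Placeholder stored when a required value was not supplied (e.g. subject via SWORD).
PLACEHOLDER = "N/A"


def placeholder_fields(metadata: dict) -> list[str]:
    """Names of fields whose value (or any list item) is still the placeholder."""
    flagged = []
    for name, value in metadata.items():
        values = value if isinstance(value, list) else [value]
        if PLACEHOLDER in values:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(placeholder_fields({"title": "Assay results", "subject": ["N/A"]}))
```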

TaniaSchlatter · 2021-11-23T17:02:46Z

This is a great conversation, and provides specific examples that relate to several issues. Discovery work on #7376 points to possible benefits of a two-step process like what @richarda23 outlines.

#7376 originated from a few questions: "how might we help users add metadata without making it too laborious to publish a dataset?," "how might we make adding and editing metadata more clear?" and "how might we more clearly define what is considered metadata?" We reviewed features and considered deposit/edit workflows. The next step is to mock up UI changes that present metadata required to create a draft more clearly as step 1, and additional, configurable metadata (could be required, recommended, optional as suggested in #6885) as step 2, prior to publishing.

While the ability to query for mandatory fields may still be needed, I wonder if it is possible to instead agree on the metadata needed to create a draft, and build out/improve how datasets are "enriched".

Kris-LIBIS · 2021-11-25T10:50:26Z

I'm in favour of the 2-step approach and the idea of being able to save a draft dataset with incomplete metadata. Like @richarda23, our aim is to be able to create a dataset from another application and to transfer as much as possible the data and metadata that is already known in the external application. If that metadata does not have to be complete, that would make the integration process much easier. We agree that finalizing the dataset and publishing it should be done within Dataverse.

Still, being able to query Dataverse for the details of the metadata is a plus. It would be helpful in mapping metadata between applications. I assume that the API call should operate on a given Dataverse collection.

richarda23 · 2021-11-25T11:14:18Z

Yes, agree totally. Knowing what metadata is required would also help an external app know what metadata to send; some metadata fields might require computation or straightforward input from the user, and it would be good to know about them at the time of deposit. It would give the external app the best chance of making a valid submission, which would be good for the user. After submission, Dataverse could respond with a boolean indicator of success or a list of required fields that are missing, which could be shown to the user. It would be nice for the user to get immediate feedback that their submission is accepted, valid and ready to be published.

philippconzett · 2021-11-26T07:07:54Z

I guess for "standard" dataverses/collections, the current approach may work fine. For more customized dataverses where e.g. metadata templates with pre-filled fields are used to make deposit as easy as possible, an external tool >> Dataverse integration might be more cumbersome for the depositor. In these cases, I'd prefer to create the dataset within Dataverse, and then - if there is no Dataverse >> external tool integration that allows you to upload the data from the external tool - I'd go back to the external tool and push the data into the created dataset. If I remember correctly, this is how the OSF >> Dataverse integration works, thus you have to choose a specific dataset when you want to push your files into Dataverse.

pdurbin · 2022-10-09T13:50:07Z

From @philippconzett:

"for 'standard' dataverses/collections, the current approach may work fine... this is how the OSF >> Dataverse integration works, thus you have to choose a specific dataset when you want to push your files into Dataverse."

Right, integrations like OSF, RSpace, OJS, and Renku all assume "standard" collections and only send the five required fields (title, author, subject, description, contact). They don't have any way to query the Dataverse installation to ask if any other fields are required for this or that collection.
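For reference, the payload such integrations send covers only those five citation fields. A minimal sketch of that native-API dataset JSON, modeled on the dataset-finch1.json example in the Dataverse Native API guide; the exact shape should be checked against the guide for the target version. The subject value has to come from the fixed controlled vocabulary, which is precisely what clients currently hard-code.

```python
import json


def primitive(type_name: str, value: str) -> dict:
    """A single-valued primitive field, e.g. title or authorName."""
    return {"typeName": type_name, "multiple": False,
            "typeClass": "primitive", "value": value}


def compound(type_name: str, *children: dict) -> dict:
    """A repeatable compound field holding one entry built from its children."""
    return {"typeName": type_name, "multiple": True, "typeClass": "compound",
            "value": [{child["typeName"]: child for child in children}]}


def minimal_dataset(title: str, author: str, contact_name: str, contact_email: str,
                    description: str, subjects: list[str]) -> dict:
    """Citation block with only the five usually required fields."""
    fields = [
        primitive("title", title),
        compound("author", primitive("authorName", author)),
        compound("datasetContact",
                 primitive("datasetContactName", contact_name),
                 primitive("datasetContactEmail", contact_email)),
        compound("dsDescription", primitive("dsDescriptionValue", description)),
        # Subject values must come from Dataverse's fixed controlled vocabulary.
        {"typeName": "subject", "multiple": True,
         "typeClass": "controlledVocabulary", "value": list(subjects)},
    ]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}


if __name__ == "__main__":
    print(json.dumps(minimal_dataset(
        "Assay results", "Doe, Jane", "Doe, Jane", "jane@example.org",
        "Raw plate-reader data", ["Chemistry"]), indent=2))
```

Anything a collection requires beyond these five (like the customer's producer and distributor fields) would have to be appended to the same fields list, and without a query API the client has no way to discover that.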

Subject is a fixed controlled vocabulary and this old issue is about how you can't query that either. Even though the list is fixed, it should also be queryable so apps like RSpace, etc. don't have to hard code the list:

Expose Controlled Vocabulary Terms in the API #1510

Finally, this older issue is very similar:

Native API: What fields are required for dataset creation #3060

2023-01-20 update... related:

8822 incomplete datasets via api #8940

pdurbin · 2023-07-13T17:23:42Z

From @Kris-LIBIS:

"we have a dependency on issue #6978 to know which metadata fields are available in dataverse, which are mandatory and what controlled vocabulary valid field values are.

in absence of a solution for the issue above, we submitted PR #8940 and that is merged now and ready for 5.14. The PR will allow the RDM integration tool to create datasets with no metadata at all."

-- https://groups.google.com/g/dataverse-community/c/aGt1ILi1Hf4/m/fnGO-Io_AQAJ

pdurbin · 2023-11-30T17:45:51Z

Hello, all!

@richarda23 and everyone, does the following PR resolve this issue? Should I mark it as closing this issue (on merge)?

JSON Schema creator and validator #10109

Update: I went ahead and marked the PR to close this issue on merge.
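With that PR, a client such as RSpace could fetch a collection's dataset JSON Schema, build the dataset JSON, and ask the server to validate it before creating the dataset. A minimal sketch that only builds the two HTTP requests with the standard library. The endpoint paths (/api/dataverses/{id}/datasetSchema and /api/dataverses/{id}/validateDatasetJson) and the use of an API token header are assumptions based on the Native API guide's description of this feature, so check them against the guide for the installed version.

```python
import json
import urllib.request


def schema_request(server: str, collection: str, api_token: str) -> urllib.request.Request:
    """GET the JSON Schema that dataset JSON must satisfy for this collection."""
    return urllib.request.Request(
        f"{server}/api/dataverses/{collection}/datasetSchema",
        headers={"X-Dataverse-key": api_token},
        method="GET",
    )


def validate_request(server: str, collection: str, api_token: str,
                     dataset: dict) -> urllib.request.Request:
    """POST dataset JSON to be checked against that schema without creating it."""
    return urllib.request.Request(
        f"{server}/api/dataverses/{collection}/validateDatasetJson",
        data=json.dumps(dataset).encode("utf-8"),
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending either request is then urllib.request.urlopen(req), followed by json.load() on the response. Because the schema is generated per collection, it should reflect that collection's required fields rather than only the installation-wide defaults.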

landreev added a commit that referenced this issue Jul 28, 2020

still not sure about the formatting... #6978

f71526d

poikilotherm added the "HERMES" label (related to @hermes-hmc work on Dataverse code) Nov 9, 2021

pdurbin mentioned this issue Dec 2, 2021

Expose Controlled Vocabulary Terms in the API #1510

Closed

pdurbin added the "Feature: API", "User Role: API User" (makes use of APIs) and "Hackathon: More APIs" (add new or missing API endpoints) labels Oct 9, 2022

pdurbin mentioned this issue Oct 10, 2022

Native API: What fields are required for dataset creation #3060

Closed

mreekie added the bk2211 label Nov 1, 2022

mreekie removed the bk2211 label Jan 11, 2023

pdurbin mentioned this issue Jan 18, 2023

8822 incomplete datasets via api #8940

Merged

pdurbin added the "Type: Feature" label (a feature request) Oct 9, 2023

pdurbin mentioned this issue Nov 30, 2023

JSON Schema creator and validator #10109

Merged

jp-tosca closed this as completed in #10109 Dec 5, 2023
