8822 incomplete datasets via api #8940

Merged
merged 52 commits into IQSS:develop from ErykKul:8822_incomplete_datasets_via_api
May 22, 2023

ErykKul
Collaborator

@ErykKul ErykKul commented Aug 26, 2022

What this PR does / why we need it:
When integrating GitLab or iRods with Dataverse, we want researchers to be able to send files and just enough information to get the dataset created via API. Publishing is prevented until required fields are entered (using the GUI, most often).

Which issue(s) this PR closes:

Closes #8822

Suggestions on how to test this:
First, you will need to update the Solr configuration. For this feature to work, a new Solr field is required: "datasetValid". The needed configuration is added in "schema.xml". Note that if you do not use this feature, updating the Solr configuration and reindexing are not required. Nevertheless, it is sensible to keep the Solr configuration up to date, so consider updating it and reindexing anyway.

Next, you need to enable these settings:

  • :AllowInvalidMetadataThroughAPI: set this to "true". If you do so without first updating the Solr configuration, you will no longer be able to ingest datasets because of missing-field errors.
  • :CanReviewInvalid: optionally set this to "true" to allow datasets with invalid metadata to be submitted for review. Otherwise, you will need to correct the metadata before submitting for review. Datasets with invalid metadata can never be published, regardless of any setting.
  • :ShowValidityFilter: when set to "true", this enables a filter on the "My Data" page (important: the filter only works correctly once all metadata has been reindexed on the Solr server; otherwise valid datasets will not be shown):

image
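
As a rough illustration of the setup steps above, the following sketch only prints candidate curl commands; it does not contact any server. It assumes the three options are database settings that can be set through the admin settings endpoint (`/api/admin/settings/<name>`), as is usual for colon-prefixed Dataverse settings. The helper name `settings_commands` and the default base URL are hypothetical, and later comments in this thread indicate the option names changed before the merge, so the guides should be treated as authoritative.

```python
# Setting names exactly as listed in this PR description. Review comments
# further down say these names were later replaced, so check the guides.
SETTINGS = {
    ":AllowInvalidMetadataThroughAPI": "true",
    ":CanReviewInvalid": "true",
    ":ShowValidityFilter": "true",
}


def settings_commands(base_url="http://localhost:8080"):
    """Return one curl command string per setting; nothing is executed."""
    # Assumption: these are database settings, set with a PUT request to
    # the admin settings endpoint.
    return [
        f"curl -X PUT -d {value} {base_url}/api/admin/settings/{name}"
        for name, value in SETTINGS.items()
    ]


if __name__ == "__main__":
    # Update the Solr schema and reindex before running the printed commands.
    for command in settings_commands():
        print(command)
```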

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
When the new feature is not used, there are no changes to the user interface. When you use the new feature, you will have the possibility to enable filtering of valid and/or invalid metadata on the "My Data" page, as discussed earlier.

Invalid metadata is marked with a new tag "Invalid metadata":

image

This tag is visible in the "My Data" page and the main page, when logged in. When you do not log in, you can only see published metadata, and since invalid metadata cannot be published, you will not see that tag. The tag, together with a warning, is also shown when viewing a dataset with invalid metadata:

image

A warning is also shown when submitting a dataset with invalid metadata for review or when trying to publish it:

image

The exact content of the tag and the text of the warning can be modified with the "notValid" and "dataset.message.invalid.warning" properties, respectively.

Is there a release notes update needed for this change?:
Yes.

Additional documentation:
You can also watch the demo as given during the community meeting. You can find the video at the top of DataverseTV: https://dataverse.org/dataversetv

Here's a direct link: https://harvard.zoom.us/rec/share/EZGLPBLPv3o74H_WD-ejp34c3Grro1Jk8mS5L8um9PyFbKzVr2Ro_62gqIkckKf5.1Fh8UfNOIkc8sqpc

@coveralls

coveralls commented Aug 26, 2022

Coverage Status

Coverage: 20.337% (+0.08%) from 20.261% when pulling 2479b3d on ErykKul:8822_incomplete_datasets_via_api into 3d8ca99 on IQSS:develop.

@pdurbin
Member

pdurbin commented Oct 1, 2022

@mreekie mreekie added the bk2211 label Nov 1, 2022
@poikilotherm poikilotherm self-assigned this May 18, 2023
@poikilotherm
Contributor

poikilotherm commented May 18, 2023

Sorry, I dragged this out of "ready for QA" to do some minor cleanup. Thx for bearing with us and keeping up working on this @ErykKul !

Renaming should make it more obvious what this is about.
Also using the API and a (new) UI scope to better distinguish
where a certain setting will be used, instead of making it feature
related (as before).
- Add documentation for the new Info API endpoint
- Add example cURL call
- Create subsubsections for the two distinct cases, making the docs easier to read
Contributor

@poikilotherm poikilotherm left a comment

I restructured the JvmSettings a bit, extended the docs and also added an Info API endpoint, so a client may detect if sending incomplete data will be acceptable before trying it.
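
A client-side sketch of the "detect before trying" idea might look like the following. The path and response format of the Info API endpoint are not given in this thread, so the result of that check is passed in as a plain boolean (`incomplete_allowed`) rather than fetched. The function name `create_dataset_url` and the example URL are hypothetical; only the `doNotValidate=true` query parameter comes from the discussion.

```python
from urllib.parse import urlencode


def create_dataset_url(datasets_url, incomplete_allowed, metadata_complete):
    """Pick the URL to POST a new dataset to.

    datasets_url: the full create-dataset URL ending in /datasets,
        supplied by the caller.
    incomplete_allowed: what the server reported through its Info API
        (obtained elsewhere; the endpoint is not modeled here).
    metadata_complete: whether all required fields are filled in.
    """
    if metadata_complete:
        # Complete metadata goes through normal validation.
        return datasets_url
    if not incomplete_allowed:
        # Fail early instead of sending a request the server will reject.
        raise ValueError("server does not accept incomplete metadata")
    separator = "&" if "?" in datasets_url else "?"
    return datasets_url + separator + urlencode({"doNotValidate": "true"})


if __name__ == "__main__":
    # Hypothetical host and collection, for illustration only.
    url = "https://dataverse.example.org/collection/datasets"
    print(create_dataset_url(url, incomplete_allowed=True, metadata_complete=False))
```

Checking first keeps the failure on the client side, which may be useful for integrations that upload files before the metadata is finished.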

Looks good to me, off to QA :-)

(One thing though: I'm missing tests for this... There has been no additional API test added with deactivated validation)

@kcondon I see sometimes the CI pipeline for Maven Unit Tests fail - the Test Coverage submission seems to be flaky, tests run OK.

@poikilotherm poikilotherm removed their assignment May 18, 2023
@ErykKul
Collaborator Author

ErykKul commented May 19, 2023

@poikilotherm
Great changes! Thanks!

@kcondon kcondon self-assigned this May 19, 2023
@kcondon
Contributor

kcondon commented May 19, 2023

Issues found. Functionality works; all issues are with the docs except one minor edge case:

  1. Doc: dataverse.api.allow-incomplete-metadata section links to
    https://guides.dataverse.org/en/8822_incomplete_datasets_via_api/api/native-api.html#create-dataverse-api
    but should probably link to
    https://guides.dataverse.org/en/8822_incomplete_datasets_via_api/api/native-api.html#create-a-dataset-in-a-dataverse-collection I think that's what was intended?

  2. Doc: dataverse.ui.show-validity-filter section points to MyData page in guides:
    When enabled, the filter for validity of metadata is shown in:
    https://guides.dataverse.org/en/8822_incomplete_datasets_via_api/user/account.html#my-data
    but it doesn't show the validity filter enabled on the MyData page? Or was it just saying, hey, I'm talking about the MyData page and if you didn't know what that was, go here?

  3. Doc: In https://guides.dataverse.org/en/8822_incomplete_datasets_via_api/api/native-api.html#create-dataverse-api
    It says, "Alternatively, you can turn off the validation of the dataset by using the optional doNotValidate parameter.
    Providing a .../datasets?doNotValidate=true query parameter turns off the validation of metadata." It does not turn off validation of the dataset if dataverse.api.allow-incomplete-metadata is not set and the Solr schema has not been updated. I may have misunderstood what this was telling me. The way it seems to work is that if dataverse.api.allow-incomplete-metadata=true and the Solr schema contains the datasetValid field, then including the doNotValidate=true parameter overrides the usual validation. If you do not include doNotValidate=true, it validates normally. It does not mean that validation is bypassed without the config set. I think the sentence beginning with "Alternatively" should be removed, and then it makes sense.

  4. Function: There is no clickable link in online notifications when a dataset without a title is submitted for review. The email does contain a link to the dataset. This is minor but annoying. We use the title as the link to the dataset in the online notification. Without the title, you cannot tell from the notification where to go or where to click, although the email notification does include the link.

  5. Doc: The config option names in the release notes appear to be outdated and do not match the actual names shown in the guides. The option names in the "how to test" section are outdated as well. The how-to-test part was just confusing until I read the docs, but the release notes should agree with the docs.

  6. Doc: How to change the incomplete metadata message and tag text is not described in the release notes or the guides, only in the PR notes near the screenshots above: "The exact content of the tag and the text of the warning can be modified with the "notValid" and "dataset.message.invalid.warning" properties, respectively." If I understand correctly, this points to the keys in the bundle.properties file(s) that can be updated to change the text in the UI? This would be good for an admin to know, I think.
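
The behavior described in item 3 can be summarized with a small model. This is a sketch of the observed behavior as reported above, not the actual Dataverse implementation; the function name is hypothetical, and the Solr schema prerequisite (the datasetValid field) is assumed to be met and is not modeled.

```python
def validation_is_skipped(allow_incomplete_metadata, do_not_validate):
    """Model of when metadata validation is bypassed on dataset creation.

    allow_incomplete_metadata: value of dataverse.api.allow-incomplete-metadata.
    do_not_validate: whether the request carried doNotValidate=true.
    """
    # The query parameter only has an effect when the server option is on.
    return allow_incomplete_metadata and do_not_validate


if __name__ == "__main__":
    for allow in (False, True):
        for flag in (False, True):
            skipped = validation_is_skipped(allow, flag)
            print(f"allow={allow} doNotValidate={flag} skipped={skipped}")
```

Only one of the four combinations skips validation, which matches the point that the parameter alone does not bypass it.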

@ErykKul
Collaborator Author

ErykKul commented May 22, 2023

@kcondon
I have addressed the doc changes you requested. Can you review the latest commits?

I have added a small paragraph in the release notes about the properties that can be set. Would that be sufficient for the admins to configure the feature?

For the edge case(s), I did the following: when a dataset has no title, its global ID would be used as the display name. This would improve linking and displaying information in all places for datasets without a title (also fixing the notifications links).
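
The fallback described above could be sketched roughly as follows. This is an illustration in Python rather than the project's Java code, and the function name and the example identifier are hypothetical.

```python
def display_name(title, global_id):
    """Return the title when present, otherwise the dataset's global ID."""
    # Treat None, empty, and whitespace-only titles as missing.
    if title and title.strip():
        return title
    return global_id


if __name__ == "__main__":
    # Hypothetical identifier, for illustration only.
    print(display_name(None, "doi:10.5072/FK2/EXAMPLE"))
    print(display_name("Survey results", "doi:10.5072/FK2/EXAMPLE"))
```

With a non-empty display name always available, a notification that links on the display name still has something to click.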

@kcondon kcondon merged commit 55179da into IQSS:develop May 22, 2023
@kcondon
Contributor

kcondon commented May 22, 2023

@ErykKul Thanks for addressing all of the issues!

Labels
Feature: API, Feature: Metadata, Size: 3 (A percentage of a sprint. 2.1 hours.)
Projects
Status: No status
Status: Closed
Development

Successfully merging this pull request may close these issues.

Feature Request/Idea: Possibility to create & save dataset with incomplete mandatory metadata via API only
8 participants