Feature Request/Idea: Add new static facet to show the metadata blocks types that are populated. #8536

Closed
abujeda opened this issue Mar 25, 2022 · 18 comments · Fixed by #8793
Labels
HDC Harvard Data Commons HDC: 2 Harvard Data Commons Obj. 2 HERMES related to @hermes-hmc work on Dataverse code

Comments

@abujeda
Contributor

abujeda commented Mar 25, 2022

Overview of the Feature Request
Allow filtering of all datasets that have any value for a particular metadata block.

We have a requirement to facilitate searching and improve the visibility of Computational Workflow metadata.
Computational Workflow will be a new metadata block, and we would like a way of selecting all datasets of this type.

We would also like this new static facet to be the most prominent one and to appear at the top.

This new static facet will be configurable: administrators will decide which metadata blocks are shown in it.

What kind of user is the feature intended for?
All users

What inspired the request?
As part of the Harvard Data Commons project there is a requirement to tag datasets as computational workflows. This can be done with metadata blocks, but we would like to find these datasets directly from the side panel, as we do with the main Dataverses/Datasets/Files checkboxes.

What existing behavior do you want changed?
None

Any brand new behavior do you want to add to Dataverse?
Add a new static search facet to Solr, based on the metadata blocks that have a value for a dataset.

Any related open or closed issues to this feature request?
Relates to #8462 and #8463
This new issue supersedes these previous ones

@abujeda
Contributor Author

abujeda commented Mar 25, 2022

The general idea is that a new constant in SearchFields, public static final String METADATA_TYPES = "metadata_type_ss";, will be used to index the metadata blocks that have a value.

This new facet, if configured, will then be displayed first in the search side panel, just below the Dataverses, Datasets, Files section. The configuration will specify which metadata blocks to show.
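
As an illustration of the idea above, here is a minimal sketch of how the values for that field could be derived. It is not the code from the actual change: the `FieldValue` class, the `populatedBlocks` helper, and the block names are hypothetical, and only the field name `metadata_type_ss` comes from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MetadataTypeFacetSketch {

    // Solr field name quoted from the comment above.
    static final String METADATA_TYPES = "metadata_type_ss";

    // Hypothetical stand-in for one dataset field: the block it belongs to and its raw value.
    static final class FieldValue {
        final String blockName;
        final String value;

        FieldValue(String blockName, String value) {
            this.blockName = blockName;
            this.value = value;
        }
    }

    // Returns each configured block that has at least one non-blank value, in first-seen order.
    static List<String> populatedBlocks(List<FieldValue> fields, Set<String> configuredBlocks) {
        Set<String> populated = new LinkedHashSet<>();
        for (FieldValue field : fields) {
            boolean hasValue = field.value != null && !field.value.trim().isEmpty();
            if (hasValue && configuredBlocks.contains(field.blockName)) {
                populated.add(field.blockName);
            }
        }
        return new ArrayList<>(populated);
    }

    public static void main(String[] args) {
        // Block names here are made up for the example.
        List<FieldValue> fields = Arrays.asList(
                new FieldValue("citation", "My dataset title"),
                new FieldValue("computationalworkflow", "CWL"),
                new FieldValue("geospatial", "   "));
        Set<String> configured =
                new HashSet<>(Arrays.asList("computationalworkflow", "geospatial"));
        // These values would be indexed under METADATA_TYPES for this dataset.
        System.out.println(METADATA_TYPES + " = " + populatedBlocks(fields, configured));
    }
}
```

In this sketch a block that is populated but not configured (here the citation block) is left out of the facet, and a configured block with only blank values is treated as empty.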

@abujeda
Contributor Author

abujeda commented Mar 25, 2022

Screenshot 2022-03-25 at 15 17 28

Screenshot 2022-03-25 at 15 17 49

@jggautier
Contributor

Hi all. I hope this is an okay place to post questions and concerns about this direction. If not please let me know.

Since the purpose of the new facet category is to make different types of datasets more discoverable, would it make sense to call the new facet category "Dataset types" instead of "Metadata types", and to ensure that each thing listed in that facet is describing a type of dataset, such as "Computational Workflow", "Geospatial dataset"? This would mean that the name of the metadatablock couldn't always be re-used as the name of the facet.

I'm concerned about including a metadatablock for "Software metadata" as part of this effort for supporting computational workflows. If the purpose of the new facet category is to make different types of datasets more discoverable, and the purpose of that "Software Metadata" metadatablock is to let depositors describe software (distinct from computational workflows), then does it make sense to include "Software metadata" in that facet category? I don't think that support for software, and how that's different from support for computational workflows, is something the objective 2 group has discussed or discussed enough, although it has been discussed in the larger Dataverse community.

@abujeda
Contributor Author

abujeda commented Mar 28, 2022

Thanks for the comment @jggautier.
Regarding the labels, I have updated the code to change "Metadata types" to "Dataset types". I have also created a new property to set the facet label in the side panel.

Screenshot 2022-03-28 at 11 16 36

Screenshot 2022-03-28 at 11 17 00

@abujeda
Contributor Author

abujeda commented Mar 28, 2022

Hi @jggautier, as this dataset type facet for metadata blocks is a new property, we need to add a default value.
These are the default values I have provisionally set for the different metadata blocks in the codebase.
Let me know if you want to change the values.
displayName is the value we use for metadata blocks.
displayFacet is the new one we use for the facet label.

metadatablock.displayName=Astronomy and Astrophysics Metadata
metadatablock.displayFacet=Astronomy and Astrophysics
metadatablock.displayName=Life Sciences Metadata
metadatablock.displayFacet=Life Sciences
metadatablock.displayName=Citation Metadata
metadatablock.displayFacet=Citation
metadatablock.displayName=Computational Metadata
metadatablock.displayFacet=Computational Workflow
metadatablock.displayName=HBGDki Custom Metadata
metadatablock.displayFacet=HBGDki
metadatablock.displayName=Alliance for Research on Corporate Sustainability Metadata
metadatablock.displayFacet=Alliance for Research on Corporate Sustainability
metadatablock.displayName=CHIA Metadata
metadatablock.displayFacet=CHIA
metadatablock.displayName=Digaai Metadata
metadatablock.displayFacet=Digaai
metadatablock.displayName=Graduate School of Design Metadata
metadatablock.displayFacet=Graduate School of Design
metadatablock.displayName=MRA Metadata
metadatablock.displayFacet=MRA
metadatablock.displayName=PSI Metadata
metadatablock.displayFacet=PSI
metadatablock.displayName=Political Science Replication Initiative Metadata
metadatablock.displayFacet=Political Science Replication Initiative
metadatablock.displayName=Geospatial Metadata
metadatablock.displayFacet=Geospatial
metadatablock.displayName=Journal Metadata
metadatablock.displayFacet=Journal
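
To make the relationship between the two keys concrete, here is a small sketch of how a facet label could be resolved. It assumes each metadata block keeps these keys in its own properties file, and it falls back to displayName when displayFacet is missing; that fallback, like the `FacetLabelSketch` class and its method names, is an assumption for illustration and not something stated in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class FacetLabelSketch {

    // Parses properties text, standing in for one block's properties file.
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Prefers the facet label; falls back to the block's display name (assumed behaviour).
    static String facetLabel(Properties blockBundle) {
        String facet = blockBundle.getProperty("metadatablock.displayFacet");
        if (facet != null && !facet.trim().isEmpty()) {
            return facet.trim();
        }
        return blockBundle.getProperty("metadatablock.displayName");
    }

    public static void main(String[] args) {
        Properties geospatial = parse(
                "metadatablock.displayName=Geospatial Metadata\n"
                + "metadatablock.displayFacet=Geospatial\n");
        System.out.println(facetLabel(geospatial));
    }
}
```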

@jggautier
Contributor

Thanks @adaybujeda! And thanks for writing about this in the Slack channel, too, which I think is better, since more members of the group will see it there and don't monitor this GitHub repo. I replied there.

@pdurbin
Member

pdurbin commented Mar 28, 2022

Which Slack channel are we talking about?

Update: Here's the conversation I was looking for: https://harvard-huit.slack.com/archives/C034VHTHN6Q/p1648464066197569

@poikilotherm poikilotherm added Working Group: SWC HERMES related to @hermes-hmc work on Dataverse code labels Mar 29, 2022
@poikilotherm
Contributor

I am very interested in this issue, as it is related to a discussion about software support with @jggautier that involved making Dataverse less data-centric. (see #7077 (comment)) This looks like a great idea and a very nice step forward. (Tagged this issue as related to HERMES and SWC WG accordingly, tagging @atrisovic here)

The CodeMeta block support is happening here: #7844 / PR #7877

I posted this issue to GDCC Slack ig-softwaremd channel just now.

From a technical perspective:

@adaybujeda why would you be using a Solr field metadata_type_ss? There is no need to use a dynamic field for this and make searches more complicated IMHO. Instead, we could create a static filter like we already have for Dataset and Collections.

IMHO when adding support for this, it would be the perfect thing to reuse for #7077. BUT we should discuss the nature of the type then - DataCite metadata allows only for ONE resourceType, not multiple (_ss means multiple).

Maybe the underlying issue is big enough to justify a new DvObject property metadataType?
(This is kind of linked to a discussion I had with @atrisovic and the SWC WG about a crazy idea of changing datasets into composable set of data-, software- and workflowsets to reflect individual parts better.)

@abujeda
Contributor Author

abujeda commented Mar 29, 2022

Hi @poikilotherm, thanks for the comments.
I have some answers:

> why would you be using a Solr field metadata_type_ss?

I implemented it with a dynamic field to make deployment to existing installations easier, since this avoids having to update the existing Solr schema with a new static field.
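
For readers less familiar with Solr, the following sketch shows the standard request parameters that would facet and filter on such a field. It assumes the installation's schema already defines a `*_ss` dynamic field for multi-valued strings, as the answer above implies; the `SolrFacetParamsSketch` class and its helpers are hypothetical, and the parameters are built as a plain map rather than through any Dataverse or SolrJ API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SolrFacetParamsSketch {

    // Field name quoted from the discussion above.
    static final String METADATA_TYPES = "metadata_type_ss";

    // Wraps a value in double quotes, escaping backslashes and quotes for a Solr phrase query.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the parameters: always facet on the field, and filter when a value is selected.
    static Map<String, List<String>> facetParams(String selectedValue) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        params.put("q", Collections.singletonList("*:*"));
        params.put("facet", Collections.singletonList("true"));
        params.put("facet.field", Collections.singletonList(METADATA_TYPES));
        if (selectedValue != null) {
            params.put("fq",
                    Collections.singletonList(METADATA_TYPES + ":" + quote(selectedValue)));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(facetParams("Computational Workflow"));
    }
}
```

Because the field name matches a dynamic field pattern, no schema change is needed to index it. A static field, as suggested earlier in the thread, would need a schema update.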

> The CodeMeta block support is happening here: #7844 / PR #7877

Yes, we are aware. It shows in the screenshot as we are interested in #7844. I was showing the PR to our team.

@abujeda
Contributor Author

abujeda commented Mar 29, 2022

Hi @pdurbin, FYI: @jggautier posted the comments into the internal Harvard Data Commons Slack channel for objective 2

@doigl
Contributor

doigl commented Mar 29, 2022

We are thinking about something similar to distinguish data and software (I'm still not really clear about how to distinguish between a computational workflow and software) for an export of the datasets to our university bibliography: using the existence of filled-in fields in the software metadata block to identify software datasets. But in my view, the much better solution would be to have some direct way to really distinguish datasets from software (or workflows?), be it a mandatory metadata field in the citation block with a controlled vocabulary or some deeper solution, as @jggautier proposed.

@sbarbosadataverse

@doigl I am interested in that perspective! Distinguishing datasets from software/workflows.

@atrisovic
Member

@poikilotherm You should come to our Thursday meetings! Your perspective would be very helpful!

For now, we decided to use 'dataset feature' or similar (instead of 'dataset type') to describe that the dataset contains other components such as code/workflows.

@poikilotherm
Contributor

@atrisovic : thanks and sure! Happy to talk about HERMES, too, if you're interested.

@atrisovic
Member

Yes I am! Also as a side project, I want to incorporate CFF into the GH-DV uploader action!
(I wonder if there is code + xwalk I can reuse when reading in the CFF file, so I'll ask you about that :))

@abujeda
Contributor Author

abujeda commented Apr 1, 2022

Hi all, this new feature is not trying to add a dataset type to Dataverse. It simply uses a new facet category to show all datasets based on their metadata blocks.

We are reviewing the label for this new facet category. The name "Dataset Type" has a particular meaning, and we are not sure it should be used in this context.

@abujeda
Contributor Author

abujeda commented Apr 22, 2022

We have settled on the label "Dataset Feature" for the new static facet.

Screenshot 2022-04-22 at 15 25 51

@abujeda
Contributor Author

abujeda commented Jun 10, 2022

PR created: #8793

This is ready for review @scolapasta @pdurbin

@scolapasta scolapasta added HDC Harvard Data Commons HDC: 2 Harvard Data Commons Obj. 2 labels Aug 1, 2022

8 participants