1.2.1 Deliverable Backlog Working Notes #9043

mreekie · 2022-10-11T13:36:03Z

Placeholder.
Work on the backlog for this NIH OTA deliverable in the Dataverse Deliverable Backlog Grooming Project

Note:

I realized that doing this (separating these notes from the deliverable) was a horrible idea, and I won't do it going forward.

mreekie · 2022-11-08T18:32:39Z

Moved 8681, 8571 to closed in deliverable backlog project

mreekie · 2022-11-08T19:03:51Z

From the original discussion - jtb asked - When the proposal was written, even the basic mechanisms above did not exist (maybe in development), so reporting their existence to NIH is probably useful. There are a couple presentations on this but no whitepaper or conference paper - is that sort of deliverable useful for NIH?

mreekie · 2022-11-08T20:07:11Z

mreekie commented on Oct 8

We have our first step on this which is the defined Spike.

As I understand the organization of code versus implementation, I think the first step is to engage with @sbarbosadataverse team on Spike: Discovery about integrating biomedical vocabularies into dataverse #8681.
Gather the timeline together so we have context.
Same with the information collected in the spike thus far including the comment regarding hierarchical data support.
Test the vocabularies to see if they can be loaded.
Add to the backlog as needed based on that.

mreekie/pdurbin 27 days ago

Discussion

Julian and Mahmood also are knowledgeable in this. We need to determine how much we have actually delivered on this already.

Jim - we have a mechanism for recording vocabularies.

External vocabularies - javascript gets input from the user. Feeds that string back to Dataverse. There are actually 2 mechanisms. Javascript controls input for a single box on the page.

What happens to get that string is controlled by the JavaScript. Examples:
ORCID
SKOSMOS

Takeaways:

Let's talk about this first with the small group.
Next week, get Leonid, Jim, Julian, Stefano, and Stephen together to do this.

mreekie 21 days ago.

Very Rough Meeting notes:
We are using the working groups to define the what and then the repositories reflect the how and the when in their individual project plans.

From Jim on the how

For our external vocabulary mechanism, the JavaScript that we use could be shared across repositories and then maintained by, maybe, FundRef and ROR. This mechanism is a "plugin".
The idea is that the "smarts" of the mechanism for populating fields will live in the browser.
Dataverse would store the correct information but would not do the job of validating the information. The responsibility for correctly retrieving the controlled data and populating the fields with the correct controlled vocabulary text would belong to the script.
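
To make the division of responsibility described above concrete, here is a minimal sketch. It is written in Python only so that it is self-contained; the mechanism discussed in the meeting would be JavaScript running in the browser. The vocabulary contents, the example URIs, and the names `lookup_term`, `Repository`, and `save_field` are all hypothetical and are not part of Dataverse.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for an external vocabulary service.
# Labels and URIs are invented for illustration only.
VOCABULARY = {
    "felidae": "https://vocab.example.org/term/0001",
    "chemistry": "https://vocab.example.org/term/0002",
}


@dataclass(frozen=True)
class Term:
    label: str
    uri: str


def lookup_term(user_input: str) -> Optional[Term]:
    """Script side: resolve what the user typed to a controlled term."""
    key = user_input.strip().lower()
    uri = VOCABULARY.get(key)
    return Term(label=key, uri=uri) if uri else None


class Repository:
    """Repository side: stores what it is given and does not validate it."""

    def __init__(self) -> None:
        self.fields = {}

    def save_field(self, name: str, value: str) -> None:
        self.fields[name] = value


repo = Repository()
term = lookup_term("  Chemistry ")
if term is not None:
    # Only the script knows the value came from the vocabulary.
    repo.save_field("subject", term.uri)
```

In this sketch the repository accepts any string, so the correctness of the stored value depends entirely on the script, which is the trade-off raised in the notes.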

If we use the javascript approach

The groups that provide the vocabulary would provide a javascript plugin.
Currently the 'ORCIDs' of the world provide an API.
This would mean that they would instead provide a javascript plugin.
Maybe we start with providing our javascript and later look to hand off.

Julian

Use Cases.
As a curator or repository manager curating biomedical data, I want it to be easy to add the correct metadata to my dataset so that it's easier to find my dataset.

e.g. You can store this metadata now, but it's not easy because you would have to know exactly the text to put in.
We are talking about specific defined vocabularies that already exist.

As a data repository like a Dataverse repository, I want these to be machine readable as well. For example, if there is a user name, Dataverse can be set up to store a simple text field and/or a link or code. Where possible we would like to store a link or unique identifier that persists.

e.g. "Get Dataverse out of the business of doing what ORCID does better".
A takeaway here is that there are choices to be made on the implementation side of things in Dataverse that can help Dataverse use more information that improves discoverability.
Backend - whenever there is machine-readable information that we can use, we can store it.
Front end - we might make things look different to the user, like displaying the name but storing the reference.
Imagine using a plug-in, but for searches it would use the string rather than the machine-readable reference.
The mechanism would support:

Facets(?), which allow filtering by a concept.
Relationships between terms, e.g. "Show me everything that relates to felines."

We have subjects you can put in.
In the UX - Facet - chemistry - 200 datasets.
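
The ideas above (store the reference, display the name, facet by concept, and follow relationships between terms) can be sketched together as follows. This is an illustration under stated assumptions, not how Dataverse implements search: the `ex:` identifiers, the tiny term hierarchy, the dataset names, and the functions `is_related`, `datasets_related_to`, and `facet_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical vocabulary: identifier -> (display label, parent identifier or None).
TERMS = {
    "ex:animal": ("Animals", None),
    "ex:feline": ("Felines", "ex:animal"),
    "ex:lion": ("Lions", "ex:feline"),
    "ex:chemistry": ("Chemistry", None),
}

# Hypothetical datasets that store the term identifier rather than free text.
DATASETS = {
    "Lion census": "ex:lion",
    "Cat genomes": "ex:feline",
    "Reaction rates": "ex:chemistry",
}


def is_related(uri, ancestor):
    """True if uri is the ancestor itself or sits below it in the hierarchy."""
    while uri is not None:
        if uri == ancestor:
            return True
        uri = TERMS[uri][1]  # step up to the parent term
    return False


def datasets_related_to(ancestor):
    """'Show me everything that relates to ...' over the stored identifiers."""
    return sorted(name for name, uri in DATASETS.items() if is_related(uri, ancestor))


def facet_counts():
    """Count datasets per term, keyed by the human-readable label."""
    return Counter(TERMS[uri][0] for uri in DATASETS.values())


print(datasets_related_to("ex:feline"))
print(dict(facet_counts()))
```

Because each dataset holds an identifier, the query for felines also finds the dataset tagged with the narrower lion term, which plain text matching on the stored string would miss.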

From Jim

A series of concrete tasks could be developed to: identify the official source of the vocabulary; discover whether there are services available via API to query them; discover if there are existing vocabulary browsers that can serve as examples or provide source code; identify a viable identifier for the terms in the vocabulary; create a simple example browser to demonstrate edit/display in Dataverse; investigate whether the vocabularies are internationalized and, if so, how the translations can be accessed; decide which field(s) in Dataverse they should be applied to; etc.

Jim has also shared a compact and readable document on this topic here

Summary:

We are at the point where we can create deliverables.
There is still work to be done here to flesh out more use cases around controlled vocabularies to be added to the backlog for this feature.
Julian is going to add as a separate comment the last use case that we were trying to put together. This is the one related to searching and how it relates to the entry of URIs instead of text into the db.
- I'm not sure but it sounds like there is a mechanism now for providing language support and that the use of URIs might be a different method?
Leonid is going to create 3 issues. Each issue is to create a JavaScript app in the way that has been discussed in this meeting, and each applies to one of the 3 vocabularies we've discussed: Unified Medical Language System (UMLS), Center for Expanded Data Annotation and Retrieval (CEDAR), and Medical Subject Headings (MeSH).
We will meet again to:
- Review and agree on the definition of each of the 3 tasks (definition of done, etc.)
- Decide how much further we would like to pursue additional use cases as a team or if that should break off as a separate task.

mreekie 21 days ago

Create a mental model:

Review the video
Find and look at Phil's demo
Read results of Spike issue 8571 - What work has already been done towards support for controlled vocabularies for metadata fields (Leonid's notes)
Read Jim's notes on the JavaScript

Julian 21 days ago

Julian is going to add as a separate comment the last use case that we were trying to put together. This is the one related to searching and how it relates to the entry of URIs instead of text into the db.

I agreed that I'd leave a comment here about a use case statement we were working on before we ran out of meeting time:

As a researcher, I need to be sure that the results of my searches in the repository include only the data I'm interested in.
As a curator or repository manager, I'd like the repository to return search results that contain the data that people need when they're searching.

Also, I thought I'd ask a question about those three things being called vocabularies. As far as I can tell, only MeSH is a vocabulary. UMLS is a system "that brings together many health and biomedical vocabularies and standards" and I think one of the listed vocabularies is MeSH, and CEDAR is a platform that makes it easier to create metadata forms in order to improve data submission.

Who recommended these three things? How were they chosen? Was UMLS mentioned because it's a thorough list of biomedical controlled vocabularies?

Edit: Just noticed that what CEDAR is has been pointed out in the document at https://docs.google.com/document/d/1a3S0QkkTtl321XxWkQRCnYCBUR4kxbBwCtgNYTVTBUU.

mreekie 20 days ago

Next step, create our immediate backlog issues to include:

Create a whitepaper
Create an initial set of use cases to drive user experience.
Leonid is creating the first of the issues - to make the MVP.
The completion of the MVP which will read from FundRef
Identify Issues associated with the backend work which Jim has ID'd

Note: Involve Guillermo in the MVP to get his opinion on the tools.

Julian 20 days ago

About improving searching for datasets by using FundRef in Dataverse repository funder field(s), I'm just noting, as @lenwiz mentioned in the last meeting, that subcommittees of the NIH GREI have the same goal. For our work on the GREI Data Use subcommittee, I mentioned in a Google Doc that there's an issue with the existing funder fields in Dataverse's Citation block (#4859) that I think needs to be addressed.

mreekie · 2022-11-08T21:23:25Z

Today:

Transferred all of our working notes for this over to this sidecar issue
Created a [draft problem statement](https://github.com/IQSS/dataverse/issues/9027t user story as the deliverable description.
Created the first issues based on the outcome from our problem statement meetings on this deliverable.
Spoke with Julian about the FundRef creation. He will be a good person to provide a user perspective.
Created a technical document - 1.2.1 Design and implement integration with controlled vocabularies. I'm not sure if it will see more use or not, but it was useful as a platform for reviewing Jim's notes.

Next steps:

check in with @leonid to see if he created separate issues for this. I'm confident enough in our work to talk to him at this point.
Get this draft problem statement OK'd by Stefano.
Order the backlog with this sub-team
Create end user stakeholder focussed user stories for each of the issues in this backlog.
- Work on this until the issues are "Ready", including sizing.
- For the FundRef work, make sure we're producing user stories, not technical steps.
Recruit a technical sponsor
Recruit a user sponsor (Julian?)

mreekie · 2023-02-07T13:16:27Z

DO NOT ADD NOTES HERE

mreekie added the NIH OTA: 1.2.1 2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 prdOwnThis is an it... label Oct 11, 2022

mreekie changed the title ~~Backl.og Grooming: 1.2.1~~ Backlog Grooming: 1.2.1 Oct 11, 2022

mreekie assigned qqmyers, landreev and scolapasta Oct 13, 2022

mreekie changed the title ~~Backlog Grooming: 1.2.1~~ Backlog Grooming: 1.2.1 (placeholder) Nov 4, 2022

mreekie assigned jggautier and mreekie Nov 8, 2022

sync-by-unito bot mentioned this issue Mar 3, 2023

2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 | IQSS/dataverse-pm#7

Closed

3 tasks

mreekie changed the title ~~Backlog Grooming: 1.2.1 (placeholder)~~ Backlog Grooming: 1.2.1 (Deliverable Sidecar Issue) Nov 8, 2022

mreekie changed the title ~~Backlog Grooming: 1.2.1 (Deliverable Sidecar Issue)~~ 1.2.1 Backlog Grooming Sidecar Issue Nov 10, 2022

mreekie changed the title ~~1.2.1 Backlog Grooming Sidecar Issue~~ 1.2.1 Deliverable Backlog Sidecar Issue Nov 10, 2022

mreekie changed the title ~~1.2.1 Deliverable Backlog Sidecar Issue~~ 1.2.1 Deliverable Backlog Sidecar Nov 10, 2022

mreekie mentioned this issue Nov 15, 2022

Groom NIH OTA related back log Problem Statements #9108

Closed

12 tasks

mreekie changed the title ~~1.2.1 Deliverable Backlog Sidecar~~ 1.2.1 Deliverable Backlog Working Notes Dec 5, 2022

mreekie added the pm.GREI-d-1.2.1 NIH, yr1, aim2, task1: Design and implement integration with controlled voc label Mar 20, 2023

cmbz closed this as completed May 17, 2023

cmbz mentioned this issue Feb 1, 2024

Epic: GREI 2 - Consistent Metadata IQSS/dataverse-pm#116

Open

16 tasks
