1.2.1 Deliverable Backlog Working Notes #9043

Closed
mreekie opened this issue Oct 11, 2022 · 5 comments
Assignees
Labels
NIH OTA: 1.2.1 2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 prdOwnThis is an it... pm.GREI-d-1.2.1 NIH, yr1, aim2, task1: Design and implement integration with controlled voc

Comments

@mreekie

mreekie commented Oct 11, 2022

Placeholder.
Work on the backlog for this NIH OTA deliverable in the Dataverse Deliverable Backlog Grooming Project

Note:

  • I realized that separating these notes from the deliverable was a horrible idea, and I won't do it going forward.
@mreekie mreekie added the NIH OTA: 1.2.1 2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 prdOwnThis is an it... label Oct 11, 2022
@mreekie mreekie changed the title Backl.og Grooming: 1.2.1 Backlog Grooming: 1.2.1 Oct 11, 2022
@mreekie mreekie changed the title Backlog Grooming: 1.2.1 Backlog Grooming: 1.2.1 (placeholder) Nov 4, 2022
@mreekie
Author

mreekie commented Nov 8, 2022

Moved 8681, 8571 to closed in deliverable backlog project

@mreekie
Author

mreekie commented Nov 8, 2022

From the original discussion, jtb asked: When the proposal was written, even the basic mechanisms above did not exist (maybe in development), so reporting their existence to NIH is probably useful. There are a couple of presentations on this but no whitepaper or conference paper. Is that sort of deliverable useful for NIH?

@mreekie
Author

mreekie commented Nov 8, 2022

mreekie commented on Oct 8

We have our first step on this, which is the defined spike.


mreekie/pdurbin 27 days ago

Discussion

  • Julian and Mahmood also are knowledgeable in this. We need to determine how much we have actually delivered on this already.

Jim - we have a mechanism for recording vocabularies.

External vocabularies: JavaScript gets input from the user and feeds that string back to Dataverse. There are actually 2 mechanisms. The JavaScript controls input for a single box on the page.

How that string is obtained is controlled by the JavaScript. Examples:

  • ORCID
  • SKOSMOS

Takeaways:

  • Let's talk about this first with the small group.
  • Next week, get Leonid, Jim, Julian, Stefano, and Stephen together to do this.

mreekie 21 days ago.

Very Rough Meeting notes:
We are using the working groups to define the what and then the repositories reflect the how and the when in their individual project plans.

From Jim on the how

  • For our external vocabulary mechanism, the JavaScript that we use could be shared across repositories and then maintained by, maybe, FundRef and ROR. This mechanism is a "plugin".
  • The idea is that the "smarts" of the mechanism for populating fields will live in the browser.
  • Dataverse would store the correct information but would not validate it. The responsibility for correctly retrieving the controlled data and populating the fields with the correct controlled vocabulary text would belong to the script.
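
A minimal sketch of the split of responsibilities described above, written in Python only to show the data flow (the real scripts are JavaScript running in the browser). Everything here is an assumption for illustration: the `Term` type, the `SAMPLE_VOCABULARY` list, the `example.org` URIs, the `subject_term` field name, and both function names are hypothetical and are not part of Dataverse or of any vocabulary provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str  # human-readable text shown to the user
    uri: str    # identifier for the term (hypothetical URIs below)


# Hypothetical stand-in for an external vocabulary service. A real script
# would query the vocabulary provider instead of this in-memory list.
SAMPLE_VOCABULARY = [
    Term("Cats", "https://example.org/vocab/cats"),
    Term("Cattle", "https://example.org/vocab/cattle"),
    Term("Dogs", "https://example.org/vocab/dogs"),
]


def suggest_terms(user_input, vocabulary=SAMPLE_VOCABULARY):
    """Script side: turn what the user typed into candidate controlled terms."""
    needle = user_input.strip().lower()
    if not needle:
        return []
    return [t for t in vocabulary if t.label.lower().startswith(needle)]


def store_field_value(record, field_name, value):
    """Repository side: store the string the script hands back, unvalidated."""
    updated = dict(record)
    updated[field_name] = value
    return updated


# The script resolves the input; the repository only stores the result.
choices = suggest_terms("cat")
chosen = choices[0]
record = store_field_value({}, "subject_term", chosen.uri)
```

In this sketch, correctness of the stored value depends entirely on `suggest_terms`, which mirrors the point that the script, not the repository, is responsible for producing the right controlled text.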

If we use the javascript approach

  • The groups that provide the vocabulary would provide a javascript plugin.
  • Currently the 'ORCIDs' of the world provide an API.
  • This would mean that they would instead provide a javascript plugin.
  • Maybe we start with providing our javascript and later look to hand off.

Julian

Use Cases.
As a curator or repository manager curating biomedical data, I want it to be easy to add the correct metadata to my dataset so that it's easier to find my dataset.

  • e.g. You can store this metadata now, but it's not easy because you would have to know exactly what text to put in.
  • We are talking about specific defined vocabularies that already exist.

As a data repository like a Dataverse repository, I want these to be machine readable as well. For example, if there is a user name, Dataverse can be set up to store a simple text field and/or a link or code. Where possible we would like to store a link or unique identifier that persists.

  • e.g. "Get Dataverse out of the business of doing what ORCID does better".

  • A takeaway here is that there are choices to be made on the implementation side of things in Dataverse that can help Dataverse use more information that improves discoverability.

  • Backend - whenever there is machine-readable information that we can use, we can store it.

  • Front end - we might make things look different to the user. For example, display the name but store the reference.

  • Imagine using a plug-in, but for searches it would use the string rather than the machine-readable reference.

  • The mechanism would

Facet? A facet allows filtering by a concept.
Relationship between terms: show me everything that relates to felines.

We have subjects you can put in.
In the UX - Facet - chemistry - 200 datasets.
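
A rough sketch of the two ideas above, faceting by concept and following relationships between terms, assuming datasets are tagged with concept identifiers rather than free text. The `BROADER` hierarchy, the `DATASETS` mapping, the `example.org` URIs, and all function names are made up for illustration and do not reflect how Dataverse or its search index actually works.

```python
from collections import Counter

# Hypothetical concept URIs with a tiny "broader term" hierarchy.
FELINES = "https://example.org/vocab/felines"
MAMMALS = "https://example.org/vocab/mammals"
CATS = "https://example.org/vocab/cats"
LIONS = "https://example.org/vocab/lions"
DOGS = "https://example.org/vocab/dogs"

BROADER = {CATS: FELINES, LIONS: FELINES, FELINES: MAMMALS, DOGS: MAMMALS}

# Hypothetical datasets, each tagged with one or more concept URIs.
DATASETS = {
    "dataset-1": [CATS],
    "dataset-2": [LIONS, DOGS],
    "dataset-3": [DOGS],
}


def facet_counts(datasets):
    """Count how many datasets carry each concept (what a facet would show)."""
    counts = Counter()
    for concepts in datasets.values():
        counts.update(set(concepts))
    return counts


def is_under(concept, ancestor, broader=BROADER):
    """True if `concept` is `ancestor` or reaches it via broader-term links."""
    seen = set()
    while concept is not None and concept not in seen:
        if concept == ancestor:
            return True
        seen.add(concept)
        concept = broader.get(concept)
    return False


def datasets_related_to(ancestor, datasets=DATASETS):
    """Return ids of datasets tagged with the concept or any narrower concept."""
    return sorted(
        name
        for name, concepts in datasets.items()
        if any(is_under(c, ancestor) for c in concepts)
    )
```

Here `datasets_related_to(FELINES)` plays the role of "show me everything that relates to felines", and `facet_counts(DATASETS)` gives the per-concept totals a facet such as "chemistry - 200 datasets" would display.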


From Jim

A series of concrete tasks could be developed to:

  • identify the official source of the vocabulary,
  • discover whether there are services available via API to query it,
  • discover if there are existing vocabulary browsers that can serve as examples or provide source code,
  • identify a viable identifier for the terms in the vocabulary,
  • create a simple example browser to demonstrate edit/display in Dataverse,
  • investigate whether the vocabularies are internationalized and, if so, how the translations can be accessed,
  • decide which field(s) in Dataverse they should be applied to, etc.
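
As one possible way to picture the "viable identifier" and internationalization tasks together, the sketch below keeps a single stable identifier per term and hangs translated labels off it. The `VocabularyTerm` class, its `label_for` method, the language codes, and the `example.org` URI are assumptions for illustration only; whether any given vocabulary exposes translations this way is exactly what the tasks would need to find out.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyTerm:
    uri: str  # stable identifier that would be stored (hypothetical URI below)
    labels: dict = field(default_factory=dict)  # language code -> display label

    def label_for(self, language, fallback="en"):
        """Pick a display label, falling back to another language, then the URI."""
        if language in self.labels:
            return self.labels[language]
        if fallback in self.labels:
            return self.labels[fallback]
        return self.uri


term = VocabularyTerm(
    uri="https://example.org/vocab/cats",
    labels={"en": "Cats", "fr": "Chats"},
)
```

With this shape the stored value never changes when the display language does: `term.label_for("fr")` and `term.label_for("en")` differ, while `term.uri` stays the same.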

Jim has also shared a compact and readable document on this topic here


Summary:

  • We are at the point where we can create deliverables.
  • There is still work to be done here to flesh out more use cases around controlled vocabularies to be added to the backlog for this feature.
  • Julian is going to add as a separate comment the last use case that we were trying to put together. This is the one related to searching and how it relates to the entry of URIs instead of text into the db.
    • I'm not sure but it sounds like there is a mechanism now for providing language support and that the use of URIs might be a different method?
  • Leonid is going to create 3 issues, each to create a JavaScript app in the way that has been discussed in this meeting. Each issue applies to one of the 3 vocabularies we've discussed: Unified Medical Language System (UMLS), Center for Expanded Data Annotation and Retrieval (CEDAR), and Medical Subject Headings (MeSH).
  • We will meet again to:
    • Review and agree on the definition of each of the 3 tasks (definition of done, etc.)
    • Decide how much further we would like to pursue additional use cases as a team or if that should break off as a separate task.

mreekie 21 days ago

Create a mental model:


Julian 21 days ago

Julian is going to add as a separate comment the last use case that we were trying to put together. This is the one related to searching and how it relates to the entry of URIs instead of text into the db.

I agreed that I'd leave a comment here about a use case statement we were working on before we ran out of meeting time:

  • As a researcher, I need to be sure that the results of my searches in the repository include only the data I'm interested in.
  • As a curator or repository manager, I'd like the repository to return search results that contain the data that people need when they're searching.

Also, I thought I'd ask a question about those three things being called vocabularies. As far as I can tell, only MeSH is a vocabulary. UMLS is a system "that brings together many health and biomedical vocabularies and standards" and I think one of the listed vocabularies is MeSH, and CEDAR is a platform that makes it easier to create metadata forms in order to improve data submission.

Who recommended these three things? How were they chosen? Was UMLS mentioned because it's a thorough list of biomedical controlled vocabularies?

Edit: Just noticed that what CEDAR is has been pointed out in the document at https://docs.google.com/document/d/1a3S0QkkTtl321XxWkQRCnYCBUR4kxbBwCtgNYTVTBUU.


mreekie 20 days ago

Next step, create our immediate backlog issues to include:

  • Create a whitepaper
  • Create an initial set of use cases to drive user experience.
  • Leonid is creating the first of the issues - to make the MVP.
  • Complete the MVP, which will read from FundRef.
  • Identify issues associated with the backend work that Jim has identified.

Note: Involve Guillermo in the MVP to get his opinion on the tools.


Julian 20 days ago

About improving searching for datasets by using FundRef in Dataverse repository funder field(s), I'm just noting, as @lenwiz mentioned in the last meeting, that subcommittees of the NIH GREI have the same goal. For our work on the GREI Data Use subcommittee, I mentioned in a Google Doc that there's an issue with the existing funder fields in Dataverse's Citation block (#4859) that I think needs to be addressed.

@mreekie mreekie changed the title Backlog Grooming: 1.2.1 (placeholder) Backlog Grooming: 1.2.1 (Deliverable Sidecar Issue) Nov 8, 2022
@mreekie
Author

mreekie commented Nov 8, 2022

Today:

  • Transferred all of our working notes for this over to this sidecar issue
  • Created a [draft problem statement](https://github.com/IQSS/dataverse/issues/9027t user story as the deliverable description.
  • Created the first issues based on the outcome from our problem statement meetings on this deliverable.
  • Spoke with Julian about the FundRef creation. He will be a good person to provide a user perspective.
  • Created a technical document - 1.2.1 Design and implement integration with controlled vocabularies. I'm not sure if it will see more use or not, but it was useful as a platform for reviewing Jim's notes.

Next steps:

  • Check in with @leonid to see if he created separate issues for this. I'm confident enough in our work to talk to him at this point.
  • Get this draft problem statement OK'd by Stefano.
  • Order the backlog with this sub-team
  • Create user stories focused on end-user stakeholders for each of the issues in this backlog.
    • Work on this until the issues are "Ready", including sizing.
    • For the FundRef work, make sure we're producing user stories, not technical steps.
  • Recruit a technical sponsor
  • Recruit a user sponsor (Julian?)

@mreekie mreekie changed the title Backlog Grooming: 1.2.1 (Deliverable Sidecar Issue) 1.2.1 Backlog Grooming Sidecar Issue Nov 10, 2022
@mreekie mreekie changed the title 1.2.1 Backlog Grooming Sidecar Issue 1.2.1 Deliverable Backlog Sidecar Issue Nov 10, 2022
@mreekie mreekie changed the title 1.2.1 Deliverable Backlog Sidecar Issue 1.2.1 Deliverable Backlog Sidecar Nov 10, 2022
@mreekie mreekie changed the title 1.2.1 Deliverable Backlog Sidecar 1.2.1 Deliverable Backlog Working Notes Dec 5, 2022
@mreekie
Author

mreekie commented Feb 7, 2023

DO NOT ADD NOTES HERE

@mreekie mreekie added the pm.GREI-d-1.2.1 NIH, yr1, aim2, task1: Design and implement integration with controlled voc label Mar 20, 2023
@cmbz cmbz closed this as completed May 17, 2023
6 participants