Add curation_rule to SSSOM #258

Merged
merged 12 commits into from
Mar 16, 2023
13 changes: 13 additions & 0 deletions examples/schema/curation_rule.sssom.tsv
@@ -0,0 +1,13 @@
#curie_map:
# HP: http://purl.obolibrary.org/obo/HP_
# MP: http://purl.obolibrary.org/obo/MP_
# orcid: https://orcid.org/
# DISEASE_MAPPING_COMMONS_RULES: https://w3id.org/sssom/commons/disease/curation-rules/
#creator_id: orcid:0000-0002-7356-1779
#license: "https://creativecommons.org/publicdomain/zero/1.0/"
#mapping_provider: "https://w3id.org/sssom/core_team"
#comment: This is an example file for the SSSOM for illustration only. Its contents are entirely fabricated.
Member

I want to go as far as saying that we should require the examples given to be REAL and not fabricated... Or am I asking too much when I say people should explain in detail and give real, concrete use cases before SSSOM gets polluted with lots of indecipherable fields?

Member

@cthoyt cthoyt Mar 14, 2023

Further, https://w3id.org/sssom/commons/disease/curation-rules/MPR2 does not resolve to anything, so I cannot understand what this means and cannot review the merits of this field based on this example.

Collaborator Author

By fabricated I mean: this is not to be used or maintained for any practical purposes. I am totally out of steam on this one; we can push for incremental improvements moving forward. This example has real curation rules, I just didn't apply them to real data.

Collaborator Author

This is what I tried to explain in our last call: PURLs should, but don't have to, resolve to something. It's not part of SSSOM to prescribe what goes in these curation rules. Different communities will decide to create shareable representations, and they will decide to provide resolvable resources and examples. When you review a PR of a mapping set for use, you can, in your organisation, apply whatever quality thresholds you want during the review. At the SSSOM metadata level we just say: there is an element to represent curation rules, and this is how you use it.

Member

:( I am sorry that my feedback has exhausted you.

> we can push for incremental improvements moving forward

You and I both know what this means :p

How can we move forward so that the burden of making good, actionable improvements to SSSOM is distributed from you to the community members who are requesting them? E.g., improving the new field request template and adding more CI/CD would be a great start.

@saubin78 for a start, can you help alleviate some of this burden? Can you help provide actionable examples of how this might work (e.g., improve the example files Nico made so they are actually meaningful examples)? Then Nico won't feel so much burden from my important (but ultimately difficult to address) feedback on top of the project-based pressure to just "get this done", which, if done prematurely, could erode trust in and the sustainability of SSSOM (and cause more burnout).

Member

> This is what I tried to explain in our last call - PURLs should, but don't have to, resolve to something. It's not part of SSSOM to prescribe what goes in these curation rules. Different communities will decide to create shareable representations, and they will decide to provide resolvable resources and examples. When you review a PR of a mapping set for use, you can, in your organisation, apply whatever quality thresholds you want during the review. On SSSOM metadata level we just say: there is an element to represent curation rules, this is how you do it.

I don't disagree with this. But I think it's reasonable to ask that people proposing new fields go above and beyond the minimum requirements of PURLs and give context to these fields for the purpose of an example. As it is, without a herculean effort of reading through meandering GitHub issue conversations and documentation in various places, it's really not obvious what's going on here. If we can't agree on this, can we at least agree that there should be a detailed explanation in the preamble of the SSSOM document explaining what the new predicates are supposed to be for?

subject_id predicate_id object_id mapping_justification curation_rule see_also
HP:0009124 skos:exactMatch MP:0000003 semapv:ManualMappingCuration DISEASE_MAPPING_COMMONS_RULES:MPR2 https://github.com/mapping-commons/disease-mappings/issues/16
HP:0008551 skos:exactMatch MP:0000018 semapv:ManualMappingCuration DISEASE_MAPPING_COMMONS_RULES:MPR3 https://github.com/mapping-commons/disease-mappings/issues/16
HP:0000411 skos:exactMatch MP:0000021 semapv:ManualMappingCuration DISEASE_MAPPING_COMMONS_RULES:MPR3 https://github.com/mapping-commons/disease-mappings/issues/16
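
The `curation_rule` values in the rows above are CURIEs, so a reader needs the file's `curie_map` to turn them into the PURLs discussed in the review comments. A minimal sketch of that expansion follows; `expand_curie` is a hypothetical helper written for illustration (not part of any SSSOM tooling), and the prefix map is copied from the example file's header.

```python
# Prefix map copied from the #curie_map block of the example file.
CURIE_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MP": "http://purl.obolibrary.org/obo/MP_",
    "orcid": "https://orcid.org/",
    "DISEASE_MAPPING_COMMONS_RULES": "https://w3id.org/sssom/commons/disease/curation-rules/",
}


def expand_curie(curie: str, curie_map: dict) -> str:
    """Expand 'PREFIX:local' to a full IRI (hypothetical helper, for illustration only)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return curie_map[prefix] + local_id


# The rule referenced in the first data row of the example.
rule_iri = expand_curie("DISEASE_MAPPING_COMMONS_RULES:MPR2", CURIE_MAP)
print(rule_iri)  # https://w3id.org/sssom/commons/disease/curation-rules/MPR2
```

Expansion only produces the identifier; whether that identifier resolves to documentation is a separate question, which is the point debated in the comments above.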
13 changes: 13 additions & 0 deletions examples/schema/curation_rule_text.sssom.tsv
@@ -0,0 +1,13 @@
#curie_map:
# HP: http://purl.obolibrary.org/obo/HP_
# MP: http://purl.obolibrary.org/obo/MP_
# orcid: https://orcid.org/
# DISEASE_MAPPING_COMMONS_RULES: https://w3id.org/sssom/commons/disease/curation-rules/
#creator_id: orcid:0000-0002-7356-1779
#license: "https://creativecommons.org/publicdomain/zero/1.0/"
#mapping_provider: "https://w3id.org/sssom/core_team"
#comment: This is an example file for the SSSOM for illustration only. Its contents are entirely fabricated.
subject_id predicate_id object_id mapping_justification curation_rule_text see_also
HP:0009124 skos:exactMatch MP:0000003 semapv:ManualMappingCuration The two phenotypes inhere in homologous structures and exhibit the same phenotypic quality https://github.com/mapping-commons/disease-mappings/issues/16
HP:0008551 skos:exactMatch MP:0000018 semapv:ManualMappingCuration The two phenotypes inhere in homologous structures and exhibit the same phenotypic quality https://github.com/mapping-commons/disease-mappings/issues/16
HP:0000411 skos:exactMatch MP:0000021 semapv:ManualMappingCuration The two phenotypes are associated with the exact same set of diseases https://github.com/mapping-commons/disease-mappings/issues/16
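
Because both new slots are multivalued, a consumer of a file like the one above has to split the cell into a list. The sketch below reads a tab-separated excerpt with the standard library, skipping `#` metadata lines. It assumes multivalued cells are pipe-separated, which is my understanding of the SSSOM TSV convention; the second rule text in the last row is made up to show the split, and `read_mappings` is a hypothetical name.

```python
import csv
import io

# Excerpt modelled on the example file; tabs separate the columns.
# The "Second illustrative rule" value is invented to demonstrate splitting.
EXAMPLE = (
    "#creator_id: orcid:0000-0002-7356-1779\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tcuration_rule_text\n"
    "HP:0009124\tskos:exactMatch\tMP:0000003\tsemapv:ManualMappingCuration\t"
    "The two phenotypes inhere in homologous structures and exhibit the same phenotypic quality\n"
    "HP:0000411\tskos:exactMatch\tMP:0000021\tsemapv:ManualMappingCuration\t"
    "The two phenotypes are associated with the exact same set of diseases|Second illustrative rule\n"
)

# Columns treated as multivalued (assumption: '|' is the value separator).
MULTIVALUED = ("curation_rule", "curation_rule_text")


def read_mappings(text: str) -> list:
    """Skip '#' metadata lines, then split multivalued columns on '|'."""
    data_lines = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    rows = []
    for row in reader:
        for column in MULTIVALUED:
            if column in row:
                row[column] = [value for value in row[column].split("|") if value]
        rows.append(row)
    return rows


mappings = read_mappings(EXAMPLE)
print(len(mappings[1]["curation_rule_text"]))  # 2
```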
25 changes: 25 additions & 0 deletions src/sssom_schema/schema/sssom_schema.yaml
@@ -437,6 +437,29 @@ slots:
    examples:
      - value: semapv:Stemming
      - value: semapv:StopWordRemoval
  curation_rule:
    description: A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping.
      Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation
      rule is captured as a resource rather than a string, which enables higher levels of transparency and sharing across mapping sets.
      The URI representation of the curation rule is expected to be a resolvable identifier which provides details about the nature of the curation rule.
    range: EntityReference
    multivalued: true
    see_also:
      - https://github.com/mapping-commons/sssom/issues/166
      - https://github.com/mapping-commons/sssom/pull/258
      - https://github.com/mapping-commons/sssom/blob/master/examples/schema/curation_rule.sssom.tsv
  curation_rule_text:
    description: A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping.
      Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation
      rule should be captured as a resource (entity reference) rather than a string (see curation_rule element), which enables higher levels of transparency and sharing across mapping sets.
      The textual representation of curation rule is intended to be used in cases where (1) the creation of a resource is not practical from the
      perspective of the mapping_provider and (2) as an additional piece of metadata to augment the curation_rule element with a human readable text.
    range: string
    multivalued: true
    see_also:
      - https://github.com/mapping-commons/sssom/issues/166
      - https://github.com/mapping-commons/sssom/pull/258
      - https://github.com/mapping-commons/sssom/blob/master/examples/schema/curation_rule_text.sssom.tsv
  semantic_similarity_score:
    description: A score between 0 and 1 to denote the semantic similarity, where
      1 denotes equivalence.
@@ -530,6 +553,8 @@ classes:
      - mapping_tool_version
      - mapping_date
      - confidence
      - curation_rule
      - curation_rule_text
      - subject_match_field
      - object_match_field
      - match_string
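
The two slots differ only in range: `curation_rule` holds entity references while `curation_rule_text` holds free strings, and both are multivalued. As a rough illustration of what that distinction means for a consumer, the sketch below models the pair with a hypothetical `MappingRuleInfo` dataclass and an approximate CURIE check. Neither the class nor the regular expression comes from the SSSOM schema or its tooling.

```python
import re
from dataclasses import dataclass, field

# Approximate CURIE shape: a prefix, a colon, and a local id without whitespace.
# This is a loose check for illustration, not a full CURIE grammar.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")


@dataclass
class MappingRuleInfo:
    """Hypothetical container mirroring the two new multivalued slots."""

    curation_rule: list = field(default_factory=list)       # range: EntityReference
    curation_rule_text: list = field(default_factory=list)  # range: string

    def problems(self) -> list:
        """Return human-readable issues; an empty list means nothing was flagged."""
        issues = []
        for value in self.curation_rule:
            if not CURIE_PATTERN.match(value):
                issues.append(f"curation_rule is not a CURIE: {value!r}")
        for value in self.curation_rule_text:
            if not value.strip():
                issues.append("curation_rule_text contains an empty string")
        return issues


ok = MappingRuleInfo(
    curation_rule=["DISEASE_MAPPING_COMMONS_RULES:MPR2"],
    curation_rule_text=["The two phenotypes are associated with the exact same set of diseases"],
)
misplaced = MappingRuleInfo(curation_rule=["The two phenotypes inhere in homologous structures"])
print(ok.problems())              # []
print(len(misplaced.problems()))  # 1
```

A check like this catches free text placed in the wrong column, but it says nothing about whether the referenced rule is documented anywhere.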