Values returned by PFOCR as KP #838

Open
AlexanderPico opened this issue Jul 24, 2024 · 10 comments

@AlexanderPico
Collaborator

The UI team is eager to work with the edge-level pathway information being returned by BTE via PFOCR as a KP. Currently, we just get a flat list of values that mixes figureUrl and PMCID entries. Ideally, these would be labeled more clearly, or at least returned as one set per hit. We should also include the pfocrUrl, e.g., https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html

    "predicate": "biolink:occurs_together_in_literature_with",
    "subject": "CHEBI:173421",
    "object": "NCBIGene:55869",
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": [
                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
                "PMCID:PMC6765066",
                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
                "PMCID:PMC5463358"
            ]
        }
    ]
AlexanderPico added the enhancement (New feature or request) label on Jul 24, 2024
@AlexanderPico
Collaborator Author

To avoid confusion, this is distinct from the analyses returned by PFOCR-based enrichment, which were recently upgraded to include pfocrUrl and have a nice structure. Basically, can we do the same upgrade for the edge-level returns?

@newgene
Member

newgene commented Jul 24, 2024

@everaldorodrigo, putting this on your plate. Feel free to create a new issue in the pending.api repo and point it to this one in the bte repo.

@colleenXu
Collaborator

colleenXu commented Jul 25, 2024

Wait....I'm seeing multiple confusing points. Maybe some clarification would be useful?

  1. The first ask seems to be "adjust the info format in edges". This sounds to me like an x-bte annotation/BTE-post-subquery-processing task, not a Pending BioThings API task...
    • I'm also unclear on what the desired format is. It sounds like you'd like all the info for one figure kept together in one object (figureUrl, PMC, pfocrUrl), with a separate object for each figure? (I also need to think about how to put this in TRAPI format in edge-attributes/sources.)
  2. The second ask is "adding pfocrUrl to TRAPI edge info". I was covering this in Tweaks to PFOCR's API NCATS-Tangerine/translator-api-registry#132 (comment) and adding pfocrUrl to the TRAPI edge sources section. We should be able to get this done later this week, after Translator Eel Prod deployment.
  3. I don't think PFOCR-based enrichment (result augmentation) has been updated recently and I don't think it has pfocrUrl or any updated structure. This result augmentation has nothing to do with x-bte annotation and PFOCR TRAPI-edge format - which were discussed recently (point 2). I think there's been some crossed-wires/confusion here...

@AlexanderPico
Collaborator Author

AlexanderPico commented Jul 25, 2024

Thanks @colleenXu. I will be the first to admit confusion. This is still not totally clear to me, so I'll rephrase my ask from scratch based on what I see today and what I hope to see.

I see these two types of JSON snippets in TRAPI results containing PFOCR content, which I'm going to label Edge and Analyses to distinguish the two distinct parts of the TRAPI result. And I'll include Current and Suggested examples with a Summary of the diff...

1. Analyses
Current:

"pfocr": [
    {
        "matchedCuries": [
            "NCBIGene:8445",
            "NCBIGene:1859",
            "NCBIGene:2932",
            "NCBIGene:2735"
        ],
        "score": 0.2352941176470588,
        "pmc": "PMC2743241",
        "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg"
    },

Suggested:

"pfocr": [
    {
        "matchedCuries": [
            "NCBIGene:8445",
            "NCBIGene:1859",
            "NCBIGene:2932",
            "NCBIGene:2735"
        ],
        "score": 0.2352941176470588,
        "pmc": "PMC2743241",
        "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
        "pfocrUrl": "https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html"
    },

Summary: add a link to the PFOCR website, called "pfocrUrl" or whatever you like. I thought this was what we've been discussing for the past few months, and maybe it's already done?
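
To illustrate the "Analyses" change, here is a minimal Python sketch that adds "pfocrUrl" to one entry. It is not BTE's actual code. The URL pattern (base URL, then the PMC ID, two underscores, and the image file name without its extension) is an assumption inferred from the example links in this thread, not from PFOCR documentation, and the names `pfocr_url` and `add_pfocr_url` are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Base URL as it appears in the example links in this thread.
PFOCR_BASE = "https://pfocr.wikipathways.org/figures"


def pfocr_url(pmc: str, figure_url: str) -> str:
    """Build a PFOCR page URL from a PMC ID and a figure image URL.

    Assumed pattern: <base>/<PMC ID>__<image file name without extension>.html
    """
    stem = PurePosixPath(urlparse(figure_url).path).stem
    return f"{PFOCR_BASE}/{pmc}__{stem}.html"


def add_pfocr_url(entry: dict) -> dict:
    """Return a copy of one "pfocr" analysis entry with "pfocrUrl" added."""
    return {**entry, "pfocrUrl": pfocr_url(entry["pmc"], entry["figureUrl"])}


entry = {
    "matchedCuries": ["NCBIGene:8445", "NCBIGene:1859", "NCBIGene:2932", "NCBIGene:2735"],
    "score": 0.2352941176470588,
    "pmc": "PMC2743241",
    "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
}
print(add_pfocr_url(entry)["pfocrUrl"])
```

If the PFOCR API already returns this URL as a field, passing that field through would be safer than rebuilding it from the file name.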

2. Edge
Current:

    "predicate": "biolink:occurs_together_in_literature_with",
    "subject": "CHEBI:173421",
    "object": "NCBIGene:55869",
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": [
                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
                "PMCID:PMC6765066",
                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
                "PMCID:PMC5463358"
            ]
        }
    ]

Suggested:

    "predicate": "biolink:occurs_together_in_literature_with",
    "subject": "CHEBI:173421",
    "object": "NCBIGene:55869",
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": [
                {
                    "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
                    "pmc": "PMC6765066",
                    "pfocrUrl": "https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html"
                },
                ...
            ]
        }
    ]

Summary: Add structure to separate results as values or at the level of attributes. Also add "pfocrUrl".... Just like "Analyses".
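
Until the edge format changes, a consumer has to pair up the flat list itself. The sketch below shows one way to do that in Python. It assumes the list strictly alternates between an image URL and a "PMCID:" CURIE, as in the examples in this thread, and the function name `group_figures` is hypothetical.

```python
def group_figures(values: list[str]) -> list[dict]:
    """Pair up a flat [figureUrl, "PMCID:...", figureUrl, "PMCID:...", ...] list.

    Assumes strict alternation of image URL and PMCID CURIE; raises
    ValueError if a pair does not look like it belongs together.
    """
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    figures = []
    for url, curie in zip(values[0::2], values[1::2]):
        prefix, _, pmc = curie.partition(":")
        # Sanity check: the image URL path should contain the same PMC ID.
        if prefix != "PMCID" or f"/{pmc}/" not in url:
            raise ValueError(f"unexpected pair: {url!r}, {curie!r}")
        figures.append({"figureUrl": url, "pmc": pmc})
    return figures


flat = [
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
    "PMCID:PMC6765066",
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
    "PMCID:PMC5463358",
]
for figure in group_figures(flat):
    print(figure)
```

Relying on list order is fragile, which is part of why returning labeled sets per figure would be easier on the UI side.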

@tokebe
Member

tokebe commented Jul 25, 2024

I think some of the prior confusion was likely caused by the fact that PFOCR result augmentation (or analyses, as you call it) is completely separate from edge lookup and doesn't involve x-bte annotation. Maybe the two were unintentionally conflated in prior discussion? Either way, I've added your part 1 ask to #837.

BTW, result augmentation is handled by this code.

@AlexanderPico
Collaborator Author

Thanks. Yes, I thought the result augmentation was done (or decided) already and was referring to it as an example of the structure and fields we'd like to see in the edge lookup as well.

@colleenXu
Collaborator

colleenXu commented Jul 25, 2024

Thanks @AlexanderPico, your post clarifies a lot!

So "Part 1 Analyses" will be tracked/handled in the other issue since it's also "result augmentation".

As for "Part 2 Edges"...let's discuss and track this in this issue.
I had thought we were discussing this the past few months...oops. And based on these discussions, I was planning to make a change after the Translator Eel deployment to add pfocrUrl to the TRAPI edge sources section.

It'd then look like this:

                "db7467ffffbf54f21fbe335c46b06303": {
                    "predicate": "biolink:occurs_together_in_literature_with",
                    "subject": "CHEBI:4021",
                    "object": "NCBIGene:208",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
                                "PMCID:PMC2743241",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6412134/bin/bgy171f0001.jpg",
                                "PMCID:PMC6412134",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218933/bin/bcr2876-2.jpg",
                                "PMCID:PMC3218933",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7464279/bin/cells-09-01817-g007.jpg",
                                "PMCID:PMC7464279",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6209965/bin/cancers-10-00346-g003.jpg",
                                "PMCID:PMC6209965",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3828572/bin/srep03230-f8.jpg",
                                "PMCID:PMC3828572",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4863577/bin/jep-4-173Fig1.jpg",
                                "PMCID:PMC4863577",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7876385/bin/fphar-11-599965-g004.jpg",
                                "PMCID:PMC7876385",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5962346/bin/nihms923919f1.jpg",
                                "PMCID:PMC5962346"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        },
                        {
                            "attribute_type_id": "biolink:knowledge_level",
                            "value": "not_provided"
                        },
                        {
                            "attribute_type_id": "biolink:agent_type",
                            "value": "image_processing_agent"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:pfocr",
                            "resource_role": "primary_knowledge_source",
                            "source_record_urls": [
                                "https://pfocr.wikipathways.org/figures/PMC2743241__nihms-104435-f0001.html",
                                "https://pfocr.wikipathways.org/figures/PMC6412134__bgy171f0001.html",
                                "https://pfocr.wikipathways.org/figures/PMC3218933__bcr2876-2.html",
                                "https://pfocr.wikipathways.org/figures/PMC7464279__cells-09-01817-g007.html",
                                "https://pfocr.wikipathways.org/figures/PMC6209965__cancers-10-00346-g003.html",
                                "https://pfocr.wikipathways.org/figures/PMC3828572__srep03230-f8.html",
                                "https://pfocr.wikipathways.org/figures/PMC4863577__jep-4-173Fig1.html",
                                "https://pfocr.wikipathways.org/figures/PMC7876385__fphar-11-599965-g004.html",
                                "https://pfocr.wikipathways.org/figures/PMC5962346__nihms923919f1.html"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-pfocr",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:pfocr"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-pfocr"
                            ]
                        }
                    ]
                }
            }
        },

Notes to self: scattered notes in NCATS-Tangerine/translator-api-registry#132 (comment), #803 (comment), #811 (comment)


However, I agree with your suggestion that it'd be more useful/UI-friendly/organized to have a list of figure info objects, with each object including all info for one figure.

The problem is that your suggestion isn't valid TRAPI/biolink-modeling. The biolink:publications edge-attribute has a specific format: it can be a string or an array of strings, and those strings are publication CURIEs.
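
A rough Python sketch of that shape constraint, for illustration only: it checks just "string or array of strings" and is not the official TRAPI or Biolink validation. The function name `is_valid_publications_value` is hypothetical.

```python
def is_valid_publications_value(value) -> bool:
    """Shape check only: a string, or a list in which every item is a string."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


# Current flat list: passes the shape check.
current = [
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
    "PMCID:PMC6765066",
]

# Suggested list of per-figure objects: fails, because items are objects.
suggested = [
    {
        "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
        "pmc": "PMC6765066",
    }
]

print(is_valid_publications_value(current))    # True
print(is_valid_publications_value(suggested))  # False
```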

So we'll need to figure out a format that is TRAPI/biolink-model compliant...which may involve discussions with UI/data-modeling/TRAPI teams.

EDIT: some Slack convos happening.
Our lab Slack

@colleenXu
Collaborator

@AlexanderPico

I can make the change mentioned above to add pfocrUrl to the TRAPI edge sources section, now that Translator Eel is in Prod. Would you like me to do this? Or pause/drop this effort?

@AlexanderPico
Collaborator Author

Yes, please! I think we'll want that long-term. Short-term, we might be stuffing this edge info into a support graph section so that the UI team can access it right away (i.e., before alt edge types are allowed).

@colleenXu
Collaborator

Okay, the minor change mentioned above should be live tomorrow (8/1). Merged PR.
