
Bug: DGIdb not being queried correctly #362

Closed
andrewsu opened this issue Dec 3, 2021 · 10 comments

@andrewsu
Member

andrewsu commented Dec 3, 2021

In NCATSTranslator/testing#151, the query (pasted below) asks for compounds related to NCBIGene:7979 via several predicates, including biolink:decreases_activity_of, with bortezomib as a positive control. The BioThings API for DGIdb contains this link (see https://biothings.ncats.io/dgidb/query?q=subject.NCBIGene:7979, snippet below), but it is not returned when posting to either https://api.bte.ncats.io/v1/query or https://api.bte.ncats.io/v1/smartapi/e3edd325c76f2992a111b43a907a4870/query. I assume this has to do with the SmartAPI x-bte annotations for DGIdb?

Query:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "object": "n0",
                    "subject": "n1",
                    "predicates": [
                        "biolink:entity_negatively_regulates_entity",
                        "biolink:decreases_abundance_of", 
                        "biolink:decreases_expression_of", 
                        "biolink:decreases_stability_of", 
                        "biolink:decreases_uptake_of", 
                        "biolink:increases_degradation_of", 
                        "biolink:decreases_synthesis_of", 
                        "biolink:decreases_activity_of"
                    ]
                }
            },
            "nodes": {
                "n0": {
                    "ids": [
                        "NCBIGene:7979"
                    ],
                    "categories": [
                        "biolink:Gene"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:SmallMolecule"
                    ]
                }
            }
        }
    }
}
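In TRAPI, each query-graph edge points from its `subject` node to its `object` node, so `e01` above reads SmallMolecule -(predicate)-> NCBIGene:7979. A minimal Python sketch that renders query edges in that arrow notation; the helpers `node_label` and `describe_edges` are hypothetical, and the query is trimmed to a single predicate for brevity:

```python
# The query graph from the post, trimmed to one predicate and the fields read here.
QUERY = {
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n1",
                    "object": "n0",
                    "predicates": ["biolink:decreases_activity_of"],
                }
            },
            "nodes": {
                "n0": {"ids": ["NCBIGene:7979"], "categories": ["biolink:Gene"]},
                "n1": {"categories": ["biolink:SmallMolecule"]},
            },
        }
    }
}


def node_label(node: dict) -> str:
    """Use the pinned CURIE(s) if the node has any, otherwise its category."""
    if node.get("ids"):
        return ",".join(node["ids"])
    return ",".join(node.get("categories", ["?"]))


def describe_edges(message: dict) -> list[str]:
    """Render each query edge as 'subject -(predicate)-> object'."""
    qg = message["query_graph"]
    lines = []
    for edge_id, edge in qg["edges"].items():
        subj = node_label(qg["nodes"][edge["subject"]])
        obj = node_label(qg["nodes"][edge["object"]])
        for pred in edge.get("predicates", []):
            lines.append(f"{edge_id}: {subj} -({pred})-> {obj}")
    return lines


for line in describe_edges(QUERY["message"]):
    print(line)
# e01: biolink:SmallMolecule -(biolink:decreases_activity_of)-> NCBIGene:7979
```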

DGIdb snippet:

{
  "took": 0,
  "total": 6,
  "max_score": 8.455176,
  "hits": [
    {
      "_id": "d86ff0fa9ff42d29",
      "_score": 8.455176,
      "association": {
        "edge_label": "decreases_activity_of",
        "interaction_group_score": "0.47",
        "provided_by": "ChemblInteractions",
        "relation_name": "inhibitor"
      },
      "object": {
        "CHEMBL_COMPOUND": "CHEMBL.COMPOUND:CHEMBL325041",
        "id": "CHEMBL.COMPOUND:CHEMBL325041",
        "name": "BORTEZOMIB"
      },
      "subject": {
        "NCBIGene": "7979",
        "SYMBOL": "SEM1",
        "id": "NCBIGene:7979"
      }
    },
    ...
  ]
}
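Read literally, this hit stores the gene (SEM1) as `subject` and the compound (bortezomib) as `object`, with `edge_label` as the predicate between them. A minimal Python sketch that prints the triple in the direction the document stores it; `stored_triple` is a hypothetical helper, not part of the BioThings API:

```python
# The hit from the DGIdb snippet above, trimmed to the fields used here.
HIT = {
    "_id": "d86ff0fa9ff42d29",
    "association": {
        "edge_label": "decreases_activity_of",
        "provided_by": "ChemblInteractions",
        "relation_name": "inhibitor",
    },
    "object": {"id": "CHEMBL.COMPOUND:CHEMBL325041", "name": "BORTEZOMIB"},
    "subject": {"id": "NCBIGene:7979", "SYMBOL": "SEM1"},
}


def stored_triple(hit: dict) -> tuple[str, str, str]:
    """Return (subject, edge_label, object) exactly as the document stores them,
    preferring human-readable names over IDs when present."""
    subj = hit["subject"].get("SYMBOL", hit["subject"]["id"])
    obj = hit["object"].get("name", hit["object"]["id"])
    return subj, hit["association"]["edge_label"], obj


s, p, o = stored_triple(HIT)
print(f"{s} -({p})-> {o}")  # SEM1 -(decreases_activity_of)-> BORTEZOMIB
```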
@colleenXu
Collaborator

colleenXu commented Dec 9, 2021

@andrewsu it looks like the x-bte annotations group all of the associations in this API into a single relationship with the predicate "physically_interacts_with".

It looks like whoever wrote the parser for this API mapped the DGIdb relationships to Biolink predicates (likely using an old version of biolink-model).

There are a LOT of possible meta-triples in this API, and I'm not sure we want to create a separate operation for each one.

Notes:

  • the x-bte annotation currently has NCBIGene Gene <-> CHEMBL.COMPOUND SmallMolecule operations
  • there appear to be many original DGIdb relationships stored under association.relation_name, with their Biolink-model predicate mappings stored under association.edge_label; both fields can hold more than one relationship.
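Because `edge_label` can hold either one value or a list, counting the meta-triples means normalizing that field first. A minimal Python sketch of such a count; the helpers are hypothetical, the first document is the SEM1/bortezomib hit above, and the second is a made-up variant that exists only to show the list-valued case:

```python
from collections import Counter
from collections.abc import Iterable


def as_list(value) -> list:
    """Association fields may hold one value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def id_prefix(node: dict) -> str:
    """CURIE prefix of a node id, e.g. 'NCBIGene' or 'CHEMBL.COMPOUND'."""
    return node["id"].split(":", 1)[0]


def count_meta_triples(docs: Iterable[dict]) -> Counter:
    """Count (subject prefix, edge_label, object prefix) combinations."""
    counts = Counter()
    for doc in docs:
        for label in as_list(doc.get("association", {}).get("edge_label")):
            counts[(id_prefix(doc["subject"]), label, id_prefix(doc["object"]))] += 1
    return counts


docs = [
    {  # the SEM1 / bortezomib hit from the snippet above
        "association": {"edge_label": "decreases_activity_of"},
        "subject": {"id": "NCBIGene:7979"},
        "object": {"id": "CHEMBL.COMPOUND:CHEMBL325041"},
    },
    {  # hypothetical document, only to illustrate a list-valued edge_label
        "association": {"edge_label": ["decreases_activity_of", "physically_interacts_with"]},
        "subject": {"id": "NCBIGene:7979"},
        "object": {"id": "CHEMBL.COMPOUND:CHEMBL325041"},
    },
]
for triple, n in sorted(count_meta_triples(docs).items()):
    print(triple, n)
```

Keying on the CURIE prefix rather than the category also surfaces documents whose IDs were imported under an unexpected namespace.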

My review of the API (70,248 documents) and the components of its meta-triples:

@colleenXu
Collaborator

colleenXu commented Jan 3, 2022

Should be addressed by NCATS-Tangerine/translator-api-registry@a3aabc0

However, the query in the original post still won't return the desired data.

After looking at the direction of the data, I think this is alright. It looks like the data say Bortezomib -(decreases_activity_of)-> SEM1, and that is how I annotated it in the commit above. This triple wouldn't be returned by the query in the original post, since matching it would require the inverted predicate (activity_decreased_by).

To return this triple, the query would have to be SEM1 -(activity_decreased_by)-> SmallMolecule or SEM1 <-(decreases_activity_of)- SmallMolecule.
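The two equivalent query shapes can be checked mechanically: a stored triple answers a query edge either as-is, or after swapping subject and object and replacing the predicate with its inverse. A minimal Python sketch, using plain names instead of CURIEs and a hand-written two-entry inverse table (an assumption for illustration; the Biolink model is the authoritative source for inverse pairs):

```python
# Hypothetical, hand-written subset of Biolink inverse predicate pairs.
INVERSES = {
    "decreases_activity_of": "activity_decreased_by",
    "activity_decreased_by": "decreases_activity_of",
}


def matches(stored: tuple[str, str, str], query: tuple[str, str, str]) -> bool:
    """True if a stored (s, p, o) triple satisfies a query (s, p, o) edge,
    either directly or after flipping it with the inverse predicate."""
    s, p, o = stored
    if (s, p, o) == query:
        return True
    inv = INVERSES.get(p)
    return inv is not None and (o, inv, s) == query


stored = ("Bortezomib", "decreases_activity_of", "SEM1")
print(matches(stored, ("SEM1", "activity_decreased_by", "Bortezomib")))  # True
print(matches(stored, ("Bortezomib", "decreases_activity_of", "SEM1")))  # True
print(matches(stored, ("SEM1", "decreases_activity_of", "Bortezomib")))  # False
```

The last case is the reading of the original query assumed in this comment (SEM1 as subject), which is why the triple would not come back under that reading.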

@andrewsu
Member Author

andrewsu commented Jan 4, 2022

@colleenXu so would you agree that the data presented in https://biothings.ncats.io/dgidb/query?q=subject.NCBIGene:7979%20AND%20object.name:BORTEZOMIB (shown below) are actually the reverse of what is shown in https://dgidb.org/interactions/a0d25cdf-c8b8-49c6-95ef-88572080f885#_summary and https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL325041/, which would indicate a bug in the DGIdb parser?

{
  "took": 0,
  "total": 2,
  "max_score": 14.561249,
  "hits": [
    {
      "_id": "d86ff0fa9ff42d29",
      "_score": 14.561249,
      "association": {
        "edge_label": "decreases_activity_of",
        "interaction_group_score": "0.47",
        "provided_by": "ChemblInteractions",
        "relation_name": "inhibitor"
      },
      "object": {
        "CHEMBL_COMPOUND": "CHEMBL.COMPOUND:CHEMBL325041",
        "id": "CHEMBL.COMPOUND:CHEMBL325041",
        "name": "BORTEZOMIB"
      },
      "subject": {
        "NCBIGene": "7979",
        "SYMBOL": "SEM1",
        "id": "NCBIGene:7979"
      }
    },
    ...
  ]
}
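The lookup above can be reproduced by URL-encoding the query string. A minimal Python sketch using only the standard library; it assumes the public endpoint still accepts the same `q` syntax and returns a `hits` array as in the response shown, and `dgidb_query_url` / `fetch_hits` are hypothetical helper names:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://biothings.ncats.io/dgidb/query"


def dgidb_query_url(q: str) -> str:
    """Build a BioThings query URL, keeping ':' readable and encoding spaces as %20."""
    return f"{BASE}?q={quote(q, safe=':')}"


def fetch_hits(q: str) -> list[dict]:
    """Fetch matching documents (requires network access)."""
    with urlopen(dgidb_query_url(q)) as resp:
        return json.load(resp)["hits"]


print(dgidb_query_url("subject.NCBIGene:7979 AND object.name:BORTEZOMIB"))
# https://biothings.ncats.io/dgidb/query?q=subject.NCBIGene:7979%20AND%20object.name:BORTEZOMIB
```

Inspecting `hits[i]["subject"]` and `hits[i]["object"]` from `fetch_hits(...)` shows which side the gene and the drug sit on.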

@colleenXu
Collaborator

@andrewsu I agree that the BioThings API would better match the source data (the ChEMBL / DGIdb links) if the subject were the chemical and the object were the gene. This link is also useful for understanding the relations: it looks like the subject is often the drug, and the relations can be organized by "their effect" (inhibitory, activating).

I can edit the current DGIdb x-bte annotation with this in mind; let me know.

@andrewsu
Member Author

andrewsu commented Jan 5, 2022

@colleenXu can you do some spot checking to see whether it is universally true that the subject/object should be reversed (or the predicate reversed)? If so, then yes, let's just edit the DGIdb x-bte annotation to reflect that, including a prominently placed comment noting that, in an ideal world, the parser should be updated. (I'd normally also suggest creating an issue to update the parser, but given that the file hasn't been updated since May 2021 (https://dgidb.org/downloads) and it isn't clear whether the parser source is https://github.com/ravila4/DGIdb or https://github.com/kevinxin90/dgidb, I think the quick and dirty solution is warranted here...)

@colleenXu
Collaborator

@andrewsu there is already an open issue for updating the API / parser: biothings/pending.api#40

@colleenXu
Collaborator

Noting current parser issues:

  • some chemicals may have incorrectly imported IDs (e.g. a Wikidata ID and a non-matching CHEMBL ID in the object.id field)
  • the relationships in DGIdb are drug -> gene, but the API is set up with the gene as subject and the chemical as object
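The direction problem in the second bullet could in principle be fixed in a parser post-processing step instead of in the x-bte annotation. A minimal Python sketch of that idea; `reorient` is a hypothetical function, not part of either DGIdb parser repository, and it assumes every document needs flipping, which the spot checks discussed above would have to confirm:

```python
import copy


def reorient(doc: dict) -> dict:
    """Return a copy of a DGIdb document with subject (gene) and object (chemical)
    swapped, so it reads drug -> gene as in DGIdb/ChEMBL.

    The association block (edge_label, relation_name, ...) is left unchanged,
    since the label already describes the drug -> gene direction."""
    fixed = copy.deepcopy(doc)  # do not mutate the caller's document
    fixed["subject"], fixed["object"] = fixed["object"], fixed["subject"]
    return fixed


hit = {
    "association": {"edge_label": "decreases_activity_of", "relation_name": "inhibitor"},
    "subject": {"id": "NCBIGene:7979", "SYMBOL": "SEM1"},
    "object": {"id": "CHEMBL.COMPOUND:CHEMBL325041", "name": "BORTEZOMIB"},
}
fixed = reorient(hit)
print(fixed["subject"]["name"], fixed["association"]["edge_label"], fixed["object"]["SYMBOL"])
# BORTEZOMIB decreases_activity_of SEM1
```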

@colleenXu
Collaborator

@andrewsu the original query should work as intended now. I realized that it is stated as SEM1 <-(predicates)- SmallMolecule.

We now get this record back in knowledge_graph.edges:

                "dd961a001791de5ca1d8d8cb99b7dea7": {
                    "predicate": "biolink:decreases_activity_of",
                    "subject": "PUBCHEM.COMPOUND:387447",
                    "object": "NCBIGene:7979",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": [
                                "infores:biothings-explorer"
                            ],
                            "value_type_id": "biolink:InformationResource"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": [
                                "infores:biothings-dgidb"
                            ],
                            "value_type_id": "biolink:InformationResource"
                        },
                        {
                            "attribute_type_id": "source",
                            "value": "ChemblInteractions"
                        },
                        {
                            "attribute_type_id": "relation",
                            "value": "inhibitor"
                        },
                        {
                            "attribute_type_id": "dgidb_interaction_group_score",
                            "value": "0.47"
                        }
                    ]
                },
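In a full TRAPI response, an edge like this one sits under `message.knowledge_graph.edges` and can be picked out by predicate and object. A minimal Python sketch; the helper names are hypothetical, and the edge is copied from the record above with its attribute list trimmed:

```python
# The edge above, trimmed; in a full response use response["message"]["knowledge_graph"]["edges"].
KG_EDGES = {
    "dd961a001791de5ca1d8d8cb99b7dea7": {
        "predicate": "biolink:decreases_activity_of",
        "subject": "PUBCHEM.COMPOUND:387447",
        "object": "NCBIGene:7979",
        "attributes": [
            {"attribute_type_id": "source", "value": "ChemblInteractions"},
            {"attribute_type_id": "relation", "value": "inhibitor"},
            {"attribute_type_id": "dgidb_interaction_group_score", "value": "0.47"},
        ],
    }
}


def find_edges(kg_edges: dict, predicate: str, obj: str) -> list[tuple[str, str]]:
    """Return (edge_id, subject) for knowledge-graph edges with this predicate and object."""
    return [
        (edge_id, edge["subject"])
        for edge_id, edge in kg_edges.items()
        if edge["predicate"] == predicate and edge["object"] == obj
    ]


def attribute_values(edge: dict, attribute_type_id: str) -> list:
    """Collect the values of all attributes with the given attribute_type_id."""
    return [a["value"] for a in edge.get("attributes", []) if a["attribute_type_id"] == attribute_type_id]


for edge_id, subj in find_edges(KG_EDGES, "biolink:decreases_activity_of", "NCBIGene:7979"):
    print(subj, attribute_values(KG_EDGES[edge_id], "relation"))
# PUBCHEM.COMPOUND:387447 ['inhibitor']
```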

@colleenXu
Collaborator

colleenXu commented Jan 8, 2022

Here's the response I'm getting now to the original query (SEM1):
SEM1.txt

[screenshot: Screen Shot 2022-01-07 at 9.52.46 PM]
