update pending dgidb API #40

colleenXu · 2021-12-16T01:04:37Z

BTE is using https://pending.biothings.io/dgidb quite a bit, and I believe it hasn't been updated since it was made (in 2018/2019?).

We are interested in updating BOTH the parser and the data for this API. This API uses biolink-model predicates in the association.edge_label field, so we may want to either remove this field or map the DGIdb relations to the most recent version of the biolink-model.

andrewsu · 2021-12-16T01:16:16Z

Before refreshing our pending API, @colleenXu, can you look at whether we can annotate their API for use within BTE directly? https://www.dgidb.org/api

colleenXu · 2021-12-28T22:56:13Z

We could theoretically annotate the interactions endpoint with x-bte annotations, but there are some issues:

The genes field can accept Entrez IDs (not just gene symbols/names). However, results appear under either the matchedTerms field or the ambiguousTerms field, depending on how their API matched the search term, and I don't see a clear pattern for when it does one vs. the other. Handling two different response mappings would currently require duplicate operations/sub-queries.
The drugs field expects names. I've tried inputting CHEMBL IDs in different ways without managing to retrieve the record, so I think the "reverse operation" of drugs -> genes won't work.
This API is slower than a BioThings API would be (it takes ~7 sec to return one gene's interactions) and likely has less capacity for batch querying, although it accepts a comma-delimited list for a GET parameter and the documentation says POST is possible.
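A minimal sketch of why two response mappings are needed, assuming the interactions endpoint returns JSON with top-level `matchedTerms` and `ambiguousTerms` lists whose entries each carry a `searchTerm` and an `interactions` list. Only the two bucket names come from the discussion above; the inner field names and the helper `iter_interactions` are assumptions for illustration. Client code can simply walk both lists, which is exactly what a single x-bte response mapping cannot express.

```python
from typing import Any, Dict, Iterator, Tuple

# Both top-level lists that may hold results, per the discussion above.
RESULT_BUCKETS = ("matchedTerms", "ambiguousTerms")


def iter_interactions(response: Dict[str, Any]) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (bucket, search_term, interaction) from every result bucket.

    Assumed (hypothetical) shape beyond the two bucket names:
    {"matchedTerms": [{"searchTerm": ..., "interactions": [...]}, ...],
     "ambiguousTerms": [...]}
    """
    for bucket in RESULT_BUCKETS:
        for term in response.get(bucket) or []:
            search_term = term.get("searchTerm", "")
            for interaction in term.get("interactions") or []:
                yield bucket, search_term, interaction


if __name__ == "__main__":
    # Toy payload with placeholder values: one term matched, one ambiguous.
    sample = {
        "matchedTerms": [{"searchTerm": "gene-a", "interactions": [{"drugName": "DRUG-X"}]}],
        "ambiguousTerms": [{"searchTerm": "gene-b", "interactions": [{"drugName": "DRUG-Y"}]}],
    }
    for bucket, term, hit in iter_interactions(sample):
        print(bucket, term, hit["drugName"])
```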

colleenXu · 2022-01-10T20:21:05Z

noting current parser issues:

some chemicals may have incorrectly imported IDs (e.g., a wikidata ID together with a non-matching CHEMBL ID in the object.id field)
the relationships in DGIdb are drug -> gene, but the API is set up with gene as the subject and chemical as the object

erikyao · 2022-04-07T06:45:18Z

Hi @colleenXu, could you please double check the mapping from DGIdb interaction types to Biolink predicates?

In the DGIdb Feb 2022 interactions.tsv, the interaction_types column has values such as inhibitor, blocker, etc. My understanding is that, taking blocker as an example, biolink-model.yaml lists DGIdb:blocker as one of the narrow_mappings of the predicate decreases activity of. Therefore we map the blocker interaction type to the decreases_activity_of predicate (whitespace replaced by underscores).

Please correct me if I am wrong.
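The lookup described above can be sketched as follows, assuming biolink-model.yaml has already been parsed into a dict (e.g. with a YAML library) and that its `slots` section maps predicate names to definitions carrying an optional `narrow_mappings` list of CURIEs. The function `build_dgidb_predicate_map` is hypothetical; it only illustrates inverting `DGIdb:`-prefixed narrow mappings into an interaction-type-to-predicate table.

```python
from typing import Any, Dict


def build_dgidb_predicate_map(slots: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Invert DGIdb narrow_mappings into {interaction_type: predicate}.

    `slots` is assumed to be the already-parsed `slots` section of biolink-model.yaml.
    If two slots claim the same DGIdb term, the one iterated last wins.
    """
    mapping: Dict[str, str] = {}
    for slot_name, slot_def in slots.items():
        for curie in (slot_def or {}).get("narrow_mappings") or []:
            prefix, _, local_id = curie.partition(":")
            if prefix == "DGIdb" and local_id:
                # Predicate names contain spaces in the YAML; use underscores instead.
                mapping[local_id] = slot_name.replace(" ", "_")
    return mapping


if __name__ == "__main__":
    # Toy excerpt, not a verbatim copy of biolink-model.yaml.
    slots = {
        "decreases activity of": {"narrow_mappings": ["DGIdb:blocker", "EXAMPLE:not_dgidb"]},
        "related to": {},
    }
    print(build_dgidb_predicate_map(slots))
```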

colleenXu · 2022-04-07T19:23:56Z

@erikyao @andrewsu My understanding is that we don't need to map from DGIdb interaction types to Biolink predicates in the API; this can be handled in the x-bte annotation.

In the past, Kevin had included this mapping in the parser for some reason...

erikyao · 2022-04-08T23:05:44Z

~~Hi @andrewsu @colleenXu , the latest DGIdb Feb 2022 interactions.tsv has a new column interaction_group_score, which is actually the "Interaction Score" as explained here.~~

~~E.g. the interaction between BMS-387032 and CDK7 has a score of 0.82, as shown here.~~

~~Shall we integrate these scores into a field, say association.score? Thanks!~~

Never mind. The scores were already parsed; the comments in the parser are outdated.

erikyao · 2022-04-08T23:26:56Z

Hi @colleenXu, w.r.t.

some chem may have incorrectly imported IDs (have a wikidata ID and a non-matching CHEMBL ID in the object.id field...)

I noticed that in the tsv file, the column drug_concept_id contains:

empty values
wikidata IDs in the form of wikidata:Q<num>, such as wikidata:Q419808 for SAPROPTERIN
chembl IDs in the form of chembl:CHEMBL<num>, such as chembl:CHEMBL942 for BISACODYL

I'll query MyChem with drug names to get CHEMBL IDs for cases 1 and 2 (empty values and wikidata IDs), which should solve the wikidata problem.
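The three cases above could be told apart with a small helper such as the hypothetical `classify_concept_id` below. Only the `chembl:` and `wikidata:` prefixes come from the data described here; anything else is reported as `other`, and the MyChem query itself is left out. Rows classified as `empty` or `wikidata` are the ones that would need a MyChem lookup by drug name.

```python
from typing import Optional, Tuple


def classify_concept_id(raw: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (case, bare_id) for a drug_concept_id cell.

    case is one of "empty", "wikidata", "chembl", or "other".
    """
    value = (raw or "").strip()
    if not value:
        return "empty", None
    prefix, sep, local_id = value.partition(":")
    if not sep:
        return "other", value
    prefix = prefix.lower()
    if prefix == "chembl":
        return "chembl", local_id  # e.g. "CHEMBL942"
    if prefix == "wikidata":
        return "wikidata", local_id  # e.g. "Q419808"
    return "other", value


def needs_mychem_lookup(raw: Optional[str]) -> bool:
    """True for cases 1 and 2 (empty or wikidata), which would be resolved by drug name."""
    return classify_concept_id(raw)[0] in ("empty", "wikidata")
```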

Could you explain more on a non-matching CHEMBL ID in the object.id field? Thank you!

W.r.t.

the relationships in DGIdb are drug -> gene, but the API is set up with subject as gene and object as chem

I'll swap the object and subject fields in the parser. The column-to-field mapping would be:

| Column index | Column name | Key name |
|---|---|---|
| 0 | gene_name | object.SYMBOL |
| 1 | gene_claim_name | |
| 2 | entrez_id | object.NCBIGene |
| 3 | interaction_claim_source | association.provided_by |
| 4 | interaction_types | association.relation_name |
| 5 | drug_claim_name | |
| 6 | drug_claim_primary_name | |
| 7 | drug_name | subject.name |
| 8 | drug_concept_id | subject.CHEMBL_COMPOUND |
| 9 | interaction_group_score | association.interaction_group_score |
| 10 | PMIDs | association.pubmed |

colleenXu · 2022-04-10T06:02:51Z

On the wikidata vs chembl.compound IDs

I don't quite remember what was going on, and I can't find an example in the current API. My guess is that every record in the current API has an object.CHEMBL_COMPOUND field, and I found some wikidata IDs in that field for some records...

To clarify:

I'm not sure what you mean by "ensembl ids" here. It sounds like you're dealing with CHEMBL IDs? specifically CHEMBL.COMPOUND IDs? And you'd query MyChem rather than MyGene?
are you mapping drug names (when the drug_concept_id field is empty) and drug wikidata IDs (when the drug_concept_id is a wikidata ID) to CHEMBL.COMPOUND ids - so the final biothings API only has chembl.compound IDs for chemicals? that'd be awesome

the column-to-field table

I'd like to keep the tsv column names in some cases:

column index 3: association.interaction_claim_source
column index 4: association.interaction_types
column 7: subject.drug_name
column 10: association.pmids

colleenXu · 2022-04-10T06:06:01Z

Also, my understanding has been that:

subject.id has the prefix-id combo (CHEMBL.COMPOUND:1234) while subject.CHEMBL_COMPOUND would only have the ID (1234)
same with object.id (NCBIGene:1234) vs object.NCBIGene (1234)

The current parser keeps the CHEMBL.COMPOUND prefix on both fields of the object (id + CHEMBL_COMPOUND) right now...

(Note: I don't know where the convention above comes from. I get the sense that it's a BioThings API / our lab thing. BTE used to need prefixes or no prefixes on certain ID namespaces, but it seems to be doing okay right now.)
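A sketch of the convention described above: `id` carries the full CURIE and the namespace field carries only the bare ID, with any prefix the current parser leaves on the namespace field stripped. The helper `id_fields` is hypothetical, and the example IDs are placeholders.

```python
from typing import Dict


def id_fields(prefix: str, value: str) -> Dict[str, str]:
    """Build {"id": "<prefix>:<bare>", "<namespace field>": "<bare>"}.

    The namespace field name replaces "." with "_" (CHEMBL.COMPOUND -> CHEMBL_COMPOUND),
    matching the field names used in this thread. A leading "<prefix>:" is stripped
    so the bare ID never carries the prefix twice.
    """
    bare = value[len(prefix) + 1:] if value.startswith(prefix + ":") else value
    return {"id": f"{prefix}:{bare}", prefix.replace(".", "_"): bare}


if __name__ == "__main__":
    print(id_fields("CHEMBL.COMPOUND", "CHEMBL.COMPOUND:CHEMBL942"))
    print(id_fields("NCBIGene", "1234"))
```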

erikyao · 2022-04-11T19:11:54Z

I'm not sure what you mean by "ensembl ids" here. It sounds like you're dealing with CHEMBL IDs? specifically CHEMBL.COMPOUND IDs? And you'd query MyChem rather than MyGene?

Sorry, that was a typo. I meant CHEMBL IDs, and yes, it's MyChem.

are you mapping drug names (when the drug_concept_id field is empty) and drug wikidata IDs (when the drug_concept_id is a wikidata ID) to CHEMBL.COMPOUND ids - so the final biothings API only has chembl.compound IDs for chemicals? that'd be awesome

yes, that's what I am going to do.

The current parser keeps the CHEMBL.COMPOUND prefix on both fields of the object (id + CHEMBL_COMPOUND) right now...

I can fix this issue as well.

erikyao · 2022-04-11T19:15:05Z

@colleenXu, since we are going to remove the mapped predicates, the association.edge_label field will be removed as well. It was originally set at parser.py#L122.

erikyao · 2022-04-12T05:47:43Z

https://biothings.ncats.io/dgidb updated

colleenXu · 2022-04-13T05:36:20Z

@erikyao, it looks like the field name suggestions here weren't addressed. What are your thoughts?

colleenXu · 2022-04-13T05:59:53Z

@erikyao also, most of the API records don't seem to have an association.relation_name field. It would help to give these records some value; right now BTE cannot retrieve these associations.

I see "n/a" and "other/unknown" interaction types in https://dgidb.org/interaction_types, so it may help to keep these values, perhaps as not_applicable and other_unknown.
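One way to give every record a value is a normalization step like the hypothetical `normalize_interaction_type` below. The `n/a` and `other/unknown` labels and the `not_applicable`/`other_unknown` targets come from this thread; mapping empty cells to `not_applicable` follows the reply further down. Converting spaces and hyphens to underscores is an assumption about how multi-word types appear in the raw tsv.

```python
from typing import Optional

# Labels from the DGIdb interaction-type list discussed above.
SPECIAL_LABELS = {"n/a": "not_applicable", "other/unknown": "other_unknown"}


def normalize_interaction_type(raw: Optional[str]) -> str:
    """Return a snake_case interaction type, never an empty string."""
    value = (raw or "").strip().lower()
    if not value:
        # Empty cells get a placeholder so BTE can still retrieve the record.
        return "not_applicable"
    if value in SPECIAL_LABELS:
        return SPECIAL_LABELS[value]
    # Assumption: multi-word types may be written with spaces or hyphens in the raw data.
    return value.replace(" ", "_").replace("-", "_")
```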

colleenXu · 2022-04-13T06:52:09Z

Updated the x-bte annotation for dgidb; it still has the issues noted above. NCATS-Tangerine/translator-api-registry@0307491

erikyao · 2022-04-13T23:28:25Z

the column-to-field table

I'd like to keep the tsv column names in some cases....

column index 3: association.interaction_claim_source

column index 4: association.interaction_types

column 7: subject.drug_name

column 10: association.pmids

Revised as suggested
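A sketch of how one interactions.tsv row could map onto the revised field names (drug as subject, gene as object), reading the file with `csv.DictReader` and a tab delimiter. The functions `row_to_doc` and `parse_tsv` are hypothetical, not the actual parser; the sketch assumes multiple PMIDs are comma-separated in one cell, and it skips the MyChem resolution of empty or wikidata `drug_concept_id` values.

```python
import csv
import io
from typing import Any, Dict, Iterator, TextIO


def row_to_doc(row: Dict[str, str]) -> Dict[str, Any]:
    """Map one interactions.tsv row to a nested document (drug = subject, gene = object)."""
    subject: Dict[str, Any] = {"drug_name": row.get("drug_name", "")}
    concept = (row.get("drug_concept_id") or "").strip()
    if concept.lower().startswith("chembl:"):
        # Keep only the bare ID, e.g. "chembl:CHEMBL942" -> "CHEMBL942".
        subject["CHEMBL_COMPOUND"] = concept.split(":", 1)[1]

    obj: Dict[str, Any] = {"SYMBOL": row.get("gene_name", "")}
    if row.get("entrez_id"):
        obj["NCBIGene"] = row["entrez_id"]

    association: Dict[str, Any] = {
        "interaction_claim_source": row.get("interaction_claim_source", ""),
        "interaction_types": row.get("interaction_types", ""),
    }
    if row.get("interaction_group_score"):
        association["interaction_group_score"] = float(row["interaction_group_score"])
    if row.get("PMIDs"):
        # Assumption: multiple PMIDs are comma-separated in the tsv.
        association["pmids"] = [p.strip() for p in row["PMIDs"].split(",") if p.strip()]

    return {"subject": subject, "object": obj, "association": association}


def parse_tsv(handle: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one document per data row of a tab-separated file with a header line."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row_to_doc(row)


if __name__ == "__main__":
    # Toy file with placeholder values, using the column names from the table above.
    header = ("gene_name\tgene_claim_name\tentrez_id\tinteraction_claim_source\t"
              "interaction_types\tdrug_claim_name\tdrug_claim_primary_name\tdrug_name\t"
              "drug_concept_id\tinteraction_group_score\tPMIDs\n")
    row = "GENE1\tg\t1234\tSRC\tinhibitor\td\td\tDRUG1\tchembl:CHEMBL0001\t0.5\t111,222\n"
    for doc in parse_tsv(io.StringIO(header + row)):
        print(doc)
```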

erikyao · 2022-04-13T23:29:14Z

@erikyao also, most of the API records seem to not have an association.relation_name field....it would help to give these records some kind of value. Right now BTE cannot get these associations...

I see "n/a" and "other/unknown" interaction types in https://dgidb.org/interaction_types so it would help to maybe keep these values. Maybe by having the values not_applicable and other_unknown....

not_applicable is now used for rows with empty interaction types.

colleenXu · 2022-04-15T05:11:05Z

These are the interaction_types in the raw data that I've used for x-bte annotations: 12 unique values (24 operations total for forward + reverse). NCATS-Tangerine/translator-api-registry@5ac52b7

activator
agonist
allosteric_modulator
antagonist
antibody
blocker
inhibitor
inverse_agonist
modulator
not_applicable (as mentioned above, a lot of rows had no value for interaction_types...)
partial_agonist
positive_modulator

For the other interaction_types in the data, I wrote x-bte annotations but commented them out, because each had only a few records:

antisense_oligonucleotide: 4 records
inducer: 1 record
inhibitory_allosteric_modulator: 1 record
negative_modulator: 5 records
suppressor: 1 record
vaccine: 8 records
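The split above can be reproduced by counting records per interaction type and comparing against a cut-off. The thread does not state a threshold, so `min_records` in the hypothetical `split_by_support` below is an assumption; pick whatever value separates the annotated types from the commented-out ones.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_by_support(types: Iterable[str], min_records: int) -> Tuple[List[str], List[str]]:
    """Split interaction types into (annotate, comment_out) by record count.

    min_records is a hypothetical cut-off; the thread does not state one.
    """
    counts = Counter(types)
    keep = sorted(t for t, n in counts.items() if n >= min_records)
    drop = sorted(t for t, n in counts.items() if n < min_records)
    return keep, drop
```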

colleenXu mentioned this issue Jan 5, 2022

Bug: DGIdb not being queried correctly biothings/biothings_explorer#362

Closed

erikyao self-assigned this Jan 21, 2022

colleenXu mentioned this issue Feb 11, 2022

find/add non-Translator APIs to BTE biothings/biothings_explorer#372

Open

erikyao mentioned this issue Apr 12, 2022

2022-Feb data release (version 4.2.0) and bug fixes biothings/DGIdb#1

Closed

erikyao closed this as completed Apr 12, 2022

colleenXu mentioned this issue Apr 13, 2022

Updating x-bte annotations biothings/biothings_explorer#357

Closed
