find/add non-Translator APIs to BTE #372

Open
andrewsu opened this issue Dec 15, 2021 · 18 comments

@andrewsu
Member

In #370, we analyzed the list of 46 APIs currently listed in the config.js file, summarized below. One thing that jumped out at me is that all but five (bolded) are Translator-associated APIs. One of the advantages of the BTE/SmartAPI approach is that, in theory, we can easily annotate APIs "in the wild" that we don't maintain. But to really demonstrate that, we need to increase the number of those "in the wild" APIs. This ticket tracks our identification and incorporation of such APIs, either by reviewing the existing SmartAPI registry for appropriate APIs that aren't yet in our config.js file, or by finding and registering new APIs.

| Operation count | API name | Server URL |
| --- | --- | --- |
| 1414 | Automat Uberongraph (trapi v-1.2.0) | https://automat.renci.org/uberongraph/1.2 |
| 765 | BioThings SEMMEDDB API | https://biothings.ncats.io/semmeddb |
| 711 | Automat CTD (trapi v-1.2.0) | https://automat.renci.org/ctd/1.2 |
| 338 | ICEES Asthma Instance API | https://icees.renci.org:16339 |
| 291 | Automat Cord19 (trapi v-1.2.0) | https://automat.renci.org/cord19/1.2 |
| 210 | Automat Biolink (trapi v-1.2.0) | https://automat.renci.org/biolink/1.2 |
| 162 | ICEES DILI Instance API | https://icees.renci.org:16341 |
| 157 | Automat Hetio (trapi v-1.2.0) | https://automat.renci.org/hetio/1.2 |
| 134 | Automat Ontological Hierarchy (trapi v-1.2.0) | https://automat.renci.org/ontological-hierarchy/1.2 |
| 104 | Automat DrugCentral (trapi v-1.2.0) | https://automat.renci.org/drugcentral/1.2 |
| 95 | Automat Pharos (trapi v-1.2.0) | https://automat.renci.org/pharos/1.2 |
| 72 | Automat HMDB (trapi v-1.2.0) | https://automat.renci.org/hmdb/1.2 |
| 60 | Automat Human GOA (trapi v-1.2.0) | https://automat.renci.org/human-goa/1.2 |
| 50 | COHD TRAPI 1.2 | https://cohd.io/api/ |
| 49 | Automat Textmining KP (trapi v-1.2.0) | https://automat.renci.org/textminingkp/1.2 |
| 40 | Automat Gtopdb (trapi v-1.2.0) | https://automat.renci.org/gtopdb/1.2 |
| 35 | Automat Viral Proteome (trapi v-1.2.0) | https://automat.renci.org/viral-proteome/1.2 |
| 34 | Connections Hypothesis Provider API | http://chp.thayer.dartmouth.edu |
| 24 | Automat Panther (trapi v-1.2.0) | https://automat.renci.org/panther/1.2 |
| 24 | Clinical Risk KP API | https://biothings.ncats.io/clinical_risk_kp |
| 20 | Automat GTEx (trapi v-1.2.0) | https://automat.renci.org/gtex/1.2 |
| 18 | Automat GWAS Catalog (trapi v-1.2.0) | https://automat.renci.org/gwas-catalog/1.2 |
| 18 | Multiomics Wellness KP API | https://biothings.ncats.io/multiomics_wellness_kp |
| 17 | MyDisease.info API | http://mydisease.info/v1 |
| 14 | BioLink API | https://api.monarchinitiative.org/api |
| 12 | Automat Foodb (trapi v-1.2.0) | https://automat.renci.org/foodb/1.2 |
| 12 | MyChem.info API | https://mychem.info/v1 |
| 11 | Automat IntAct (trapi v-1.2.0) | https://automat.renci.org/intact/1.2 |
| 11 | MyGene.info API | https://mygene.info/v3 |
| 10 | Gene Ontology Biological Process API | https://biothings.ncats.io/go_bp |
| 10 | MyVariant.info API | https://myvariant.info/v1 |
| 9 | UBERON Ontology API | https://biothings.ncats.io/uberon |
| 7 | BioThings iDISK API | https://biothings.ncats.io/idisk |
| 4 | Automat HGNC (trapi v-1.2.0) | https://automat.renci.org/hgnc/1.2 |
| 4 | Gene Ontology Cellular Component API | https://biothings.ncats.io/go_cc |
| 4 | Gene Ontology Molecular Activity API | https://biothings.ncats.io/go_mf |
| 4 | MGIgene2phenotype API | https://biothings.ncats.io/mgigene2phenotype |
| 2 | BioThings DGIdb API | https://biothings.ncats.io/dgidb |
| 2 | DISEASES API | https://biothings.ncats.io/DISEASES |
| 2 | EBI Proteins API | https://www.ebi.ac.uk/proteins/api |
| 2 | EBIgene2phenotype API | https://biothings.ncats.io/ebigene2phenotype |
| 2 | Human Phenotype Ontology API | https://biothings.ncats.io/hpo |
| 1 | LINCS Data Portal API | http://lincsportal.ccs.miami.edu/dcic/api/ |
| 1 | LitVar API | https://www.ncbi.nlm.nih.gov/research/bionlp/litvar/api/v1 |
| 1 | Ontology Lookup Service API | https://www.ebi.ac.uk/ols/api |
| 1 | QuickGO API | https://www.ebi.ac.uk/QuickGO/services |

@andrewsu
Member Author

Parsing the SmartAPI registry for entries that have x-bte-kgs-operations but are not in our config.js file, I get the following list:

$ python3 get_xbte_from_smartapi.py
SEMMED Anatomy API      https://biothings.ncats.io/semmed_anatomy
SEMMED Phenotype API    https://biothings.ncats.io/semmedphenotype
ROBOKOP https://robokop.renci.org/api
Big GIM 1 API   https://biothings.ncats.io/biggim
TCGA Mutation Frequency KP API  https://biothings.ncats.io/tcga_mut_freq_kp
OpenTarget API  https://platform-api.opentargets.io/v3
SEMMED Biological Process API   https://biothings.ncats.io/semmedbp
Text Mining CO-OCCURRENCE API   https://biothings.ncats.io/text_mining_co_occurrence_kp
ChEMBL API      http://www.ebi.ac.uk/chembl/api
SEMMED Chemical API     https://biothings.ncats.io/semmedchemical
SEMMED Gene API https://biothings.ncats.io/semmedgene
Text Mining Targeted Association API    https://biothings.ncats.io/text_mining_targeted_association
Multiomics BigGIM-DrugResponse KP API   https://biothings.ncats.io/drug_response_kp
RGD API https://rest.rgd.mcw.edu/rgdws
SEMMED Disease API      https://biothings.ncats.io/semmed
pfocr API       https://pending.biothings.io/pfocr

OpenTarget, ChEMBL, and RGD are worth a closer look to decide whether they should be included by default in the config.js file.

Code:

import json
import re

# Collect the SmartAPI IDs already listed in BTE's config.js,
# skipping entries that are commented out.
allowed_api_ids = set()
with open("../src/routes/v1/config.js") as config_file:
    for line in config_file.read().splitlines():
        if "id: " in line and "//" not in line:
            # Keep only the value between "id: '" and the trailing "',".
            api_id = re.sub(r"',$", "", re.sub(r".*id: '", "", line))
            allowed_api_ids.add(api_id)

# Load the local dump of the SmartAPI registry.
with open("../data/smartapi_specs.json") as specs_file:
    d = json.load(specs_file)

# Print every registry entry that has x-bte annotations but is not in config.js.
for api in d["hits"]:
    has_xbte = "x-bte-kgs-operations" in api.get("components", {})
    if has_xbte and api["_id"] not in allowed_api_ids:
        api_name = api["info"]["title"]
        server = api["servers"][0]["url"]
        print(api_name + "\t" + server)
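
As a side note, the filtering step above could also be written as a function over already-loaded data, which makes it easy to check against a small hand-made sample. This is only a sketch: the function name `unlisted_xbte_apis` and the sample records (IDs, titles, example.org URLs) are made up for illustration and are not real registry entries.

```python
def unlisted_xbte_apis(registry, allowed_api_ids):
    """Return (title, first server URL) for x-bte-annotated APIs not in allowed_api_ids."""
    found = []
    for api in registry.get("hits", []):
        has_xbte = "x-bte-kgs-operations" in api.get("components", {})
        if has_xbte and api["_id"] not in allowed_api_ids:
            found.append((api["info"]["title"], api["servers"][0]["url"]))
    return found


# Made-up sample records, for illustration only (not real SmartAPI entries).
sample_registry = {
    "hits": [
        {"_id": "aaa", "info": {"title": "Listed API"},
         "servers": [{"url": "https://example.org/listed"}],
         "components": {"x-bte-kgs-operations": {}}},
        {"_id": "bbb", "info": {"title": "Unlisted API"},
         "servers": [{"url": "https://example.org/unlisted"}],
         "components": {"x-bte-kgs-operations": {}}},
        {"_id": "ccc", "info": {"title": "No x-bte API"},
         "servers": [{"url": "https://example.org/plain"}],
         "components": {}},
    ]
}

# Pretend only "aaa" is already in config.js.
for title, url in unlisted_xbte_apis(sample_registry, {"aaa"}):
    print(title + "\t" + url)
```

Only the second sample record is reported: the first is already "listed", and the third has no x-bte annotation.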

@colleenXu
Collaborator

Notes:

  • I suggest reviewing the x-bte annotations for the external APIs that BTE currently uses. They likely need some editing and additional operations.
  • I also suggest reviewing APIs that already have SmartAPI / x-bte annotations (Andrew has done a bit of that above) and deciding whether we want to update them and ingest them into BTE.

I think we'd want to prioritize resources that have non-overlapping data w/ the BioThings APIs...

In the config file, these APIs are listed as external:

  • Biolink API (Monarch)
  • EBI proteins API
  • LINCS API
  • LitVar
  • QuickGO
  • Ontology Lookup Service (EBI)

@colleenXu
Collaborator

colleenXu commented Jan 18, 2022

I think these are all resources that are already in Translator. But I think only people added to it can see this...


I went through some of my bookmarks (my main folder for possible resources)....and I organized it into this. The questions are the stuff I haven't done systematically yet >.<

Questions:

  • does this resource have structured info on relationships between biomedical stuff?
  • are we allowed to use it?
  • does this resource already have an api that is usable for x-bte annotation, or do we need to stand up a biothings api?

Other notes:

  • old notes on resource topics that might be interesting
  • I think this resource is too old, but it might have an interesting scoring approach + its citations might be possible resources?
  • issues with adding incidence / prevalence of disease into anything...

List of resources to review:

Less promising?

@colleenXu
Collaborator

colleenXu commented Jan 19, 2022

other resources related to research issues (besides grantome.com):

@colleenXu
Collaborator

Resources discussed during Feb relay this week:

@colleenXu
Collaborator

colleenXu commented Feb 22, 2022

biothings/pending.api#57 on evaluating an existing api, or making a new pending

@colleenXu
Collaborator

colleenXu commented Mar 23, 2022

Discussion of CTD's API: https://suwulab.slack.com/archives/CC218TEKC/p1648008085792669?thread_ts=1647966388.829489&cid=CC218TEKC. Also updated the posts above for the issues in the pending repo.

@colleenXu
Collaborator

colleenXu commented Apr 13, 2022

https://www.expasy.org/
https://glygen.org/home/

there may be tools/APIs here?

@colleenXu
Collaborator

https://rampdb.nih.gov/about has an API with an OpenAPI v3 spec

@colleenXu
Collaborator

Note that CIViC is using a GraphQL API (can we handle this with OpenAPI v3 / x-bte annotation?)

Andrew,
As a valued data client I am reaching out to you to let you know that the recently launched CIViC 2.0 features a completely redesigned user interface and a more powerful GraphQL API. These changes do not impact our current data releases.
If you are using our API:
These updates impact our API endpoints, please review the updated GraphQL API documentation available on GitHub. The GraphiQL user interface found here is a good way to get started interacting with the API.
The original API will be retired in the coming months and will not contain the most up-to-date CIViC data.
If you have any questions, comments, or would like help adapting an existing integration to our new API, please email us at help@civicdb.org.
Thank you for your interest in CIViC,
The CIViC team
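
For context on the GraphQL question above: a GraphQL API is typically called by POSTing a JSON body with `query` and `variables` fields to a single endpoint, so from the HTTP side it looks like one POST operation with a request body. Whether x-bte annotation can express that well is still the open question. The sketch below only builds such a request body; the query text and field names are hypothetical and are not taken from CIViC's schema.

```python
import json


def graphql_request_body(query, variables=None):
    """Serialize a GraphQL query and its variables into a JSON request body."""
    return json.dumps({"query": query, "variables": variables or {}})


# Hypothetical query: the "gene" field and its arguments are placeholders,
# not CIViC's actual schema.
example_query = "query($id: Int!) { gene(id: $id) { name } }"

body = graphql_request_body(example_query, {"id": 1})
print(body)
```

Every query goes to the same endpoint and differs only in the body, which is the main difference from the one-path-per-resource REST APIs annotated so far.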

@gtsueng

gtsueng commented Jul 19, 2022

RGD evaluation

Does this resource have structured info on relationships between biomedical stuff?
-RGD has gene, disease, and phenotype associations that include:

  • Gene Ontology levels of evidence annotations manually applied to Gene/disease, QTL/disease, gene/phenotype, QTL/phenotype relationships and others
  • Measurement Methods, Experimental Conditions, Clinical Measurement all normalized to specific ontologies

Potential value add:

  • Ability to filter gene disease associations by level of evidence, measurement methods, experimental conditions, etc. These annotations are normalized to external ontologies
  • Since QTLs may encompass multiple genes, QTL-disease/phenotype associations may provide potential gene-disease/phenotype associations where such associations otherwise do not exist

Potential issues:

  • Many disease/phenotype associations are mapped to Quantitative Trait Loci (QTLs) rather than directly to genes. The QTLs were imported to NCBI Gene at some point (and have NCBI Gene IDs), but these IDs seem to have been discontinued
  • QTLs encompass multiple genes, so it is possible to pollute your knowledge graph with potential gene-disease/phenotype associations that are incorrect
  • RGD uses RGD IDs, which are integers. There is no way to differentiate between QTL RGD IDs and gene RGD IDs: hitting a gene API endpoint with a QTL ID returns either a 500 error or an empty response
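
One possible workaround for the last issue is to treat the gene endpoint's failure modes as the signal that an ID is not a gene. A minimal sketch, assuming only what is described above (a 500 error or an empty response for non-gene IDs): the function name and the sample responses are made up, and RGD's real response shapes have not been checked here.

```python
def looks_like_gene_record(status_code, body):
    """Guess whether a gene-endpoint response describes a real gene.

    Assumption (from the behavior described above): a non-gene ID yields
    either an error status or an empty body.
    """
    return status_code == 200 and bool(body)


# Made-up responses, for illustration only.
print(looks_like_gene_record(200, {"symbol": "ExampleGene"}))  # non-empty 200
print(looks_like_gene_record(500, None))                       # server error
print(looks_like_gene_record(200, {}))                         # empty response
```

This costs one extra request per ID, so it would only make sense as a fallback when the ID type is not known up front.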

Are we allowed to use it?

  • RGD is licensed CC BY 4.0

Does this resource already have an api that is usable for x-bte annotation, or do we need to stand up a biothings api?
-Already has a SmartAPI registry record
-The API has a lot of endpoints, but uses its own internal IDs. Some key examples include:

  • /genes/{rgdId} : input: RGD gene ID; output: name, symbol, other/external IDs, species
  • /annotations/rgdID/{rgdID} : input: RGD ID (QTL IDs); output: annotations (including disease/phenotype, measurement type, evidence, etc.)
  • /lookup/ : maps RGD gene IDs to other external identifiers

-MyGene.info had RGD at some point (not sure if it's still there) and NCBI gene had ids for RGD QTLs until about June of 2021 (many if not all NCBI gene ids for RGD QTLs were retired between January and June of 2021).

-Not seeing any endpoint for mapping QTLs to genes (we would probably have to go about it using the gene and QTL chromosome start and end points)
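
A rough sketch of that coordinate-based approach: once the chromosome, start, and stop are known for a QTL and for a set of genes, candidate genes are those whose interval overlaps the QTL's. The `Region` type, the function name, and all coordinates below are made up for illustration. How the positions would be fetched from RGD is not shown, and this assumes all positions are on the same genome assembly.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chromosome: str
    start: int
    stop: int


def genes_in_qtl(qtl, genes):
    """Return names of genes whose interval overlaps the QTL on the same chromosome."""
    return [
        gene.name
        for gene in genes
        if gene.chromosome == qtl.chromosome
        and gene.start <= qtl.stop
        and gene.stop >= qtl.start
    ]


# Made-up coordinates, for illustration only.
qtl = Region("ExampleQTL", "1", 1000, 5000)
genes = [
    Region("GeneA", "1", 900, 1200),   # overlaps the QTL's left edge
    Region("GeneB", "1", 4000, 4500),  # fully inside the QTL
    Region("GeneC", "1", 6000, 7000),  # same chromosome, no overlap
    Region("GeneD", "2", 1000, 2000),  # different chromosome
]

print(genes_in_qtl(qtl, genes))
```

Overlap only yields candidate genes, so associations derived this way carry the same risk of incorrect gene-disease/phenotype links noted under "Potential issues".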

@gtsueng

gtsueng commented Jul 19, 2022

IUPHAR/BPS Guide to Pharmacology evaluation

Does this resource have structured info on relationships between biomedical stuff?

  • Disease, Target, Ligand associations: Disease is associated with these targets, and these ligands
  • Ligand, drug synonyms, clinical trial associations: Ligand has these synonyms, and is linked to these clinical trials
  • Ligand-Target relationships: inhibitor/agonist/etc. as well as binding affinities
  • Ligand classifications, Target classifications

Potential value add:

  • Disease, Target, Ligand associations and relationships

Potential issues:

  • May overlap with CTD

Are we allowed to use it?

Does this resource already have an api that is usable for x-bte annotation, or do we need to stand up a biothings api?

  • A SmartAPI registry record has been created for a graph based on this
  • REST API details at: https://www.guidetopharmacology.org/webServices.jsp
  • GtoPdb target IDs can be mapped to gene symbols and UniProt IDs
  • GtoPdb ligand IDs can be mapped to SMILES and InChIKey
  • Ligands have been reported to conform to Bioschemas:MolecularEntity

@colleenXu
Collaborator

Note: DGIdb has an API but it takes names/symbols as input (not IDs…), and it doesn't have an OpenAPI spec. The "names, not ID" issue means we can't integrate that API into Translator / BTE well....we have to download the data and make a BioThings API instead...

@colleenXu
Collaborator

colleenXu commented Jul 29, 2023

Collections of a lot of resources, most of which should have their own APIs:

@colleenXu
Collaborator

colleenXu commented Aug 2, 2023

Pasting an old note of mine:

If we wanted to look for data sources to cover CategoryA -> CategoryB:

@colleenXu
Collaborator

Some data resource lists that could be helpful:

@colleenXu
Collaborator

food resources https://foodmetabolome.org/databases
