BioPortal-to-KGX

Assemble a BioPortal Knowledge Graph through the following steps:

Transform the BioPortal 4store data dump to KGX graphs, with ROBOT preprocessing
Validate the output graphs with KGX to determine alignment to the Biolink Model
Obtain additional ontology metadata through the BioPortal API
Retrieve mappings for nodes without clear BioPortal analogues through BioPortal

Usage

Prepare a dump of the BioPortal 4store data with the 4s-dump script.

The dump is in N-Triples format: each set of records is stored in nested directories, and each file begins with one line of metadata.
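To make the dump layout concrete, here is a minimal sketch for inspecting a dump before running the transform. It assumes only what is described above: files sit in nested directories, and the first line of each file is metadata rather than a triple. The helper names `iter_triple_lines` and `count_triples` are hypothetical and are not part of BioPortal-to-KGX.

```python
from pathlib import Path
from typing import Dict, Iterator


def iter_triple_lines(dump_file: Path) -> Iterator[str]:
    """Yield non-empty lines from one dump file, skipping the first line.

    Assumption: the first line of every dump file is metadata, not a triple.
    """
    with dump_file.open(encoding="utf-8") as handle:
        next(handle, None)  # discard the metadata line (no error if file is empty)
        for line in handle:
            line = line.strip()
            if line:
                yield line


def count_triples(dump_root: Path) -> Dict[str, int]:
    """Map each file under dump_root (by relative path) to its triple-line count."""
    return {
        path.relative_to(dump_root).as_posix(): sum(1 for _ in iter_triple_lines(path))
        for path in sorted(dump_root.rglob("*"))
        if path.is_file()
    }
```

Calling `count_triples(Path("../path/to/your/data/"))` gives a quick per-file overview, which may help when choosing ontologies to include or exclude.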

Run BioPortal-to-KGX with all validation and metadata retrieval options as:

python run.py --input ../path/to/your/data/ --kgx_validate --robot_validate --pandas_validate --write_curies --get_bioportal_metadata --ncbo_key YOUR_NCBO_API_KEY_HERE

Specify individual ontologies to include or exclude with the --include_only and --exclude options, respectively. Each option takes a comma-delimited list of the original hashed file IDs from the 4store dump.

For example:

python run.py --input ../path/to/your/data/ --include_only dabd4d902360003975fb25ae56f8,7b95f2cc27c8fb0d5df11fbdb078
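When the list of IDs is long, it may be easier to assemble the command programmatically. The sketch below builds the argument list from Python lists, using only the --input, --include_only, and --exclude options described above. The function name `build_command` is hypothetical, and whether the two filters can be combined in a single run is not stated here, so treat that case as untested.

```python
import sys
from typing import List, Optional, Sequence


def build_command(
    input_dir: str,
    include_only: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argument list for run.py with comma-delimited ID filters."""
    cmd = [sys.executable, "run.py", "--input", input_dir]
    if include_only:
        # IDs are joined with commas and no spaces, matching the example above.
        cmd += ["--include_only", ",".join(include_only)]
    if exclude:
        cmd += ["--exclude", ",".join(exclude)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.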

Output will be written to the /bioportal_to_kgx directory within /transformed, with subdirectories named for the 4store graph and each subgraph.

Each subgraph will contain:

Node and edge files ({subgraph_name}_nodes.tsv and {subgraph_name}_edges.tsv, respectively)
A JSON version of the ontology ({subgraph_name}_relaxed.json)
Logs containing any validation messages about the transforms
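As a quick check of a transformed subgraph, the node file can be summarized by category. This is a minimal sketch that assumes the node file follows the usual KGX TSV conventions: a tab-delimited header row, a `category` column, and `|` separating multiple values in one field. The function name `count_categories` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_categories(nodes_tsv: Path) -> Counter:
    """Count how often each category appears in a KGX-style node TSV."""
    counts: Counter = Counter()
    with nodes_tsv.open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps literal quote characters in labels from confusing the parser.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Assumption: multivalued fields use "|" as the separator.
            for category in (row.get("category") or "").split("|"):
                if category:
                    counts[category] += 1
    return counts
```

A large share of nodes in a generic category such as `biolink:NamedThing` may indicate that an ontology aligns only loosely with the Biolink Model.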

Troubleshooting

The --robot_validate option may fail on larger ontologies like NCBITAXON with java.lang.OutOfMemoryError. Consider omitting this option or running ROBOT on files directly, as needed.
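Another possible workaround is to give the Java virtual machine more heap. ROBOT's launcher script reads the ROBOT_JAVA_ARGS environment variable, so the sketch below prepares an environment with a larger `-Xmx` value. It assumes that BioPortal-to-KGX invokes ROBOT through that launcher script, which is not confirmed here. The function name `robot_env` and the 8G default are illustrative only.

```python
import os
from typing import Dict, Mapping, Optional


def robot_env(
    max_heap: str = "8G", base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with ROBOT_JAVA_ARGS set to -Xmx<max_heap>."""
    env = dict(os.environ if base_env is None else base_env)
    # Overrides any existing ROBOT_JAVA_ARGS value in the copied environment.
    env["ROBOT_JAVA_ARGS"] = f"-Xmx{max_heap}"
    return env
```

The result can be passed as `env=robot_env("16G")` to `subprocess.run` when launching run.py or ROBOT directly. Choose a heap size that fits within the memory available on the machine.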

