Commit

Reorg data science code
cthoyt committed Sep 21, 2023
1 parent 162eb29 commit efb9f47
Showing 5 changed files with 241 additions and 194 deletions.
88 changes: 44 additions & 44 deletions notebooks/Data Science Demo.ipynb
@@ -23,8 +23,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 30 ms, sys: 4.59 ms, total: 34.6 ms\n",
"Wall time: 646 ms\n"
"CPU times: user 185 ms, sys: 108 ms, total: 293 ms\n",
"Wall time: 917 ms\n"
]
}
],
@@ -43,8 +43,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 7.08 s, sys: 69.5 ms, total: 7.15 s\n",
"Wall time: 7.22 s\n"
"CPU times: user 6.73 s, sys: 63 ms, total: 6.79 s\n",
"Wall time: 6.8 s\n"
]
}
],
@@ -104,46 +104,46 @@
"\n",
"Standardization was not necessary for 2 (0.0%), resulted in 0 updates (0.0%), and 34,522 failures (100.0%) in column `object_id`. Here's a breakdown of the prefixes that weren't possible to standardize:\n",
"\n",
"| prefix | count | examples |\n",
"|:-----------------------|--------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n",
"| EFO | 131 | EFO:0000195, EFO:0000612, EFO:0000729, EFO:0003914, EFO:0004222 |\n",
"| GARD | 2030 | GARD:1224, GARD:4771, GARD:6464, GARD:7179, GARD:7475 |\n",
"| ICD10CM | 3666 | ICD10CM:E75.0, ICD10CM:H35.42, ICD10CM:I75, ICD10CM:K59.39, ICD10CM:Q38.1 |\n",
"| ICD9CM | 2266 | ICD9CM:335.22, ICD9CM:368.51, ICD9CM:375.15, ICD9CM:618.8, ICD9CM:622.2 |\n",
"| ICDO | 361 | ICDO:8050/3, ICDO:8051/3, ICDO:8290/0, ICDO:8470/3, ICDO:8920/3 |\n",
"| KEGG | 41 | KEGG:05210, KEGG:05219, KEGG:05221, KEGG:05310, KEGG:H02296 |\n",
"| MEDDRA | 41 | MEDDRA:10001229, MEDDRA:10036794, MEDDRA:10066387, MEDDRA:10068842 |\n",
"| MESH | 3847 | MESH:C562745, MESH:D003882, MESH:D008288, MESH:D009072, MESH:D015270 |\n",
"| NCI | 4788 | NCI:C27472, NCI:C3406, NCI:C39860, NCI:C4296, NCI:C84886 |\n",
"| OMIM | 5539 | OMIM:154800, OMIM:229050, OMIM:255300, OMIM:614465, OMIM:615725 |\n",
"| ORDO | 2023 | ORDO:2554, ORDO:295195, ORDO:397593, ORDO:733, ORDO:79257 |\n",
"| SNOMEDCT_US_2020_03_01 | 6 | SNOMEDCT_US_2020_03_01:236818008, SNOMEDCT_US_2020_03_01:254828009, SNOMEDCT_US_2020_03_01:52564001 |\n",
"| SNOMEDCT_US_2020_09_01 | 1 | SNOMEDCT_US_2020_09_01:1112003 |\n",
"| SNOMEDCT_US_2021_07_31 | 10 | SNOMEDCT_US_2021_07_31:205329008, SNOMEDCT_US_2021_07_31:268180007, SNOMEDCT_US_2021_07_31:75931002, SNOMEDCT_US_2021_07_31:785879009, SNOMEDCT_US_2021_07_31:86249007 |\n",
"| SNOMEDCT_US_2021_09_01 | 5088 | SNOMEDCT_US_2021_09_01:128925001, SNOMEDCT_US_2021_09_01:254916002, SNOMEDCT_US_2021_09_01:267572005, SNOMEDCT_US_2021_09_01:389261002, SNOMEDCT_US_2021_09_01:94069006 |\n",
"| UMLS_CUI | 6890 | UMLS_CUI:C0085574, UMLS_CUI:C0153212, UMLS_CUI:C0282492, UMLS_CUI:C1332356, UMLS_CUI:C1838329 |\n",
"| prefix | count | examples |\n",
"|:-----------------------|--------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n",
"| EFO | 131 | EFO:0000274, EFO:0001071, EFO:0001075, EFO:0001422, EFO:0004705 |\n",
"| GARD | 2030 | GARD:2562, GARD:5721, GARD:6291, GARD:7065, GARD:8378 |\n",
"| ICD10CM | 3666 | ICD10CM:A21.0, ICD10CM:C03, ICD10CM:K72, ICD10CM:K82.4, ICD10CM:N30.0 |\n",
"| ICD9CM | 2266 | ICD9CM:214.4, ICD9CM:232.4, ICD9CM:377.75, ICD9CM:428.2, ICD9CM:745.6 |\n",
"| ICDO | 361 | ICDO:8300/0, ICDO:8840/3, ICDO:9442/1, ICDO:9530/0, ICDO:9590/3 |\n",
"| KEGG | 41 | KEGG:05016, KEGG:05133, KEGG:05142, KEGG:05222, KEGG:05414 |\n",
"| MEDDRA | 41 | MEDDRA:10001229, MEDDRA:10015487, MEDDRA:10021312, MEDDRA:10059200, MEDDRA:10060740 |\n",
"| MESH | 3847 | MESH:D002128, MESH:D005141, MESH:D009198, MESH:D011040, MESH:D017240 |\n",
"| NCI | 4788 | NCI:C26913, NCI:C27390, NCI:C27871, NCI:C40284, NCI:C6081 |\n",
"| OMIM | 5539 | OMIM:209700, OMIM:222300, OMIM:530000, OMIM:613021, OMIM:618224 |\n",
"| ORDO | 2023 | ORDO:139441, ORDO:2510, ORDO:255229, ORDO:420702, ORDO:48652 |\n",
"| SNOMEDCT_US_2020_03_01 | 6 | SNOMEDCT_US_2020_03_01:236818008, SNOMEDCT_US_2020_03_01:778024005, SNOMEDCT_US_2020_03_01:8757006 |\n",
"| SNOMEDCT_US_2020_09_01 | 1 | SNOMEDCT_US_2020_09_01:1112003 |\n",
"| SNOMEDCT_US_2021_07_31 | 10 | SNOMEDCT_US_2021_07_31:268180007, SNOMEDCT_US_2021_07_31:703536004, SNOMEDCT_US_2021_07_31:721311006, SNOMEDCT_US_2021_07_31:75931002 |\n",
"| SNOMEDCT_US_2021_09_01 | 5088 | SNOMEDCT_US_2021_09_01:111359004, SNOMEDCT_US_2021_09_01:155748004, SNOMEDCT_US_2021_09_01:238113006, SNOMEDCT_US_2021_09_01:38804009, SNOMEDCT_US_2021_09_01:92585006 |\n",
"| UMLS_CUI | 6890 | UMLS_CUI:C0031347, UMLS_CUI:C0206724, UMLS_CUI:C0276007, UMLS_CUI:C0392492, UMLS_CUI:C1515285 |\n",
"\n",
"## Suggestions\n",
"\n",
"- NCI appears in Bioregistry under [`ncit`](https://bioregistry.io/ncit). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- MESH appears in Bioregistry under [`mesh`](https://bioregistry.io/mesh). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- ICD9CM appears in Bioregistry under [`icd9cm`](https://bioregistry.io/icd9cm). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- SNOMEDCT_US_2021_09_01 appears in Bioregistry under [`snomedct`](https://bioregistry.io/snomedct). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- UMLS_CUI appears in Bioregistry under [`umls`](https://bioregistry.io/umls). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- ICD10CM appears in Bioregistry under [`icd10cm`](https://bioregistry.io/icd10cm). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- ORDO appears in Bioregistry under [`orphanet.ordo`](https://bioregistry.io/orphanet.ordo). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- GARD appears in Bioregistry under [`gard`](https://bioregistry.io/gard). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- OMIM appears in Bioregistry under [`omim`](https://bioregistry.io/omim). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- ICDO appears in Bioregistry under [`icdo`](https://bioregistry.io/icdo). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- EFO appears in Bioregistry under [`efo`](https://bioregistry.io/efo). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- MEDDRA appears in Bioregistry under [`meddra`](https://bioregistry.io/meddra). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- KEGG appears in Bioregistry under [`kegg`](https://bioregistry.io/kegg). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- SNOMEDCT_US_2021_07_31 appears in Bioregistry under [`snomedct`](https://bioregistry.io/snomedct). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- SNOMEDCT_US_2020_03_01 appears in Bioregistry under [`snomedct`](https://bioregistry.io/snomedct). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n",
"- SNOMEDCT_US_2020_09_01 appears in Bioregistry under [`snomedct`](https://bioregistry.io/snomedct). Consider chaining your converter with the Bioregistry using [`curies.chain()`](https://curies.readthedocs.io/en/latest/api/curies.chain.html).\n"
"- NCI Suggestion.x7 - ncit\n",
"- MESH Suggestion.x7 - mesh\n",
"- ICD9CM Suggestion.x7 - icd9cm\n",
"- SNOMEDCT_US_2021_09_01 Suggestion.x7 - snomedct\n",
"- UMLS_CUI Suggestion.x7 - umls\n",
"- ICD10CM Suggestion.x7 - icd10cm\n",
"- ORDO Suggestion.x7 - orphanet.ordo\n",
"- GARD Suggestion.x7 - gard\n",
"- OMIM Suggestion.x7 - omim\n",
"- ICDO Suggestion.x7 - icdo\n",
"- EFO Suggestion.x7 - efo\n",
"- MEDDRA Suggestion.x7 - meddra\n",
"- KEGG Suggestion.x7 - kegg\n",
"- SNOMEDCT_US_2021_07_31 Suggestion.x7 - snomedct\n",
"- SNOMEDCT_US_2020_03_01 Suggestion.x7 - snomedct\n",
"- SNOMEDCT_US_2020_09_01 Suggestion.x7 - snomedct\n"
],
"text/plain": [
"Report(converter=<curies.api.Converter object at 0x16b80f890>, column='object_id', nones=0, stayed=2, updated=0)"
"Report(converter=<curies.api.Converter object at 0x136ff2f10>, column='object_id', nones=0, stayed=2, updated=0)"
]
},
"execution_count": 5,
@@ -164,7 +164,7 @@
{
"data": {
"text/plain": [
"<curies.api.Converter at 0x16b7d8a90>"
"<curies.api.Converter at 0x1475df390>"
]
},
"execution_count": 6,
@@ -188,7 +188,7 @@
"Standardization was successfully applied to all 36,730 CURIEs in column `object_id`."
],
"text/plain": [
"Report(converter=<curies.api.Converter object at 0x16b7d8a90>, column='object_id', nones=0, stayed=0, updated=36730)"
"Report(converter=<curies.api.Converter object at 0x1475df390>, column='object_id', nones=0, stayed=0, updated=36730)"
]
},
"execution_count": 7,
@@ -228,11 +228,11 @@
"\n",
"## Suggestions\n",
"\n",
"- http entries are not CURIEs, try and compressing your data first.\n",
"- not_a_curie is not a valid CURIE\n"
"- http Suggestion.x2\n",
"- not_a_curie Suggestion.x3\n"
],
"text/plain": [
"Report(converter=<curies.api.Converter object at 0x16b7d8a90>, column=0, nones=1, stayed=1, updated=1)"
"Report(converter=<curies.api.Converter object at 0x1475df390>, column=0, nones=1, stayed=1, updated=1)"
]
},
"execution_count": 8,
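The report output in this notebook breaks down, per prefix, the CURIEs that could not be standardized (a count plus a few examples). The sketch below illustrates that kind of breakdown in plain Python. It is not the `curies` implementation: the function name `prefix_breakdown`, its parameters, and the `example` prefix are hypothetical, and it takes the first few examples in sorted order, whereas the examples in the notebook output differ between runs.

```python
from collections import defaultdict


def prefix_breakdown(curies, known_prefixes, max_examples=5):
    """Group CURIEs whose prefix is not in ``known_prefixes``.

    Hypothetical helper for illustration only; not part of the curies API.
    Returns a list of (prefix, count, examples) tuples sorted by prefix.
    """
    groups = defaultdict(list)
    for curie in curies:
        prefix, sep, _identifier = curie.partition(":")
        # Skip strings without a colon and prefixes that are already known.
        if not sep or prefix in known_prefixes:
            continue
        groups[prefix].append(curie)

    rows = []
    for prefix in sorted(groups):
        members = groups[prefix]
        # Assumption: take the first few unique examples in sorted order.
        examples = sorted(set(members))[:max_examples]
        rows.append((prefix, len(members), examples))
    return rows


rows = prefix_breakdown(
    ["NCI:C27472", "NCI:C3406", "MESH:D003882", "example:1", "not_a_curie"],
    known_prefixes={"example"},  # hypothetical known prefix
)
for prefix, count, examples in rows:
    print(f"{prefix}\t{count}\t{', '.join(examples)}")
```

Strings without a colon, such as `not_a_curie`, are skipped here; the notebook's report lists such values under its suggestions instead.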
2 changes: 2 additions & 0 deletions src/curies/__init__.py
@@ -16,6 +16,7 @@
load_prefix_map,
)
from .reconciliation import remap_curie_prefixes, remap_uri_prefixes, rewire
from .report import Report
from .sources import (
get_bioregistry_converter,
get_go_converter,
@@ -28,6 +29,7 @@
__all__ = [
"Converter",
"Record",
"Report",
"ReferenceTuple",
"Reference",
"DuplicateValueError",
