Add subsetting functionality #67

cthoyt · 2023-09-08T12:55:25Z

This functionality is useful for downstream applications like the following:

1. You load a comprehensive extended prefix map, e.g., from the Bioregistry using curies.get_bioregistry_converter().
2. You load some data that conforms to this prefix map by convention. This is often the case for semantic mappings stored in the SSSOM format.
3. You extract the set of prefixes actually used within your data.
4. You subset the detailed extended prefix map to only include prefixes relevant for your data.
5. You write out the subsetted extended prefix map to go with your data. Effectively, this is a way of reconciling data. This is especially effective when using the Bioregistry or other comprehensive extended prefix maps.

Here's a concrete example (which also includes a bit of data science) that does
this for the SSSOM mappings from the Disease Ontology project.

>>> import curies
>>> import pandas as pd
>>> commit = "faca4fc335f9a61902b9c47a1facd52a0d3d2f8b"
>>> # Use the raw file URL; the github.com/.../blob/... page serves HTML, not TSV
>>> url = f"https://raw.githubusercontent.com/mapping-commons/disease-mappings/{commit}/mappings/doid.sssom.tsv"
>>> df = pd.read_csv(url, sep="\t", comment="#")
>>> # Collect every prefix used in the subject, predicate, and object columns
>>> prefixes = {
...     curies.Reference.from_curie(curie).prefix
...     for column in ["subject_id", "predicate_id", "object_id"]
...     for curie in df[column]
... }
>>> converter = curies.get_bioregistry_converter()
>>> # Keep only the records for prefixes that appear in the data
>>> slim_converter = converter.get_subconverter(prefixes)
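
For readers without network access, the same idea can be sketched in plain Python on a hand-written extended prefix map. This is only an illustration of steps 3 to 5 above, not the implementation added in this PR: the helper names `extract_prefixes` and `subset_records` are hypothetical, the three records and CURIEs are made-up sample data, and matching is done on the canonical prefix only (how the library treats prefix synonyms is not shown here).

```python
import json

# A tiny, hand-written extended prefix map (illustrative records only).
extended_prefix_map = [
    {"prefix": "doid", "uri_prefix": "http://purl.obolibrary.org/obo/DOID_", "prefix_synonyms": ["DOID"]},
    {"prefix": "mesh", "uri_prefix": "http://id.nlm.nih.gov/mesh/", "prefix_synonyms": ["MESH"]},
    {"prefix": "chebi", "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_", "prefix_synonyms": ["CHEBI"]},
]


def extract_prefixes(curies_):
    """Hypothetical helper: return the set of prefixes used by some CURIEs."""
    return {curie.split(":", 1)[0] for curie in curies_}


def subset_records(records, prefixes):
    """Hypothetical helper: keep records whose canonical prefix is in ``prefixes``."""
    return [record for record in records if record["prefix"] in prefixes]


# Made-up sample data standing in for the CURIE columns of a mapping file.
data = ["doid:0050577", "mesh:D003550", "doid:1234"]

used = extract_prefixes(data)                      # step 3
slim = subset_records(extended_prefix_map, used)   # step 4
serialized = json.dumps(slim, indent=2)            # step 5: ship this with the data
```

Filtering preserves the order of the original records, so the slim map stays stable across runs even though the intermediate prefix collection is an unordered set.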

This PR also sneaks in a related update to the documentation on pandas dataframe processing.

cthoyt added 5 commits September 8, 2023 14:35

Add subsetting functionality

75878e4

Rename

e505e70

Update api.py

b3d86a8

Update docs

7ed2324

Update api.py

d4df6a5

cthoyt enabled auto-merge (squash) September 8, 2023 13:12

Fix url

e69996a

cthoyt merged commit e5c56d6 into main Sep 8, 2023
8 checks passed

cthoyt deleted the subsets branch September 8, 2023 13:26
