Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add optional values field to response filtering terms #160

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gsfk
Copy link
Collaborator

@gsfk gsfk commented Aug 20, 2024

PR for issue #79, where I suggested adding a field to filtering terms, so here is the change. Targeting the main branch since dev is trailing main.

The proposal is to bring FilteringTerm in the request into better alignment with FilteringTerm in the response from various /filtering_terms endpoints. I ignore a third, similar definition of filters available at /framework/src/configuration/filteringTermSchema, this file is orphaned and is falling behind the rest of the spec.

This PR proposes an optional addition to the response; this is a non-breaking change (see this example) which among other selling points reduces the verbosity of /filtering_terms responses. See these slides for further discussion.

Motivation / use cases

An alphanumeric query has required fields id, operator, value; an example query (from the docs) is

{
    "id": "geographicOrigin",
    "operator": "=",
    "value": "England"
}

But the response filtering term discards both the operator and value fields. The expected workaround among some beacon users appears to be to package the id, operator and value into a single string so it can be returned as an id, eg:

id: geographicOrigin=England

or something similar. But this is unsupported by the specification and presents other limitations, such as verbose unstructured responses from /filtering_terms endpoints, see slides linked above for discussion. The proposed solution is to not discard the "value" field.

The proposed change was motivated by the addition of beacon v2 to the Bento platform, where we encountered a few issues with alphanumeric search:

(1) String fields not covered by a specification: In order to handle data from the Quebec covid bioank (Biobanque québécoise de la COVID-19), we added an extra property field for subject domicile, with values matching the clinical data collected:

[
    "At home",
    "In residence for the elderly (RPA)",
    "Nursing home (CHSLD)",
    "Homeless",

    ... etc...
]

Some fields (such as "homeless") map well to existing ontology terms, but others involve Quebec-specific terminology such as "CHSLD"; domicile data is stored as a string from an enum of acceptable values. We anticipate similar issues handling data from the upcoming Pan-Canadian Genome Library.

(2) Binned Search: Bento beacon is focused on public anonymous search, so some fields are searchable by predefined bins only, to inhibit re-identification attacks. Here are some example bins for BMI, again from the covid biobank:

[
    "< 20",
    "[20, 25)",
    "[25, 27)",
    "[27, 30)",
    "[30, 35)",
    "[35, 40)",
    "≥ 40"
]

but these bins are not discoverable without workarounds. Some fields may have a large number of bins, which makes the expected workaround above needlessly verbose.

Discussion

The goal here is to improve alignment between request and response filtering terms, since only the latter is discoverable.

The problem is not simply that value and operator go missing when we make a /filtering_term response, but that there is also an ambiguity in how different queries treat the id field. For OntologyFilter, id represents a value to be found in the data, but for AlphanumericFilter, id is a field to be queried. The design of response filtering terms seems to have considered only the first kind of id. Consequently, only OntologyFilter is truly discoverable, other kinds of query require workarounds outside of the specification.

What if I don't like this PR?

The discoverability problem for alphanumeric filters still exists! There are alternative solutions:

  • codify the expected workaround (squashing everything into a single string)
  • adopt the "target" proposal mentioned in issue 115: this goes some way to preserving the connection between the values being searched for and the fields where you can expect to find them.

Both of these alternatives leave us with a flattened, verbose /filtering_terms response which this PR eliminates, but these are better than changing nothing. Or, as I've heard suggested, do a complete redesign of filters. But that looks like Beacon v3, which is too far in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant