add optional values field to response filtering terms #160
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR for issue #79, where I suggested adding a field to filtering terms, so here is the change. Targeting the main branch since dev is trailing main.
The proposal is to bring FilteringTerm in the request into better alignment with FilteringTerm in the response from various
/filtering_terms
endpoints. I ignore a third, similar definition of filters available at /framework/src/configuration/filteringTermSchema, this file is orphaned and is falling behind the rest of the spec.This PR proposes an optional addition to the response; this is a non-breaking change (see this example) which among other selling points reduces the verbosity of /filtering_terms responses. See these slides for further discussion.
Motivation / use cases
An alphanumeric query has required fields
id
,operator
,value
; an example query (from the docs) isBut the response filtering term discards both the
operator
andvalue
fields. The expected workaround among some beacon users appears to be to package the id, operator and value into a single string so it can be returned as an id, eg:or something similar. But this is unsupported by the specification and presents other limitations, such as verbose unstructured responses from /filtering_terms endpoints, see slides linked above for discussion. The proposed solution is to not discard the "value" field.
The proposed change was motivated by the addition of beacon v2 to the Bento platform, where we encountered a few issues with alphanumeric search:
(1) String fields not covered by a specification: In order to handle data from the Quebec covid bioank (Biobanque québécoise de la COVID-19), we added an extra property field for subject domicile, with values matching the clinical data collected:
Some fields (such as "homeless") map well to existing ontology terms, but others involve Quebec-specific terminology such as "CHSLD"; domicile data is stored as a string from an enum of acceptable values. We anticipate similar issues handling data from the upcoming Pan-Canadian Genome Library.
(2) Binned Search: Bento beacon is focused on public anonymous search, so some fields are searchable by predefined bins only, to inhibit re-identification attacks. Here are some example bins for BMI, again from the covid biobank:
but these bins are not discoverable without workarounds. Some fields may have a large number of bins, which makes the expected workaround above needlessly verbose.
Discussion
The goal here is to improve alignment between request and response filtering terms, since only the latter is discoverable.
The problem is not simply that
value
andoperator
go missing when we make a /filtering_term response, but that there is also an ambiguity in how different queries treat theid
field. For OntologyFilter, id represents a value to be found in the data, but for AlphanumericFilter, id is a field to be queried. The design of response filtering terms seems to have considered only the first kind ofid
. Consequently, only OntologyFilter is truly discoverable, other kinds of query require workarounds outside of the specification.What if I don't like this PR?
The discoverability problem for alphanumeric filters still exists! There are alternative solutions:
Both of these alternatives leave us with a flattened, verbose /filtering_terms response which this PR eliminates, but these are better than changing nothing. Or, as I've heard suggested, do a complete redesign of filters. But that looks like Beacon v3, which is too far in the future.