Filtering terms in model: Confusing /__entryType__/filteringTerms.json files #93

Closed
mbaudis opened this issue Jun 19, 2023 · 8 comments

@mbaudis
Member

mbaudis commented Jun 19, 2023

FilteringTerms and filters are still confusing. One source of confusion is the filteringTerms.json files (e.g. in biosamples), which are supposedly placeholders for information about the filtering terms available for the entry type but do not constitute an endpoint. They therefore seem to be for internal use only (and would most probably be kept in a database or generated on the fly anyway).

Proposals:

  1. delete filteringTerms.yaml / .json
  2. optionally add a /filtering_terms endpoint for each entity where filters apply, e.g. /biosamples/filtering_terms/, in addition to /filtering_terms, to list all terms applying to the given scope
@redmitry
Collaborator

I would rather add

"scope": {
    "type": "array",
    "items": {
        "type": "string"
    }
}

property to beaconFilteringTermsResults.json.

That way we would define the entry types for filters and would not need many endpoints.
The /filtering_terms endpoint would then return:

{
    "id": "LOINC:3141-9",
    "label": "Weight",
    "type": "alphanumeric",
    "scope": ["individual", "biosamples", ...]
}

I think this would simplify both implementation and usage.
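
As a rough illustration of the point about not needing many endpoints, here is a minimal Python sketch of a single listing filtered by scope. Only the "scope" property comes from the proposal above; the `terms_for_scope` helper, the second sample term and the entry type names are assumptions made up for this example.

```python
# Hypothetical sample of what /filtering_terms could return with "scope".
FILTERING_TERMS = [
    {"id": "LOINC:3141-9", "label": "Weight", "type": "alphanumeric",
     "scope": ["individuals", "biosamples"]},
    # Illustrative second term, not taken from the discussion.
    {"id": "NCIT:C20197", "label": "Male", "type": "ontology",
     "scope": ["individuals"]},
]


def terms_for_scope(terms, entry_type):
    """Return the terms whose optional "scope" list includes entry_type."""
    return [t for t in terms if entry_type in t.get("scope", [])]


biosample_terms = terms_for_scope(FILTERING_TERMS, "biosamples")
print([t["id"] for t in biosample_terms])
```

A client (or a per-entity route, if one is still wanted) could derive the list for one entry type from the single response this way.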

Best,

Dmitry

@mbaudis
Member Author

mbaudis commented Jun 20, 2023

@redmitry Yes, IMO also a good option (actually my preferred one). But the form in which to do this is not clear because there are some open questions:

  • Can there be several scopes (and why would there)?
  • Does the scope apply to the object in the model (i.e. the filter matches a parameter there) or to the entry type requested? Not the same; the model definition would just be a hint for where this is processed; the entry type use could mean that the filter could be applied somewhere else to filter the entry type indirectly (e.g. variants for a biosample parameter).
  • Do we want to hint at the precise scope, i.e. the parameter in the entry type's schema in the default model? This seems like a good option but this could be confusing for implementers and also would be problematic for alternative models etc.
  • Also: Do we need/want a scope in the filter's query part at all? One instance comes to my mind (DUO codes) but this may be a bit constructed...

In summary, I'd rather see a flexible use where the terms in beaconFilteringTermsResults carry some (optional) information about the scope they apply to, potentially with the parameter (e.g. biosamples.histologicalDiagnosis), but more for informational purposes. So a solution like the one above would be one option.

There had also been some discussion at #79 (comment) (w/o final resolution).

@costero-e
Collaborator

costero-e commented Jun 20, 2023

I also like the approach that @redmitry has taken with filtering terms. I think it would better fit the idea of filters we are applying in Beacon and would help users find more quickly what they can filter and where. Just one observation: wouldn't the scope be better defined as an object than as an array, like this:

"scope": {
    "type": "object",
    "items": {
        "type": "string"
    }
}

As for @mbaudis's observations, I agree that giving a scope hint can be problematic when we think of alternative models but, as you said, we can make it optional rather than required. I can also think of some examples (in fact we have them in our reference implementation) where an ontology applies to two different scopes, such as the ethnicity/disease/sex ontologies in individuals and cohorts.

@mbaudis
Member Author

mbaudis commented Jun 20, 2023

@costero-e The

type: array
items: string

...is correct if you reference the entity names. If you reference ontologies for the entities you'd use items: object or better

items:
  $ref: "../common..."

@mbaudis
Member Author

mbaudis commented Jun 20, 2023

Good note about the examples (sex, ethnicities...). Another argument for having entities in the filteringTerms - but not for queries, since you'd query either cohorts or individuals. I think the use of query aggregation across entry types here (variants for a certain diagnosis, biosamples from male subjects ...) does not interfere with these examples since there is a difference between collection schemas and data records.

@redmitry
Collaborator

  • Can there be several scopes (and why would there)?

The filters are to be applied to particular entry types, so IMO it's reasonable to enumerate (limit) their scopes.

  • Does the scope apply to the object in the model (i.e. the filter matches a parameter there) or to the entry type requested?

IMO, the scope defines the entry type the filter applies to, not the parameter. The parameter in a particular entry type may differ between implementations (mongo, sql, omop, etc.).
The reference implementation uses a "scope" attribute in the filtering_terms to provide the "mongodb" parameter (issue #79).
Again, at the moment there is no "scope" parameter in filtering terms.
Alphanumeric filters may require more complex solutions than just a path to the property. For instance, here is a filter from the Java implementation:

{
    "id": "LOINC:3141-9",
    "label": "Weight",
    "type": "alphanumeric",
    "query": "{'measures': {'$elemMatch': {'assayCode.id': '$$id', 'measurementValue.value': {$$operator: $$value}}}}"
}

Note that "query" is filtered out when the term is exposed by "filtering_terms" (we just use the same format for configuration).
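
To make the two uses of that one format concrete, here is a minimal Python sketch, not the Java implementation's actual code. It assumes the `$$id`, `$$operator` and `$$value` placeholders are filled by plain string substitution; the `expose` and `render_query` helpers and the operator mapping are hypothetical.

```python
# Hypothetical configuration entry in the format shown above.
TERM_CONFIG = {
    "id": "LOINC:3141-9",
    "label": "Weight",
    "type": "alphanumeric",
    "query": "{'measures': {'$elemMatch': {'assayCode.id': '$$id', "
             "'measurementValue.value': {$$operator: $$value}}}}",
}

# Assumed mapping from comparison operators to MongoDB operator names.
MONGO_OPERATORS = {
    "=": "'$eq'", ">": "'$gt'", ">=": "'$gte'", "<": "'$lt'", "<=": "'$lte'",
}


def expose(term):
    """Return a copy of the term without the internal "query" template."""
    return {k: v for k, v in term.items() if k != "query"}


def render_query(term, operator, value):
    """Fill the $$id, $$operator and $$value placeholders by substitution."""
    return (term["query"]
            .replace("$$id", term["id"])
            .replace("$$operator", MONGO_OPERATORS[operator])
            .replace("$$value", str(value)))


print(expose(TERM_CONFIG))
print(render_query(TERM_CONFIG, ">", 70))
```

Plain substitution is only safe for trusted, validated values; a real implementation would more likely build the query object directly.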

Not the same; the model definition would just be a hint for where this is processed; the entry type use could mean that the filter could be applied somewhere else to filter the entry type indirectly (e.g. variants for a biosample parameter).

Well, the idea is that the "scope" defined in the query corresponds to the "scope"s defined in filtering_terms.

  • Do we want to hint at the precise scope, i.e. the parameter in the entry type's schema in the default model? This seems like a good option but this could be confusing for implementers and also would be problematic for alternative models etc.

Parameters in the entry should be out of scope, IMO. A Beacon implementation may generate beacons dynamically and use other query languages or whatever.

  • Also: Do we need/want a scope in the filter's query part at all? One instance comes to my mind (DUO codes) but this may be a bit constructed...

This could provide a choice in the case of a complex query. Imagine we have, say, an "age" filter which may be applied to two entities (e.g. biosamples and individuals).
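
One way this choice could work, sketched in Python. The resolution rule is an assumption for illustration, not something the specification defines: an explicit "scope" in the query filter wins, otherwise the requested entry type is used if the term allows it. `TERM_SCOPES` and `resolve_scope` are hypothetical names.

```python
# Hypothetical term listing: "age" is declared for two entry types.
TERM_SCOPES = {"age": ["individuals", "biosamples"]}


def resolve_scope(query_filter, requested_entry_type):
    """Pick the entry type a filter is applied to (assumed rule)."""
    allowed = TERM_SCOPES.get(query_filter["id"], [])
    explicit = query_filter.get("scope")
    if explicit is not None:
        # An explicit scope must be one the term declares.
        if explicit not in allowed:
            raise ValueError(f"{query_filter['id']} does not apply to {explicit}")
        return explicit
    if requested_entry_type in allowed:
        return requested_entry_type
    if len(allowed) == 1:
        return allowed[0]
    raise ValueError(f"ambiguous scope for {query_filter['id']}")


# The explicit scope disambiguates a biosample age from an individual's age.
print(resolve_scope({"id": "age", "scope": "biosamples"}, "individuals"))
```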

@costero-e
Collaborator

@costero-e The

type: array
items: string

...is correct if you reference the entity names. If you reference ontologies for the entities you'd use items: object or better

items:
  $ref: "../common..."

Sorry, I was thinking of the whole "individuals" response as the scope and not of the type of scope inside the filtering terms. So yes, it is an array and the items are strings:

"scope": ["individuals", "biosamples"]
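
For reference, a small Python check of that shape (an array whose items are strings), written without a schema library. The `is_valid_scope` name is hypothetical.

```python
def is_valid_scope(scope):
    """True if scope is a list whose items are all strings."""
    return isinstance(scope, list) and all(isinstance(s, str) for s in scope)


print(is_valid_scope(["individuals", "biosamples"]))
print(is_valid_scope({"individuals": "biosamples"}))
```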

@jrambla
Contributor

jrambla commented Jun 20, 2023

We are having several parallel discussions here, which usually ends up with some threads not being closed.
I would suggest splitting the discussion into three separate ones:

  1. Keeping or not the filteringTerms.yaml inside every entry type folder
  2. Making the listing of filtering terms and the use of filtering terms more homogeneous
  3. Having multiscope in the filterTerm definition

Makes sense?
