Filtering across endpoint boundaries #78

Closed
DavidCroftDKFZ opened this issue May 2, 2023 · 6 comments

Comments

@DavidCroftDKFZ

Hi,

I have installed the Beacon 2 RI and imported the Cineca test dataset from the GUI RI. I can query the endpoints "individuals", "biosamples" and "g_variants" and I can do filtering.

I would like to be able to find a count of all genetic variants for which there is a biosample of type "blood".

The only way that I have been able to think of is to run a POST request on the endpoint:

http://beacon:5050/api/biosamples/

with the filter:

"filters" : [ {
"id" : "UBERON:0000178"
} ],

Then I would have to extract the biosample IDs from the results and run the following for every returned ID:

http://beacon:5050/api/biosamples/HG00657/g_variants/

(HG00657 is an example ID)
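For reference, the two-step workaround described above could be sketched roughly as follows. This is only an illustration: the `post_json` and `get_json` callables are hypothetical stand-ins for an HTTP client, and the response field names (`response.resultSets[].results[].id`, `responseSummary.numTotalResults`) are assumptions about how a Beacon v2 instance replies, not something confirmed for the RI.

```python
from typing import Callable, List

BASE = "http://beacon:5050/api"  # host taken from the URLs in this post

# Hypothetical transport signatures: any HTTP client could be wrapped to fit.
PostJson = Callable[[str, dict], dict]
GetJson = Callable[[str], dict]


def biosample_ids(post_json: PostJson, term: str) -> List[str]:
    """Step 1: POST the filter to /biosamples/ and collect the returned ids."""
    body = {"query": {"filters": [{"id": term}]}}
    reply = post_json(f"{BASE}/biosamples/", body)
    ids: List[str] = []
    # Assumed response layout: response -> resultSets -> results -> id
    for result_set in reply.get("response", {}).get("resultSets", []):
        for record in result_set.get("results", []):
            ids.append(record["id"])
    return ids


def variant_count_for(get_json: GetJson, biosample_id: str) -> int:
    """Step 2: ask /biosamples/{id}/g_variants/ for that sample's variant count."""
    reply = get_json(f"{BASE}/biosamples/{biosample_id}/g_variants/")
    # Assumed summary field; defaults to 0 when absent.
    return reply.get("responseSummary", {}).get("numTotalResults", 0)


def total_variants(post_json: PostJson, get_json: GetJson, term: str) -> int:
    """One request per biosample id: this is the slow part of the workaround."""
    return sum(variant_count_for(get_json, b) for b in biosample_ids(post_json, term))
```

Note that summing per-sample counts counts a variant once per sample that carries it, so the total is not a count of distinct variants, and the sample IDs still have to leave the site.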

It sounds like it would be slow and anyway, I don't want to have to pull the sample IDs to my server, I would rather they stay on site, for data protection reasons.

Is there some kind of shortcut notation that I could use to get what I want? E.g. something like:

http://beacon:5050/api/biosamples/*/g_variants/

...with the above filter?

Regards,

David Croft.

@mbaudis
Member

mbaudis commented May 2, 2023

@DavidCroftDKFZ While we don't use the RI (we use bycon instead), the method you describe (record-level retrieval & one-by-one querying w/ count granularity) is the only scenario envisioned out of the box [1].

IMO Beacon by itself shouldn't support such wildcard approaches (there are a number of reasons), but you should feel free to add your own extensions as long as they don't collide w/ the spec. A simple, spec-conformant way would be to process server-side and deliver a handover containing your special read-out, if this is a recurring scenario (we do a number of special handovers, including CNV histogram links etc.).

That being said: One of the next areas of Beacon protocol development will be aggregated results so if there is enough justification for dedicated scenarios (@mcourtot?)...

Footnotes

  1. Well, I don't know if the RI supports this, but on Progenetix you could also do it the other way round: query variants by the filter, but then you'd have to parse those results for the biosample id values.

@DavidCroftDKFZ
Author

Re. wildcards, I only used the "*" notation to give an idea of what I was trying to achieve. Essentially, what I am requesting is a JOIN operation that works between endpoints (assuming that the relevant IDs are shared by both, of course). If this can be done without wildcards, great, I would be happy with that.

@jrambla
Contributor

jrambla commented Jun 13, 2023

The query you are asking for can be expressed within the current spec, although it is tricky to implement.
The principle is that you must leverage both REST and scoped filters, and then write the code in the implementation that resolves such a query.

The REST principle: the URL determines which entry type/resource you are returning. In your case, given that you want to count genomic variations, that must be the endpoint.

But as you are not interested in all genomic variations, only in the ones coming from blood samples, you must include that as a scoped filter, like this:

query:
  filters:
    - id: UBERON:0000178
      scope: biosamples.sampleOriginType

The tricky part is getting your backend to understand which steps are necessary to resolve that query. The solution will depend a lot on the schema of your backend.
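To make those steps concrete, here is a minimal in-memory sketch of how a backend might resolve such a scoped filter as a join. The record layout is an assumption chosen for illustration (`sampleOriginType.id` on biosamples, `caseLevelData[].biosampleId` on variants, and the sample ids themselves are made up); a real backend would do the equivalent against its own schema.

```python
from typing import Iterable, Mapping, Set


def matching_biosample_ids(biosamples: Iterable[Mapping], term_id: str) -> Set[str]:
    """Resolve the scoped filter: ids of biosamples whose origin type is term_id."""
    return {
        b["id"]
        for b in biosamples
        if b.get("sampleOriginType", {}).get("id") == term_id
    }


def count_variants(variants: Iterable[Mapping], sample_ids: Set[str]) -> int:
    """Count variants observed in at least one of the given biosamples."""
    return sum(
        1
        for v in variants
        if any(c.get("biosampleId") in sample_ids for c in v.get("caseLevelData", []))
    )


# Made-up records, for illustration only.
BIOSAMPLES = [
    {"id": "sample-A", "sampleOriginType": {"id": "UBERON:0000178"}},
    {"id": "sample-B", "sampleOriginType": {"id": "UBERON:0002107"}},  # a different tissue term
]
VARIANTS = [
    {"variantInternalId": "v1", "caseLevelData": [{"biosampleId": "sample-A"}]},
    {"variantInternalId": "v2", "caseLevelData": [{"biosampleId": "sample-B"}]},
    {"variantInternalId": "v3", "caseLevelData": [{"biosampleId": "sample-A"},
                                                  {"biosampleId": "sample-B"}]},
]

blood_ids = matching_biosample_ids(BIOSAMPLES, "UBERON:0000178")
blood_variant_count = count_variants(VARIANTS, blood_ids)
```

Because the join runs server-side, only the count needs to leave the site, not the biosample ids.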

@mbaudis
Member

mbaudis commented Jun 13, 2023

@jrambla I guess I misread @DavidCroftDKFZ's question as implying a query across multiple datasets. Obviously

http://beacon:5050/api/g_variants/?filters=UBERON:0000178

... should work since the anatomical location should be scoped against biosamples automatically (and the information if it exists and the scoping should be available through the /filtering_terms/ endpoint).

In fact http://progenetix.org/beacon/g_variants/?filters=UBERON:0000178 would work ... but dies due to a time-out since counting over some 10^6 variants I guess :-(
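A small sketch of building such a GET URL programmatically, in case it helps. The helper name `g_variants_url` is made up for this example, and joining several filter ids with commas is an assumption about the GET syntax rather than something stated in this thread.

```python
from typing import Sequence
from urllib.parse import urlencode


def g_variants_url(base: str, filter_ids: Sequence[str]) -> str:
    """Build a /g_variants/ GET URL carrying the given filter ids."""
    # Keep ':' and ',' unescaped so CURIEs such as UBERON:0000178 stay readable.
    query = urlencode({"filters": ",".join(filter_ids)}, safe=":,")
    return f"{base.rstrip('/')}/g_variants/?{query}"


url = g_variants_url("http://beacon:5050/api", ["UBERON:0000178"])
```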

@mbaudis
Member

mbaudis commented Jun 14, 2023

query:
  filters:
    - id: UBERON:0000178
      scope: biosamples.sampleOriginType

@jrambla The scope here should be just biosamples; but even that is not necessary on the query side, since in principle the /filtering_terms endpoint informs you about the terms and their scope.

Edge case: There could be implementations which use the same terms for different scopes (e.g. maybe you want to retrieve a normal tissue biosample for an individual with a renal cell carcinoma and not the tumor sample - in which case the e.g. NCIT:C4033 has to be scoped to the individual or one uses "collision free" terms...).

But apart from that for the information endpoint we may think about introducing a defaultModelScope (YMMV) parameter which optionally details where a filter would match in the default model.

Pinging @tb143 ...

@mbaudis
Member

mbaudis commented Mar 27, 2024

@DavidCroftDKFZ This has now been addressed in #118, which introduces an informational parameter for scopes. A Beacon w/ full or partial query aggregation (like Progenetix' bycon) can therefore now indicate which entities the filter value would affect (in the case of bycon these would be all; e.g. a diagnostic filter would affect variants, analyses, biosamples, runs, individuals ...). How this plays out will depend on the implementation.

      scopes:
        description: >- 
          Entry types affected by this filter.
        examples: 
          - 
            - individual 
            - biosample 
            - analysis 
            - run 
            - genomicVariation 
          - 
            - biosample
        type: array
        items:
          type: string
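As a rough illustration of how a client might use this, the sketch below reads `scopes` from filtering-term entries and checks whether a term applies to a given entry type. The surrounding entry shape (`id` plus `scopes`) and the sample values are assumptions for illustration, not output from a real beacon.

```python
from typing import Iterable, List, Mapping


def scopes_for(filtering_terms: Iterable[Mapping], term_id: str) -> List[str]:
    """Return the entry types the given term is declared to affect (empty if unknown)."""
    for term in filtering_terms:
        if term.get("id") == term_id:
            return list(term.get("scopes", []))
    return []


def affects(filtering_terms: Iterable[Mapping], term_id: str, entry_type: str) -> bool:
    """True if the term's declared scopes include entry_type."""
    return entry_type in scopes_for(filtering_terms, term_id)


# Made-up filtering-term entries mirroring the two examples in the schema above.
FILTERING_TERMS = [
    {"id": "NCIT:C4033",
     "scopes": ["individual", "biosample", "analysis", "run", "genomicVariation"]},
    {"id": "UBERON:0000178", "scopes": ["biosample"]},
]

blood_scopes = scopes_for(FILTERING_TERMS, "UBERON:0000178")
```

A client could use such a check to decide whether a filter needs an explicit scope in the query, for instance when the same term is declared for several entry types.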

Closing this; extensions will be further evaluated by Beacon Filter Scouts (e.g. defaultModelScope or modelPath...).

@mbaudis mbaudis closed this as completed Mar 27, 2024