Write up background on cart service based on what was found by talking to Elixir, Terra, U. Chicago, etc. #100

Open
briandoconnor opened this issue May 4, 2020 · 4 comments

@briandoconnor
Contributor

briandoconnor commented May 4, 2020

First steps:

  • problem statements from projects
  • use cases and constraints
  • give some examples
  • hand off to DURI
@cdvoisin
Collaborator

cdvoisin commented May 9, 2020

Kurt Rodarmer of NCBI has stepped forward to own this issue. Thank you Kurt!

@cdvoisin cdvoisin assigned cdvoisin and kwrodarmer and unassigned cdvoisin May 11, 2020
@rishidev

rishidev commented Jun 15, 2020

@mbarkley to send web sequence diagrams to Kurt

@mbarkley
Collaborator

mbarkley commented Jul 9, 2020

@kwrodarmer apologies for taking so long to follow up on this, but here is a link to an image of the web sequence diagram of the SRA token process (or at least my understanding of it after our discussions):
https://drive.google.com/file/d/1NMwxCKir9jca4hdcw-ku1Uy15E2uWHAs/view?usp=sharing

That image was generated at sequencediagram.org using this text:

title SRA (rough sketch)

actor Researcher
participant Passport Endpoint
participant Run Selector
participant Cart Service
#note over Cart: has permissions
participant SDL
participant Compute Env
participant Signed URL Redirector
note over Compute Env: WES, VM, etc


Researcher->Passport Endpoint: Do authentication right away to get user passport
note over Passport Endpoint: This part is GA4GH AAI/Passport flow
Researcher->Run Selector: User does faceted selection
note over Researcher: Token management is happening in web browser
Researcher->Cart Service: Send selection and passport to mint cart token
note over Cart Service: Produces newly signed/minted token with copy of SRA permissions
note over Cart Service: It can down-scope to particular visas required for the given run-selection
Cart Service->Researcher: Receive down-scoped token (or dbGaP passport that is not down-scoped)

note over Researcher: There is a "simple" exit where the down-scoped token\n is downloaded to the user machine,\nused as a bearer token to download data

note over Researcher: Start simple case
Researcher->SDL: Resolve access on cloud (w/ cart token)
note over Researcher: End simple case

note over Researcher: Start "managed compute" case
Researcher->Compute Env: Start compute environment
Researcher->Cart Service: Rebind user cart token to "bound cart token"
Compute Env->SDL: Request to SDL with bound cart token
SDL->Compute Env: Respond with URL
note over SDL: Can return "naked URLs" (direct to object servers such as NCBI)\nwith no other auth tokens, or the object is in cloud. Cloud objects\n have three cases: open access, user-pays, and controlled access.\nFor the first two cases, urls to the cloud objects are given.\nFor the controlled access case, a signed URL to "Signed URL Redirector"\nservice is given. That service will redirect valid inbound HTTP requests\nto the actual resource (via a cloud-provider signed URL).

@kwrodarmer
Collaborator

I'd like to start with a conceptual overview of how NIH sees the cart concept.

First, a cart is a type of dataset. The objective is to have a container object whose contents may be created and managed and used wherever a dataset would be used within a workflow.

Second, there is a notion that the dataset might be bound with authorizations such that the cart object itself becomes standalone. This is the type of cart implemented by the SRA, and it has pluses and minuses. I mention it now because the ability to carry authorization has an effect on the representation of a cart.

The simplest idea of a cart as a selection object is to hold a set of object descriptors. The GA4GH notion of object descriptor is a DRS id. Since a DRS id can identify either a single object or a bundle of objects (and, yes, of other bundles), a cart containing DRS ids can be as explicit or expanded as appropriate.

In theory, a cart object has no upper bound on the number of items it may contain. That said, engineering practice requires us to impose some limits within a standard. Some of those limits may be visible to the end user and others not. But size is a definite engineering concern. A cart object - whatever its form - should be assumed to make use of POST methods during transport. Additionally, we should consider that a cart can be paginated, which implies that the pages would need a common id so they can be joined, plus some spec indicating the subset each one represents.

In concept, a cart could then be as simple as a JSON object with an object id, an optional pagination spec, and a list of DRS ids. If the latter are not universal enough to represent all contents, then other URI schemes can be incorporated. This is particularly the case for a representation of a non-deterministic query which at present is not representable under DRS.
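
To make that shape concrete, here is a minimal Python sketch of a cart with an id, an optional pagination spec, and a list of DRS ids. The field names (`cart_id`, `page`, `index`, `total`, `items`) and the `drs://example.org/...` URIs are assumptions made up for illustration; nothing here is taken from a GA4GH spec or from the SRA implementation.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class PageSpec:
    index: int  # zero-based position of this page (hypothetical field)
    total: int  # number of pages that share the same cart id


@dataclass
class Cart:
    cart_id: str                     # common id used to join pages
    items: List[str]                 # DRS ids, or other URIs if DRS is not enough
    page: Optional[PageSpec] = None  # left out when the cart fits in one object


def paginate(cart_id: str, items: List[str], page_size: int) -> List[Cart]:
    """Split a list of ids into pages that all carry the same cart id."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    chunks = [items[i:i + page_size] for i in range(0, len(items), page_size)] or [[]]
    if len(chunks) == 1:
        return [Cart(cart_id, chunks[0])]
    return [Cart(cart_id, chunk, PageSpec(i, len(chunks)))
            for i, chunk in enumerate(chunks)]


def join_pages(pages: List[Cart]) -> List[str]:
    """Reassemble the full item list from pages of one cart."""
    if len({p.cart_id for p in pages}) != 1:
        raise ValueError("pages must share one cart id")
    ordered = sorted(pages, key=lambda p: p.page.index if p.page else 0)
    return [item for p in ordered for item in p.items]


ids = [f"drs://example.org/obj{i}" for i in range(5)]
pages = paginate("cart-1", ids, page_size=2)
print(json.dumps(asdict(pages[0]), indent=2))  # body of one POST
```

Each page is small enough to POST on its own, and the receiver can rebuild the whole cart with `join_pages` once every page for a given `cart_id` has arrived.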

We will want to add a grouping facility for factorization because size really is a concern here, and if it is possible to factor out common substrings, we will probably be happier than not. Compression is another possibility. Both JSON and gzip suffer from resistance to streaming, and both benefit from the ability to paginate.
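
A rough Python sketch of what factoring out common substrings could look like, assuming the shared part is everything up to the last `/` of each URI. The `prefix`/`suffixes` layout and the example URIs are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict
from typing import Dict, List


def factor_by_prefix(uris: List[str]) -> List[Dict[str, object]]:
    """Group URIs by the text up to and including their last '/'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in uris:
        head, sep, tail = uri.rpartition("/")
        groups[head + sep].append(tail)
    return [{"prefix": p, "suffixes": s} for p, s in groups.items()]


def expand(groups: List[Dict[str, object]]) -> List[str]:
    """Inverse of factor_by_prefix (order is preserved only within a group)."""
    return [g["prefix"] + s for g in groups for s in g["suffixes"]]


uris = [
    "drs://example.org/SRR000001",
    "drs://example.org/SRR000002",
    "drs://other.example/abc",
]
grouped = factor_by_prefix(uris)
print(grouped)
```

Factoring of this kind can be combined with pagination, since each page can carry its own groups.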

I suggest that the basic concept of cart be kept separate from passports and visas. When used, carts will be accompanied by a passport with visas that carry the authorization needed to access the blobs the carts identify. That said, the cart concept can represent a very concise and proper definition of a researcher's needs and intentions, can be easy and intuitive to build, and allows automation to map from the dataset back to the visas needed for their access. The passport and visa token generation system can be augmented to take a cart as a downscoping indicator so that the resulting auth tokens are minimized.
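
As a sketch of using a cart as a downscoping indicator, the Python below keeps only the visas whose dataset is actually touched by the cart. It assumes a lookup table from object id to dataset (`dataset_of`, hypothetical) and represents each visa as a plain dict with `type` and `value` fields, which only loosely mirrors a GA4GH visa; the accession strings are placeholders.

```python
from typing import Dict, Iterable, List, Set


def required_datasets(cart_items: Iterable[str], dataset_of: Dict[str, str]) -> Set[str]:
    """Datasets touched by the cart; ids missing from the lookup are ignored."""
    return {dataset_of[i] for i in cart_items if i in dataset_of}


def downscope(visas: List[dict], cart_items: Iterable[str],
              dataset_of: Dict[str, str]) -> List[dict]:
    """Keep only the visas needed to access the cart's contents."""
    needed = required_datasets(cart_items, dataset_of)
    return [v for v in visas if v["value"] in needed]


# Hypothetical lookup and visas, for illustration only.
dataset_of = {
    "drs://example.org/obj1": "phs000001",
    "drs://example.org/obj2": "phs000002",
}
visas = [
    {"type": "ControlledAccessGrants", "value": "phs000001"},
    {"type": "ControlledAccessGrants", "value": "phs000002"},
    {"type": "ControlledAccessGrants", "value": "phs000003"},
]
cart = ["drs://example.org/obj1"]
minimal = downscope(visas, cart, dataset_of)
print(minimal)
```

The cart stays a plain selection object here; the passport and visas travel alongside it rather than inside it.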

Finally, it is possible and in some cases advantageous to bind cart contents into an auth token. The SRA does exactly this already by downscoping the visas to the minimum set, and then recording the exact ids of accessible objects all in a single token. This token carries both dbGaP authorizations and an explicit set of object designations that come from the intersection of the selection in the cart with the authorized datasets in the visas. The result is a precisely scoped auth token.
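
The intersection step can be sketched in Python as below. This is not the SRA's token format: the claim names (`visas`, `objects`), the `dataset_of` lookup, and the dict-shaped visas are all assumptions for illustration, and signing or minting of the token is deliberately left out.

```python
from typing import Dict, List


def bind_cart(cart_items: List[str], visas: List[dict],
              dataset_of: Dict[str, str]) -> dict:
    """Build unsigned claims: accessible objects plus the visas they need."""
    authorized = {v["value"] for v in visas}
    # Intersection: keep cart items whose dataset is covered by some visa.
    objects = [i for i in cart_items if dataset_of.get(i) in authorized]
    used = {dataset_of[i] for i in objects}
    return {
        "visas": [v for v in visas if v["value"] in used],
        "objects": objects,
    }


# Hypothetical data, for illustration only.
dataset_of = {
    "drs://example.org/obj1": "phs000001",
    "drs://example.org/obj2": "phs000002",
}
visas = [
    {"type": "ControlledAccessGrants", "value": "phs000001"},
    {"type": "ControlledAccessGrants", "value": "phs000003"},
]
cart = ["drs://example.org/obj1", "drs://example.org/obj2", "drs://example.org/obj9"]
claims = bind_cart(cart, visas, dataset_of)
print(claims)
```

In this example `obj2` is dropped because no visa covers its dataset, `obj9` is dropped because its dataset is unknown, and the unused `phs000003` visa is left out of the claims.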
