Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process unification #310 #348

Draft
wants to merge 10 commits into
base: draft
Choose a base branch
from
Draft

Process unification #310 #348

wants to merge 10 commits into from

Conversation

m-mohr
Copy link
Member

@m-mohr m-mohr commented Dec 7, 2020

This is a first draft for #310 that tries to unify /processes and /process_graphs.

Sharing might be a separate PR.

@m-mohr m-mohr added process discovery and profile discovery process graph management feedback required breaking Breaking changes, requires a major-version (2.0.0 for example) labels Dec 7, 2020
@m-mohr m-mohr added this to the future milestone Dec 7, 2020
@m-mohr m-mohr self-assigned this Dec 7, 2020
@m-mohr m-mohr force-pushed the process-unification branch 4 times, most recently from 271042c to 9b4e914 Compare December 7, 2020 16:10
Copy link
Member

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(initial review)

openapi.yaml Show resolved Hide resolved
openapi.yaml Outdated
For ease of use, it is NOT RECOMMENDED to use long randomly generated
identifiers. More readable user identifiers like `john_doe` support a
better user-experience as the user identifier is used in URIs for shared
processes, e.g. `https://example.org/api/v1.0/processes/@john_doe/my_ndvi`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that john_doe is nicer, but this recommendation does not play well with what OIDC gives us. For example google returns user ids like us-a44d63b6-e090-6059-81d7-cbe7afeff6ce and microsoft: nIrHDS4rhk4ri738TRhtLHXdoUQ6OxZo9Ob0AS3vTig

there is a piece of the puzzle missing here

Copy link
Member Author

@m-mohr m-mohr Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no user name / id to derive such a slug from, what to use instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, I'm not even sure the returned userid is stable, e.g. if a user registers a new client id to work with, maybe an OIDC providers could bump/rehash the userid?

Copy link
Member Author

@m-mohr m-mohr Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I would expect that a back-end assigns (or let the user choose) a separate user-id in addition to the external user-id. That is how I've seen it implemented as in most all cases you need to store additional user data anyway.

openapi.yaml Show resolved Hide resolved
description: |-
Lists all user-defined processes (process graphs) of the
Redirects to all user-defined processes of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add some kind of deprecation notice to the description of these endpoints?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least it should probably describe what to use/do instead. But I'm still not sure whether to use this redirect behavior (breaking) or leave /process_graphs as it is (basically as an alias, but non-breaking).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you mean that the redirect is breaking? That (some) clients don't properly handle a redirect on POST/PUT/PATCH/DELETE?

Copy link
Member Author

@m-mohr m-mohr Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I (have to) assume that not all clients would handle it properly. But if we test JS, R and Python and all work flawlessly, it's probably okay to call it non-breaking ;-)

The namespace `backend` is an alias for predefined processes.

Back-end implementations MAY implement other namespaces that don't
conflict with any of the namespaces mentioned above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see discussion at #310 (comment)

I think the namespace "format" should be more flexible/generic than: @ is for per-user namespaces

Copy link
Member Author

@m-mohr m-mohr Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I disagree in #310 (comment) ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime I'm fine with more flexibility, but on the other hand we should likely restrict the allowed characters:
#478

@m-mohr
Copy link
Member Author

m-mohr commented May 7, 2021

Seems not stable enough for 1.1, so moving to 1.2. Would be good to have an implementation first, too.

@m-mohr m-mohr linked an issue May 12, 2021 that may be closed by this pull request
@@ -1617,6 +1632,187 @@ paths:
$ref: '#/components/responses/client_error_auth'
5XX:
$ref: '#/components/responses/server_error'
'/processes/{namespace}':
get:
summary: List all user-defined processes in a namespace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation of /processes/{namespace} and /processes/{namespace}/{process_id} talk about "user-defined processes in a "namespace", but listing "predefined" processes should also work, right?

e.g. GET /processes/backend returns the same as GET /processes
and GET /process/backend/filter_temporal would return the metadata of filter_temporal process?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the thinking was that there would be duplication as you've pointed out. You usually get all details from /processes for pre-defined processes so that this "extension" is mostly for user-defined processes (and was mostly copied from /process_graphs). So we can discuss whether we should remove the "user-defined" here. The backend namespace is somewhat different though as it is read-only...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that de duplication between GET /processes/backend and GET /processes is that much of an issue, especially because "backend" can be considered to be a default namespace. So yes, I would argue that the "user-defined" can be dropped here.

Also, a back-end is also free to define custom namespaces, and these could also contain pre-defined, non-user-defined, "read only", possibly proprietary processes.

Copy link
Member Author

@m-mohr m-mohr May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm fine with that. We can add additional wording that clarifies any specifics for user-defined processes. Edit: It's not that simple, see below...

Copy link
Member Author

@m-mohr m-mohr May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looking into the OpenAPI file made me remember that user-defined processes and pre-defined processes have a different schema. So indeed the endpoints were only meant to support user-defined processes so far.

User-defined processes for example require a process_graph, but that's not possible for predefined processes.
On the other hand, pre-defined processes require e.g. parameters and return values while this is optional for user-defined processes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to split up into two separate endpoints to make sure the schemas can be applied correctly
Or we don't allow exposing this endpoint as it's already exposed via /processes.

Both options are equally bad for us, VITO, as we are already using a non-"backend" namespace containing predefined proprietary processes (without a "process_graph"), which would be invalid according to both of these options.

Copy link
Member Author

@m-mohr m-mohr May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... maybe we can add a discriminator to GET /processes/{namespace}. If it contains type: user: true (or something similar) in the response it applies the user schema to the processes array, otherwise the predefined schema. That would also help clients to know whether they can do non-GET requests on that namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think part of the current problem with "predefined" vs "user-defined", is that we are lumping together a couple of process concepts, for example:

predefined user defined
live in default namespace live in "user" namespace
implemented "natively" by backend implemented through openEO process graph
has no "process_graph" field in metadata has a "process_graph" field in metadata
public (by default?) private (by default?)
no public API to add/update/remove created and managed though openEO endpoints
parameters and return values must be declared parameters and return values are optional

By sticking to this binary division, each with own "schema", we probably make it hard for ourselves to create a clean API in the long term.
For example, in VITO backend we already have processes that mix properties from both columns, e.g. private, natively implemented processes that live in a namespace that is neither default or per-user. Another example is defining predefined processes through a "process_graph", instead of "natively" (e.g. "ndvi" or "evi").

Copy link
Member Author

@m-mohr m-mohr May 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remember that we are just moving things around here. This is to move the endpoints to /processes/... for unification and to prepare for v2, but we have to stay compliant with the API v1.x line for now. We can't change a lot wrt the schemas for example as that usually is a breaking change. So until we go for API v2 we may need to live with some compromises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I understand, I was just reflecting on the tension between the constraints of the v1.x API and what we are doing in VITO backend (or want to do) with custom namespaces

soxofaan added a commit to Open-EO/openeo-python-driver that referenced this pull request May 12, 2021
@m-mohr
Copy link
Member Author

m-mohr commented Aug 19, 2021

Some additional thoughts on namespaces:

  1. The namespaces list in /processes just lists the identifiers. For some use-cases it could be useful that this is expansible, e.g. with a title, a description, a boolean loadByDefault (I guess mostly for the Web Editor?) or a type (e.g. predefined or user-defined)
  2. Requesting the processes for a namespace could also include the additional metadata for the namespace as mentioned above (excluding loadByDefault I guess).

Lastly, I'm thinking to not merge this into the "core" API, but instead, make this a separate extension. Then this would be a pure addition and the /process_graphs would stay as they are. Final unification would then be done in v2.

@soxofaan
Copy link
Member

Final unification would then be done in v2.

I'm afraid I have to agree 😄 .
It's probably not feasible to achieve unification in a clean, non-breaking way under v1

@m-mohr m-mohr modified the milestones: 1.2.0, 1.3.0 Nov 29, 2021
@m-mohr m-mohr removed their assignment Dec 1, 2021

If multiple processes with the same identifier exist, Clients SHOULD
inform the user that it's recommended to select a namespace.
process_namespace:
type: string
pattern: ^@?[\w\-\.~]+$
Copy link
Member Author

@m-mohr m-mohr Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For VITO we'd need to add a double colon here (i.e. for the u:asd replacement for @GreatEmerald )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking changes, requires a major-version (2.0.0 for example) feedback required process discovery and profile discovery process graph management
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unification of /processes and /process_graphs metadata of a single (predefined) process
2 participants