Unification of /processes and /process_graphs #310

soxofaan · 2020-07-09T09:36:22Z

This is follow up issue of #305/#309, which added a namespace property to process graph nodes to identify whether the process_id property refers to a predefined process or a user defined process.

Possible next steps:

merge GET /processes and GET /process_graphs and work with an additional namespace request parameter to select PDP or UDP (or even other namespaces)
unify all /processes and /process_graphs requests to something like /processes/{namespace}/{process_graph_id}

The text was updated successfully, but these errors were encountered:

jdries · 2020-10-08T13:46:50Z

This issue is becoming relevant again for us: we'd like to allow users to expose their process graphs. This is supported by the namespacing, but it seems they are not yet discoverable?
I would try to avoid breaking the API for this, at least not in a 1.1 version. So certainly the /processes endpoint should not change in an incompatible way.

So I'm proposing something like:
/namespaces -> list all namespaces
/namespaces/{namespace} -> list all processes in a namespace
/namespaces/{namespace}/{process_graph_id} -> list process metadata
This resembles the GitHub approach: our namespaces would be called an 'organization' in GitHub. Users could store process graphs in multiple organizations (namespaces).

By not reusing /processes or / process_graphs we are fully backwards compatible. I'm not yet sure if 'namespaces' is the best name?

jdries · 2020-10-09T06:01:40Z

Rethinking this: the use case of sharing user defined processes is also what the 'marketplace' in openEO platform is supposed to do.
So we could also consider defining an entirely separate microservice, that can move at its own pace, so we can move at a faster pace compared to openEO versions? (Meaning in the beginning, when we are defining and changing such a service.)

aljacob · 2020-10-09T06:52:06Z

What do you mean @jdries with a seperate microservice? Just an arbitrary http service that is able to work with openEO processes or process_graph and store and offer them? I think the namespaces proposal from you, from yesterday was not bad. We can still keep that as an experimental endpoint for now, without a new API release.

Thinking more about it...
I have to say I am also a bit confused by the duality of /processes and /process_graphs now that I have implemented version 1. As process_graphs are anyways always wrapped in a process when send to the back-end for processing, maybe it would actually make sense, to organize also their storage and sharing via processes. Similar to what @soxofaan has proposed originally.

m-mohr · 2020-10-09T08:21:13Z

The /process_graphs endpoint is a legacy from the old 0.4/... days, where this was the endpoint for process graphs. The name currently is also wrong, because it serves user processes, not process graphs only. As the change from process graphs to processes and namespaces came in rather late in 1.0, we didn't really made the big change to completely refactor the endpoints to avoid too many changes. Ultimately, the desire would be to have everything under /processes.

I'm not a big fan of the /namespaces approach, I must admit. This needs quite a lot of round trips to get the actual process information if it's the only way to get the information. Also, namespace is less clear than processes. I like the proposal from @soxofaan. This can likely be done with relatively low work and /process_graphs could just redirect for some time. (Now I'm happy that we never did /processes/:id as this now allows us to use the namespace under /processes as we like).

This would be somewhat backward-compatible, I think:
/processes - pre-defined processes, stays as it is, could be an alias for /processes/backend
/processes/:namespace - all processes under the namespace, could be a user id. Shortcut for the user id could be /processes/user
/processes/:namespace/:processid - GET/DELETE/PUT for individual processes

Redirects:
/process_graphs => /processes/:userid
/process_graphs/:id => /processes/:userid/:id

The only thing missing is a list of namespaces, but that could be retrieved via GET / either as part of the endpoint list or as a separate property.

This way we could have it in a 1.1 release, I think. It could also just be an API extension for now.
At the same time, we should think about sharing though.

A microservice at the EOSC marketplace or the Hub could just implement the same API. We had that earlier for 0.4 at the Hub, but had no time yet to update to 1.0.

jdries · 2020-10-09T09:02:19Z

The proposal to allow /processes/:namespace and /processes/:namespace/:processid is also fine for me.
Would the fact that '/processes/somename' can be both a namespace or a process be problematic? Like in the defining the typing of such a call?

I also created a diagram of a broader marketplace design, to give an idea of what I'm thinking of in terms of additional use cases that I'd like to support:

m-mohr · 2020-10-09T09:05:37Z

Would the fact that '/processes/somename' can be both a namespace or a process be problematic? Like in the defining the typing of such a call?

/process is the only endpoint at the moment - /processes/:processid is not defined in the API and would not be in the proposal! It's only allowing /processes/:namespace. See my comment above.

I also created a diagram of a broader marketplace design

Looks interesting, but is also quite an effort. How's the relationship with the Hub?

jdries · 2020-10-09T09:13:11Z

Indeed, you're right, our backend allows this:
https://openeo.vito.be/openeo/1.0/processes/raster_to_vector
but it's simply not supported API, or a leftover from 0.4.0. That indeed simplifies matters.

It's indeed not entirely trivial, it could be an addition to the Hub, but it could also be an entirely separate component. (It's similar in the sense that it's also central, but it doesn't require links to individual backends, like the hub does. Unless if we would aggregate all user defined processes on all backends, that would also be an option...)
There's also a potential relationship with existing marketplaces, like EOSC, that have to do some of these things already. Or a Zenodo for process graphs...

m-mohr · 2020-10-09T09:16:41Z

@jdries Let's discuss the Hub/Marketplace integration in teams or in a separate issue, it doesn't really fit into this issue.

jdries · 2020-10-09T09:22:22Z

Agreed, if we can do a non-breaking API extension for user defined process discovery, then I can at least proceed.
I believe the only open things is retrieving list of namespaces. As it can get rather long, I would prefer a separate property where you could even allow paging...

m-mohr · 2020-10-09T10:34:54Z

@jdries I could dig into this in Nov and make a proposal. Would that work? When do you need it?

jdries · 2020-10-09T11:45:26Z

Sure, that would work. We're already experimenting with offering user defined processes, but the discovery part might not be crucial in the beginning. If we would need it, we'd probably implement the most recent proposal in this issue.

jdries · 2020-11-17T09:00:18Z

@m-mohr @soxofaan
I wrote some docs on how users can currently publish their processes:
https://github.com/Open-EO/openeo-python-client/blob/master/examples/notebooks/ProcessAsAService.ipynb
As a next step, I'd like to start on some of the things proposed in this issue. Do any of you still see alternative options than the one described above?

m-mohr · 2020-11-17T10:05:01Z

@jdries

Re notebook: Where to comment best on the notebook? I added comment to the commit as I think it would "bloat" this discussion a bit.

Re API for sharing processes: I think it's still the approach best integrated with the current API, so go ahead. Once D35 is submitted (tomorrow?) I can likely also draft an OpenAPI fragment...

Re namespaces: I think the only thing not covered in my proposal was where to get the namespaces. I'd propose to add a namespaces field to /processes. Just an array of namespaces to be used in /processes/:namespace. The latter endpoint then can return processes and namespace metadata if there's any.

soxofaan · 2020-11-17T11:25:31Z

I agree with @m-mohr : this ticket is mostly about new/changed features in a newer (incompatible?) version of the API, while working with stored process graphs should already be possible with API version 1.0

POC: API not fully specified yet, see Open-EO/openeo-api#310

m-mohr · 2020-12-07T11:16:02Z

I'll work on this today and hopefully have a OpenAPI this afternoon. Any recent activities I should consider?

soxofaan · 2020-12-07T11:19:53Z

I added a proof of concept implementation to VITO backend for listing "public" user defined processes under /processes/:namespace where :namespace is something like u:john.

Some things that came up while working on this:

do we predefine the format of namespaces? e.g. for our use case we want to use per user/organization namespaces. I now just implemented user "namespaces" by adding prefix u: to the userid/name, but this is a bit ad hoc of course.
Another issue is that the user/organistation name might have characters that do not play nice in URLs. Likewise: we have been playing with the idea of using (github) urls as "namespace", which also does not play well inside a URL parameter
We are also using a "public" flag for User defined processes: public UDPs are "visible" for other users (or even unauthenticated requests) under the user's namespace, while non-public UDPs are only visible to the user. This is "public" flag is not part of the spec yet, and might be interesting to include in this context

m-mohr · 2020-12-07T11:50:30Z

do we predefine the format of namespaces? e.g. for our use case we want to use per user/organization namespaces. I now just implemented user "namespaces" by adding prefix u: to the userid/name, but this is a bit ad hoc of course.

It's something like a namespace for namespaces. I'm also thinking whether that should be an API thing or not, i.e. the current approach forbids the user_ids user and backend and - if left open for implementation - that should at least be a note in the user_id specification. The best separation would be /processes/<backend|user|...>/{id}, but then there's no additional id for back-ends so a different character would make sense, preferably one that doesn't need encoding. Hmm....

Edit: No, wait, back-end processes are at /processes and then the namespace part could always a plain user-id... Then there are no conflicts by default as namespace = user id

Another issue is that the user/organistation name might have characters that do not play nice in URLs. Likewise: we have been playing with the idea of using (github) urls as "namespace", which also does not play well inside a URL parameter

Although I see the issue with special characters, I don't understand the URLs as namespaces issues. External URLs would not be exposed via the /processes/{namespace} as it's an external process one may load into the process (graph).

We are also using a "public" flag for User defined processes: public UDPs are "visible" for other users (or even unauthenticated requests) under the user's namespace, while non-public UDPs are only visible to the user. This is "public" flag is not part of the spec yet, and might be interesting to include in this context

Sharing is indeed a completely different topic. While public: true/false is the most basic answer for it, it doesn't take into account that there might also be sharing on a per-user basis or so.

m-mohr · 2020-12-07T12:32:43Z

The question to answer is: Is it likely that we introduce other namespaces than the ones for user-defined and pre-defined processes? (What could these be?) Should the user id by default be the namespace, e.g. /processes/soxofaan? This would require some prefixing for other namespaces. Or should users by default have a namespace? I'm thinking about the @ char, which is disallowed in the user id, so for example /processes/@soxofaan.

soxofaan · 2020-12-08T11:37:19Z

Is it likely that we introduce other namespaces than the ones for user-defined and pre-defined processes? (What could these be?)

That's up to the imagination of a backend of course. At VITO, we are for example playing with the idea of per-organization or per-project namespaces. When working with multiple people on the same project it can be useful to have a shared namespace instead of a per-user namespace. Comparable to how you have users and orgs on github. How a backend manages organizations or projects should not be part of the openeo API spec at this point I think.

So in that sense you indeed need some kind of namespace system for namespaces 😛

the @sign is a nice idea for per-user namespaces, but what do you do for other kind of namespaces?

Another solution are namespaces like this:

backend: default namespace of pre-defined processes (alias for null namespace)
user: namespace of authenticated user
user/john: namespace of another user
org/crew123: namespace of an organization
project/doom: namespace of a project
...

The 1.0.0 definition of namespaces is forward compatible with this and it allows backends to invent their own namespace namespace :). It also plays nice when using in URLs (e.g. /processes/user/john and /processes/org/crew123)

soxofaan · 2020-12-08T11:43:22Z

It also plays nice when using in URLs

while reviewing #348 I realized this is actually not automatically true. It might be better to have another separator than /, e.g. :

m-mohr · 2020-12-08T14:31:00Z

At VITO, we are for example playing with the idea of per-organization or per-project namespaces. When working with multiple people on the same project it can be useful to have a shared namespace instead of a per-user namespace.

Additional namespaces and how they can be separated from the API-proposed namespaces is then up to the back-end. I feel like namespaces of namespaces is over the top for the API. Also, it could work like in GitHub where a person is the original owner, but with sharing you can grant other persons access to write/delete etc. your process graphs. That was what I had in mind when writing this proposal (e.g. you could do a PUT /processes/@jdries/abc if Jeroen has granted you write access.

How a backend manages organizations or projects should not be part of the openeo API spec at this point I think.

Yes, indeed. So it's not up to the API decide how they will be made available. Through the user ID restrictions (and potentially the @), there are several ways to do that and not conflict with the user ids.

So in that sense you indeed need some kind of namespace system for namespaces 😛

I don't think so and I'd like to avoid this to not make things overly complex by default.

the @sign is a nice idea for per-user namespaces, but what do you do for other kind of namespaces?

Whatever a back-end can do without conflicts. special character disallowed in user ids + org name would work for example. You can still implement parent-namespace:sub-namespace if you like... the API doesn't restrict you to do that.

The 1.0.0 definition of namespaces is forward compatible with this and it allows backends to invent their own namespace namespace :). It also plays nice when using in URLs (e.g. /processes/user/john and /processes/org/crew123)

Slashes don't work very well as OpenAPI doesn't allow un-encoded slashes in path parameters (and it makes the OpenAPI definitions of the endpoints complex) and with URL encoding it's not as nice as it sounds.

jdries · 2020-12-09T07:29:55Z

I'm wondering if we need to define a separate REST API (microservice) for sharing and discovery of processes? Like if we're opposed to advanced features and namespace rules as part of openEO API, then pulling that into its own spec is perhaps more flexible?
In things like openEO platform, we could then have a single process catalog that contains user defined processes for multiple backends.
Also think of other features:

process search
advanced rights mgmt
versioning
sharing a process, but hiding the implementation

m-mohr · 2020-12-09T09:01:15Z

Yes, I had them all on my extensions list. The issue currently is that OpenAPI is not so extensible as one might think. Once you add things to existing endpoints, OpenAPI tooling breaks (currently also struggling with it in STAC, OGC is also having issues).

Apart from that, what we currently do is basically just a unification of the existing behavior and cleaning-up the /process_graphs mess. There's no real additional functionality in the proposal, except listing the namespaces at /processes. But the way it is now is mire flexible for the future.

Sharing will very likely be separate from the Core API, but as mentioned, I'm struggling a bit with OpenAPI.

m-mohr added this to the future milestone Jul 9, 2020

m-mohr mentioned this issue Jul 14, 2020

Authorization of /validation #314

Closed

m-mohr added the breaking Breaking changes, requires a major-version (2.0.0 for example) label Jul 20, 2020

m-mohr self-assigned this Oct 8, 2020

aljacob added process graph management process discovery and profile discovery processes Process definitions and descriptions labels Oct 9, 2020

m-mohr added extension minor requires a minor-version (x.1.0 for example) labels Oct 9, 2020

soxofaan added a commit to Open-EO/openeo-python-driver that referenced this issue Dec 7, 2020

EP-3609 add listing public user defined processes (POC)

a52a5b8

POC: API not fully specified yet, see Open-EO/openeo-api#310

m-mohr added a commit that referenced this issue Dec 7, 2020

First draft to unify /processes and /process_graphs #310

6a96f40

m-mohr linked a pull request Dec 7, 2020 that will close this issue

Process unification #310 #348

Draft

m-mohr added a commit that referenced this issue Dec 7, 2020

First draft to unify /processes and /process_graphs #310

2be5892

m-mohr added a commit that referenced this issue Dec 7, 2020

First draft to unify /processes and /process_graphs #310

3510d64

m-mohr added a commit that referenced this issue Dec 7, 2020

First draft to unify /processes and /process_graphs #310

271042c

m-mohr added a commit that referenced this issue Dec 7, 2020

First draft to unify /processes and /process_graphs #310

9b4e914

m-mohr added a commit that referenced this issue Dec 7, 2020

First draft to unify /processes and /process_graphs #310

541e9f9

m-mohr added the platform label Dec 17, 2020

m-mohr modified the milestones: future, 1.1.0 Dec 21, 2020

m-mohr mentioned this issue Apr 8, 2021

2.0.0 clean-up #374

Open

6 tasks

m-mohr linked a pull request Apr 8, 2021 that will close this issue

Process unification #310 #348

Draft

m-mohr modified the milestones: 1.1.0, 1.2.0 May 5, 2021

soxofaan mentioned this issue May 12, 2021

metadata of a single (predefined) process #392

Closed

m-mohr modified the milestones: 1.2.0, 1.3.0 Nov 29, 2021

m-mohr removed their assignment Jan 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unification of /processes and /process_graphs #310

Unification of /processes and /process_graphs #310

soxofaan commented Jul 9, 2020

jdries commented Oct 8, 2020

jdries commented Oct 9, 2020

aljacob commented Oct 9, 2020

m-mohr commented Oct 9, 2020 •

edited

Loading

jdries commented Oct 9, 2020

m-mohr commented Oct 9, 2020 •

edited

Loading

jdries commented Oct 9, 2020

m-mohr commented Oct 9, 2020

jdries commented Oct 9, 2020

m-mohr commented Oct 9, 2020

jdries commented Oct 9, 2020

jdries commented Nov 17, 2020

m-mohr commented Nov 17, 2020 •

edited

Loading

soxofaan commented Nov 17, 2020

m-mohr commented Dec 7, 2020

soxofaan commented Dec 7, 2020

m-mohr commented Dec 7, 2020 •

edited

Loading

m-mohr commented Dec 7, 2020 •

edited

Loading

soxofaan commented Dec 8, 2020

soxofaan commented Dec 8, 2020

m-mohr commented Dec 8, 2020

jdries commented Dec 9, 2020

m-mohr commented Dec 9, 2020

Unification of /processes and /process_graphs #310

Unification of /processes and /process_graphs #310

Comments

soxofaan commented Jul 9, 2020

jdries commented Oct 8, 2020

jdries commented Oct 9, 2020

aljacob commented Oct 9, 2020

m-mohr commented Oct 9, 2020 • edited Loading

jdries commented Oct 9, 2020

m-mohr commented Oct 9, 2020 • edited Loading

jdries commented Oct 9, 2020

m-mohr commented Oct 9, 2020

jdries commented Oct 9, 2020

m-mohr commented Oct 9, 2020

jdries commented Oct 9, 2020

jdries commented Nov 17, 2020

m-mohr commented Nov 17, 2020 • edited Loading

soxofaan commented Nov 17, 2020

m-mohr commented Dec 7, 2020

soxofaan commented Dec 7, 2020

m-mohr commented Dec 7, 2020 • edited Loading

m-mohr commented Dec 7, 2020 • edited Loading

soxofaan commented Dec 8, 2020

soxofaan commented Dec 8, 2020

m-mohr commented Dec 8, 2020

jdries commented Dec 9, 2020

m-mohr commented Dec 9, 2020

m-mohr commented Oct 9, 2020 •

edited

Loading

m-mohr commented Oct 9, 2020 •

edited

Loading

m-mohr commented Nov 17, 2020 •

edited

Loading

m-mohr commented Dec 7, 2020 •

edited

Loading

m-mohr commented Dec 7, 2020 •

edited

Loading