Process to run EO Application Packages (CWL) #507

clausmichele · 2024-05-23T08:55:48Z

run_ogc_application_package

Context

For the InterTwin project (and soon others), we would like to run an OGC Application Package inside an openEO process graph. The documentation for OGC Application Package is here: https://docs.ogc.org/bp/20-089r1.html
We see it as a process similar to run_udf.

Summary

Description

Parameters

`data`

Optional: yes

Description

The data to be passed to the OGC Application Package execution engine. Optional since the input data could be already defined in the CWL file and therefore it wouldn't need any other inputs.

Data Type

Datacube

`cwl`

Optional: no

Description

Currently it's a YAML file. Either we pass it as plain text (a string), as for UDFs, or we pass a URL to it and the back-end loads it.
The schema could be the same as for the udf parameter of run_udf with some changes.

Data Type

string

`cwl_params`

Optional: no

Description

It's either a YAML or a JSON file. Again, it could be passed in either of the ways described for `cwl`.

Data Type

string

Return Value

Description

The result should be made available as a STAC object, i.e. a JSON string. This way, the back-end can continue the process graph using load_stac.

Data Type

string
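As a rough illustration of the proposal above, a process graph node calling the proposed process could be assembled as follows. This is only a sketch: the process does not exist yet, the helper `build_ap_node`, the node name `load1`, and the example URL are hypothetical, and passing both CWL arguments as plain strings is just one of the options mentioned.

```python
import json
from typing import Any, Optional


def build_ap_node(cwl: str, cwl_params: str, data_node: Optional[str] = None) -> dict[str, Any]:
    """Build a process graph node for the proposed run_ogc_application_package."""
    arguments: dict[str, Any] = {"cwl": cwl, "cwl_params": cwl_params}
    if data_node is not None:
        # `data` is optional: the CWL file may already define its own inputs.
        arguments["data"] = {"from_node": data_node}
    return {"process_id": "run_ogc_application_package", "arguments": arguments}


node = build_ap_node(
    cwl="https://example.com/workflow.cwl",  # placeholder URL, not a real workflow
    cwl_params=json.dumps({"cwl_param1": True, "cwl_param2": 99}),
    data_node="load1",  # hypothetical name of an upstream node
)
print(json.dumps(node, indent=2))
```

Leaving out `data_node` yields a node without a `data` argument, matching the "Optional: yes" marking above.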

Links to additional resources (optional)

OGC Application Package best practices: https://docs.ogc.org/bp/20-089r1.html

Examples

Currently in development:
interTwin-eu/HyDroForM: Hydrological Drought Forecasting Model with HydroMT and Wflow (github.com)

cwltool --outdir ./wflow-output --no-read-only --no-match-user wflow-exp-run.cwl#run-wflow params-exp-wflow.yaml

OR something like this:

(very experimental, uses the sapporo service, sapporo-wes/sapporo-service on GitHub: a standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification)

curl -X POST http://localhost:1122/runs \
    -H "Content-Type: multipart/form-data" \
    -F "workflow_params=https://raw.githubusercontent.com/interTwin-eu/HyDroForM/experimental/experimental/hydromt/cwl/params.json;type=application/json" \
    -F "workflow_type=CWL" \
    -F "workflow_type_version=v1.2" \
    -F "workflow_engine=cwltool" \
    -F "workflow_url=https://raw.githubusercontent.com/interTwin-eu/HyDroForM/experimental/experimental/hydromt/cwl/hydromt-build.cwl" \
    -F "workflow_attachment=https://raw.githubusercontent.com/interTwin-eu/HyDroForM/experimental/experimental/hydromt/cwl/hydromt-build.cwl;type=application/octet-stream" \
    -F "workflow_attachment=https://raw.githubusercontent.com/interTwin-eu/HyDroForM/experimental/experimental/hydromt/cwl/update-config.cwl;type=application/octet-stream"

I put in cc the people from Eurac working on this @jzvolensky @iacopoff @aljacob

And I am aware VITO is also interested: @jdries @soxofaan
EODC @christophreimer

jdries · 2024-05-24T06:48:01Z

+1 we will probably start implementation work on this still in 2024 (I hope)
For cwl_params, I'm wondering if we can find a solution that makes it look more like how other openEO processes specify parameters?
One idea could be that we simply interpret all extra process arguments as cwl parameters.

The other difficult thing is how data goes in and out. STAC is for sure the solution, but it needs constraints to be usable.
I'm also wondering whether we can avoid constructions where process graphs have to be very explicit about converting a datacube to STAC, running the AP, and then reading back from STAC, or whether we can have (a variant of?) run_ogc_application_package that simply works for raster cube input/output.

m-mohr · 2024-05-30T08:09:45Z

That sounds pretty reasonable. The return value should probably be a data cube (or the new stac subtype, see #485).

Here's a reference to an old PR, which had similar aims and has some discussion already: #332

> One idea could be that we simply interpret all extra process arguments as cwl parameters.

That's not a thing in openEO, primarily because not all programming languages have a construct such as kwargs in Python.

m-mohr · 2024-07-03T12:42:32Z

I was just wondering whether CWL could just be another UDF runtime and whether we could use run_udf? @clausmichele

clausmichele · 2024-07-03T13:58:01Z

Maybe @jzvolensky can help, he's our OGC AP expert. I guess in this case we can't pass a single code block which contains everything, definition and input parameters to run an AP?

jzvolensky · 2024-07-03T14:19:08Z

@clausmichele I am not sure how that would work with the ADES. Since the CWL processes are stored in the ADES I suppose they could be read in a UDF and then you provide the input parameters in the UDF and then send the processing request to ADES? Maybe this is something we can look at/think about.

m-mohr · 2024-07-04T11:21:20Z

What is ADES in our context here?

I assumed that you'd specify a CWL file directly, with no prior interaction to store the CWL.

jzvolensky · 2024-07-04T11:59:36Z

Sorry, ADES is the Application Deployment and Execution Service from the EOEPCA project. Basically, it is a CWL execution engine which also supports managing CWLs (deploy, undeploy etc.). Our idea is to plug this into openEO so that we can execute Application Packages with a process, or possibly a UDF. In this way we can have a set of predefined processes available to the user, or possibly allow users to provide their own.

m-mohr · 2024-07-04T12:55:55Z

The specification should be independent of the implementation. So ADES might be a data point, but we should probably focus on the underlying specification (i.e. OGC API - Processes - Part 2/3). Plugging that in makes sense, but in the end a CWL could also be just a specific "language" to express UDFs in, similar to Python or R.

jzvolensky · 2024-07-08T08:55:20Z

> I was just wondering whether CWL could just be another UDF runtime and whether we could use run_udf? @clausmichele

Hello, so I looked at the run_udf process spec, and I guess this could work. Just to understand it correctly: you would, for example, call run_udf and pass the CWL (file, URL, whatever) as well as the inputs (YAML or JSON) for the CWL, with the runtime set to CWL 1.2, and then the runtime would do whatever it needs to do in the back-end to execute it and return the result?

m-mohr · 2024-07-08T10:26:20Z

Yeah. If we are reusing run_udf instead of a new process, it could look as follows in a process graph:

{
  process_id: "run_udf",
  arguments: {
    udf: "... CWL as YAML or URL or string ...",
    runtime: "cwl",
    version: "1.2", // could be omitted as it's the default version, see below
    context: {
      cwl_param1: true,
      cwl_param2: 99
    }
  }
}
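The snippet above is written as shorthand (unquoted keys, an inline comment), so it is not valid JSON as-is. A minimal sketch of how a client might assemble the same node as strict JSON follows; the helper name `cwl_udf_node` is hypothetical, and the runtime name `cwl` is only the one floated in this thread, not an agreed identifier.

```python
import json
from typing import Any, Optional


def cwl_udf_node(udf: str, params: dict[str, Any], version: Optional[str] = None) -> dict[str, Any]:
    """Build a run_udf node that carries a CWL document and its parameters."""
    arguments: dict[str, Any] = {
        "udf": udf,           # CWL as a string, or a URL pointing to it
        "runtime": "cwl",     # assumed runtime name, not yet standardized
        "context": params,    # CWL input parameters travel in `context`
    }
    if version is not None:
        # Omitting `version` lets the back-end fall back to its default.
        arguments["version"] = version
    return {"process_id": "run_udf", "arguments": arguments}


node = cwl_udf_node(
    udf="https://example.com/workflow.cwl",  # placeholder URL
    params={"cwl_param1": True, "cwl_param2": 99},
)
serialized = json.dumps(node, indent=2)
print(serialized)
```

Serializing through `json.dumps` also turns Python's `True` into the JSON literal `true`, which matches the values in the shorthand above.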

While GET /udf_runtimes lists:

{
  title: "EO Application Packages (CWL)",
  type: "language",
  default: "1.2",
  versions: {
    "1.2": {
      libraries: { ... } // not sure about this entry; I guess it could be pre-loaded Docker images or so?
    }
  }
}
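To make the "default version" remark concrete, here is a small sketch of how a client could pick the CWL version from such a listing. It assumes the listing is keyed by runtime name, as in the openEO `GET /udf_runtimes` response, and that the key would be `cwl`; the sample data and the function name `resolve_version` are illustrative only.

```python
from typing import Any, Optional

# Hand-written sample shaped like a GET /udf_runtimes response (not real back-end output).
SAMPLE_RUNTIMES: dict[str, Any] = {
    "cwl": {
        "title": "EO Application Packages (CWL)",
        "type": "language",
        "default": "1.2",
        "versions": {"1.2": {"libraries": {}}},
    }
}


def resolve_version(runtimes: dict[str, Any], name: str = "cwl",
                    requested: Optional[str] = None) -> str:
    """Return the version to use, falling back to the runtime's default."""
    runtime = runtimes.get(name)
    if runtime is None:
        raise LookupError(f"runtime {name!r} is not offered by this back-end")
    version = requested if requested is not None else runtime["default"]
    if version not in runtime["versions"]:
        raise LookupError(f"version {version!r} of runtime {name!r} is not available")
    return version


print(resolve_version(SAMPLE_RUNTIMES))
```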

It's just an idea that doesn't need an explicit process. If people think it would make sense to have a separate process, we can also discuss that. But right now I don't see an explicit reason why that might be better. Please let me know if you have any reasons in mind.

Somewhat related issue: #515

Also, run_udf is usually meant to be executed in datacube processes such as reduce_dimension. This would not be the case for EO Application packages I guess, which is somewhat against the best practice of UDFs. It's somewhat unclear how a mapping from the EO Application Packages and the openEO data types can be achieved and communicated to users.

Related process: run_udf_externally

jzvolensky · 2024-07-08T13:00:21Z

Okay, the first part looks really neat with defining the workflow and inputs.

In the second part, GET /udf_runtimes, do you mean just listing the available Docker images? Unless we extract them from the CWLs, this is not information that we or the user need to define. It is defined in the CWL, and I don't think storing it provides any added value.

The last paragraph is interesting. I mean, the Application Packages are fully standalone applications, right? From this point of view a new process makes sense, because the application and its execution are outside of your traditional process graph scope. All that we do is bind it together with the rest of the openEO process chain using a process graph (however, in theory we don't need to use any other process to use it, so it really can be a standalone process).

I do like the UDFs idea and if the UDF can support this with some minor best practices update or a general UDF use case extension then that is good, I suppose.

m-mohr · 2024-07-18T11:54:00Z

Notes from the meeting today:

Input/Output in CWL:

type: File/Directory, format: STAC (tbc with OGC)
- type: File, format: stac:item / stac:catalog / stac:collection / stac:itemcollection / stac:any
For STAC: The same requirements as for load_stac (which fields etc.)
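As a sketch of what these notes could mean in practice, a back-end might scan the `inputs` section of a CWL document for entries that are declared as STAC. Everything here is tentative: the format identifiers are the ones listed in the notes (still to be confirmed with OGC), the function name `stac_inputs` is hypothetical, and the CWL inputs are assumed to be already parsed into a dictionary in CWL's map form.

```python
from typing import Any

# Format identifiers taken from the meeting notes; not confirmed by OGC.
STAC_FORMATS = {
    "stac:item",
    "stac:catalog",
    "stac:collection",
    "stac:itemcollection",
    "stac:any",
}


def stac_inputs(inputs: dict[str, dict[str, Any]]) -> list[str]:
    """Return the names of CWL inputs declared as File/Directory with a STAC format."""
    return [
        name
        for name, spec in inputs.items()
        if spec.get("type") in ("File", "Directory")
        and spec.get("format") in STAC_FORMATS
    ]


# Hypothetical inputs section of a CWL workflow, already parsed from YAML.
example_inputs = {
    "scene": {"type": "File", "format": "stac:item"},
    "threshold": {"type": "float"},
    "archive": {"type": "Directory", "format": "stac:catalog"},
}
print(stac_inputs(example_inputs))
```

Inputs found this way are the candidates a back-end would have to fill from the `data` argument, subject to the same STAC field requirements as load_stac.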

Ways of interacting with CWL in openEO:

Pre-defined AP from backend
- openEO processes (not exposed as CWL)

Pre-deployment

link relation type ...? (tbc with OGC) in GET / to link to ADES
Management through OGC API - Processes outside of the openEO API tree

Processes could work as follows:

{
  process_id: "run_ogcapi",
  arguments: {
    data: ...,
    id: "my-ap",
    inputs: {
      cwl_param1: true,
      cwl_param2: 99
    }
  }
}

or

{
  process_id: "run_ogcapi_externally",
  arguments: {
    data: ...,
    url: "https://processes.otherprovider.com",
    id: "my-ap", // -> https://processes.otherprovider.com/processes/my-ap
    inputs: {
      cwl_param1: true,
      cwl_param2: 99
    }
  }
}
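The arrow in the second variant implies that `url` and `id` are combined into the address of the process on the remote server. A minimal sketch of that resolution follows, assuming the remote service follows OGC API - Processes - Part 1, where a process is executed via `POST /processes/{id}/execution` with an `inputs` object in the body. The function names are hypothetical, and how `data` would be handed over is left out because the notes do not settle it.

```python
from typing import Any
from urllib.parse import quote


def process_url(base_url: str, process_id: str) -> str:
    """Resolve the process description URL from the `url` and `id` arguments."""
    return f"{base_url.rstrip('/')}/processes/{quote(process_id, safe='')}"


def execution_request(arguments: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Map run_ogcapi_externally arguments to an execute URL and request body."""
    url = process_url(arguments["url"], arguments["id"]) + "/execution"
    body = {"inputs": arguments.get("inputs", {})}
    return url, body


url, body = execution_request({
    "url": "https://processes.otherprovider.com",
    "id": "my-ap",
    "inputs": {"cwl_param1": True, "cwl_param2": 99},
})
print(url)
print(body)
```

For the first variant (`run_ogcapi` without `url`), the base URL would presumably come from the back-end's own configuration instead.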

User-provided CWL in a process graph (or via URL/file path)

One runtime in GET /udf_runtimes as a language (we should recommend a name, e.g. CWL or EOAP?, tbc with OGC)

process for execution: run_udf

{
  process_id: "run_udf",
  arguments: {
    data: ...,
    udf: "... CWL as YAML/... or URL or string ...",
    runtime: "cwl",
    version: "1.2",
    context: {
      cwl_param1: true,
      cwl_param2: 99
    }
  }
}

m-mohr · 2024-07-18T12:57:18Z

The open questions to OGC have been posted here: opengeospatial/ogcapi-processes#428

m-mohr · 2024-08-23T14:22:19Z

See PR #520 for a proposal, please discuss further issues in the PR.

clausmichele added the new process label May 23, 2024

jdries mentioned this issue Jun 14, 2024

[EPIC] OGC Application package support Open-EO/openeo-geopyspark-driver#803

Open

m-mohr changed the title ~~run_ogc_application_package~~ Process to run EO Application Packages (CWL) Jul 8, 2024

m-mohr mentioned this issue Jul 18, 2024

EO Application packages in openEO opengeospatial/ogcapi-processes#428

Open

m-mohr self-assigned this Jul 18, 2024

m-mohr added a commit that referenced this issue Aug 23, 2024

Implementation guidelines for EOAP #507

6322960

m-mohr mentioned this issue Aug 23, 2024

Guidelines and processes to run OGC API - Processes / CWL / EOAP #520

Open
