Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Old and new process graph examples (reduce, band math, labeled data) #43

Closed
jdries opened this issue Feb 26, 2019 · 19 comments
Closed

Old and new process graph examples (reduce, band math, labeled data) #43

jdries opened this issue Feb 26, 2019 · 19 comments
Labels
help wanted Extra attention is needed
Milestone

Comments

@jdries
Copy link
Contributor

jdries commented Feb 26, 2019

In the new 0.4.0 process graph example, 'dimension_data' is used in some places, but this is not yet clearly documented (afaik).
I'm also a bit confused about the name, could it be that this name was chosen in the context of the initial confusion around data cubes?

@m-mohr
Copy link
Member

m-mohr commented Feb 26, 2019

It's an example and was not meant to reference to any real processes, I could just name them a, b, c and d. I wanted to show how things work in theory, in some parts also with two callback parameters. The official source of truth are always the process specification files, not the examples. There the callback parameter names are different (usually data).

Edit: I'll go through the API and specify less confusing examples.

@jdries
Copy link
Contributor Author

jdries commented Feb 26, 2019

Perhaps it would also help to provide a less-trivial example. For instance, this EVI index:
https://github.com/sentinel-hub/custom-scripts/blob/master/sentinel-2/evi/script.js
I'm currently trying to compute it using a single reduce operation, assuming that my datacube contains the necessary bands.
Like how will we define a multiplication with a constant: (6.0 * B04)
Creating a new band based on this constant would work, but is a lot of overhead.
Or can we do:

{
 'process_id':'product',
 'arguments' : {
    'data':[{'from_node':'band_select_node'},6.0]
}
}

@jdries
Copy link
Contributor Author

jdries commented Feb 27, 2019

@edzer @mkadunc Anyone have an idea on how to do a formula like this:
2.5 * (NIR - RED) / ((NIR + 6*RED - 7.5*BLUE) + 1)
I'm especially stuck on how to handle the constants with the current processes.

@m-mohr
Copy link
Member

m-mohr commented Feb 27, 2019

Yes, we can do that. But indeed it seems like we should have a shortcut process to access the bands. filter_bands + array_element is quite cumbersome. Would that make things better?

Back to your example, the formula for reference: 2.5 * (NIR - RED) / (1 + NIR + 6*RED + (-7.5)*BLUE) (slightly restructured for convenience).
In the following example I assume that the Bands are simply called NIR, RED, and BLUE. So I guess it would be something like the following process graph. At least that's how I thought initially how it would work, though I'm not quite sure yet whether that's how it works with the changed "understanding" of data cubes as my brain is still somehow always working with the old understanding...

{
  "getcol1": {
    "process_id": "load_collection",
    "arguments": {
      "id": "Sentinel-2"
    }
  },
  "filter1": {
    "process_id": "filter_bands",
    "arguments": {
      "data": {
        "from_node": "getcol1"
      },
      "bands": [
        "NIR",
        "RED",
        "BLUE"
      ]
    }
  },
  "reduce1": {
    "process_id": "reduce",
    "arguments": {
      "data": {
        "from_node": "filter1"
      },
      "dimension": "spectral",
      "reducer": {
        "callback": {
          "nirband": {
            "process_id": "array_element",
            "arguments": {
              "data": {
                "from_argument": "data"
              },
              "index": 0
            }
          },
          "redband": {
            "process_id": "array_element",
            "arguments": {
              "data": {
                "from_argument": "data"
              },
              "index": 1
            }
          },
          "blueband": {
            "process_id": "array_element",
            "arguments": {
              "data": {
                "from_argument": "data"
              },
              "index": 2
            }
          },
          "produc1": {
            "process_id": "product",
            "arguments": {
              "data": [
                6,
                {
                  "from_node": "redband"
                }
              ]
            }
          },
          "produc2": {
            "process_id": "product",
            "arguments": {
              "data": [
                -7.5,
                {
                  "from_node": "blueband"
                }
              ]
            }
          },
          "sum1": {
            "process_id": "sum",
            "arguments": {
              "data": [
                1,
                {
                  "from_node": "nirband"
                },
                {
                  "from_node": "produc1"
                },
                {
                  "from_node": "produc2"
                }
              ]
            }
          },
          "substr1": {
            "process_id": "substract",
            "arguments": {
              "data": [
                {
                  "from_node": "nirband"
                },
                {
                  "from_node": "redband"
                }
              ]
            }
          },
          "divide1": {
            "process_id": "divide",
            "arguments": {
              "data": [
                {
                  "from_node": "substr1"
                },
                {
                  "from_node": "sum1"
                }
              ]
            }
          },
          "produc3": {
            "process_id": "product",
            "arguments": {
              "data": [
                2.5,
                {
                  "from_node": "divide1"
                }
              ]
            },
            "result": true
          }
        }
      }
    }
  },
  "export1": {
    "process_id": "export",
    "arguments": {
      "data": {
        "from_node": "reduce1"
      },
      "format": "GTiff"
    },
    "result": true
  }
}

Corresponding JS client code (not a very nice implementation yet):
[Edit: Less verbose example some comments below...]

var b = new ProcessGraphBuilder();
var collection = b.process("load_collection", {id: "Sentinel-2"});
var filteredBands = b.process("filter_bands", {data: collection, bands: ["NIR", "RED", "BLUE"]});
var evi = b.process("reduce", {
	data: filteredBands,
	dimension: "spectral",
	reducer: (builder, params) => {
		var nir = builder.process("array_element", {data: params.data, index: 0});
		var red = builder.process("array_element", {data: params.data, index: 1});
		var blue = builder.process("array_element", {data: params.data, index: 2});
		var result = builder.process("product", {
			data: [
				2.5,
				builder.process("divide", {
					data: [
						builder.process("substract", {
							data: [nir, red]
						}),
						builder.process("sum", {
							data: [
								1, 
								nir,
								builder.process("product", {
									data: [6, red]
								}),
								builder.process("product", {
									data: [-7.5, blue]
								})
							]
						})
					]
				})
			]
		});
		return result;
	}
});
var result = b.process("export", {data: evi, format: 'GTiff'});
var createdProcessGraph = b.generate(result);

@jdries
Copy link
Contributor Author

jdries commented Feb 27, 2019

Thanks, this allows me to try this out in our client and backend implementation!

@m-mohr
Copy link
Member

m-mohr commented Feb 27, 2019

I added the JS client code above. Does this "solution" actually makes sense to you, @jdries ? I'm sometimes still a bit confused with my old and the "new" understanding of everything...

@m-mohr m-mohr changed the title Document use of 'dimension_data' in process graphs Old and new process graph examples Feb 27, 2019
@m-mohr m-mohr changed the title Old and new process graph examples Old and new process graph examples (reduce, band math, ...) Feb 27, 2019
@jdries
Copy link
Contributor Author

jdries commented Feb 27, 2019

I guess it's basically the only possible solution with the current set of processes, which is better than having multiple options for the same problem. The fact that the data array can contain both primitive and complex objects is probably also not that explicitly described in the docs?
I do hope to be able to solve this with less verbose end-user code, having a shortcut for band selection could certainly help with that.

@m-mohr
Copy link
Member

m-mohr commented Feb 27, 2019

The fact that the data array can contain both primitive and complex objects is probably also not that explicitly described in the docs?

This example doesn't contain complex objects (it's an array with three primitive values) and currently it is also not meant that the data array contains complex objects. How do you came to the conclusion that it is possible?

I do hope to be able to solve this with less verbose end-user code, having a shortcut for band selection could certainly help with that.

I fully agree! What would be a good solution for it? We just added dimension types (one is bands) so we could potentially introduce a special handling for bands (or more general: for "labeled" data)?! Another option is to solve this somehow client-side, e.g. translating array access to the array_element process so that it would simply be params.data[0] for nir instead of builder.process("array_element", {data: params.data, index: 0}).

@m-mohr m-mohr changed the title Old and new process graph examples (reduce, band math, ...) Old and new process graph examples (reduce, band math, labeled data) Feb 27, 2019
@m-mohr
Copy link
Member

m-mohr commented Feb 27, 2019

Still, I'd like to say that the JS client code above is just a very naive implementation as a proof-of-concept. The final product will look less verbose so that users will be able to write code like this:

function eviReducer(b, data) {
	var nir = data.at(0);
	var red = data.at(1);
	var blue = data.at(2);
	return b.product([
		2.5,
		b.divide([
			b.substract([nir, red]),
			b.sum([
				1,
				nir,
				b.product([6, red]),
				b.product([-7.5, blue])
			])
		])
	]);
}

var builder = new ProcessGraphBuilder();
var collection = builder.load_collection("Sentinel-2");
var filteredBands = builder.filter_bands(collection, ["NIR", "RED", "BLUE"]);
var evi = builder.reduce(filteredBands, "spectral", eviReducer);
var result = builder.export(evi, "GTiff");
var createdProcessGraph = builder.generate(result);

That is possible with the 0.4 API and looks much better than my original example, I think.
Of course, the mathematical expressions look still a bit messy.

@m-mohr m-mohr transferred this issue from Open-EO/openeo-api Feb 27, 2019
m-mohr added a commit to Open-EO/openeo-js-commons that referenced this issue Feb 27, 2019
@m-mohr
Copy link
Member

m-mohr commented Feb 27, 2019

Also, I changed the examples in the documentation to a full and up-to-date EVI example, see https://open-eo.github.io/openeo-api/v/0.4.0/processgraphs/#example and also the openAPI spec.
@jdries Maybe use that example for testing, the one I posted above had some minor issues.

@m-mohr m-mohr added help wanted Extra attention is needed question Further information is requested critical labels Feb 27, 2019
@m-mohr m-mohr added this to the v0.4 milestone Feb 27, 2019
@m-mohr m-mohr modified the milestones: v0.4, v0.5 Feb 28, 2019
@m-mohr
Copy link
Member

m-mohr commented Feb 28, 2019

Moved to a later release, I think we are fine with this for a first version. Otherwise we would potentially need to change quite a lot, which we don't have the time for. So let's make experiences with this and evolve it in the next versions.

@m-mohr m-mohr removed critical question Further information is requested labels Feb 28, 2019
@jdries
Copy link
Contributor Author

jdries commented Mar 1, 2019

I now support the EVI example in the python client:
https://github.com/Open-EO/openeo-python-client/blob/274aac31a04b329eedaceb94b104d162579b2df3/tests/test_bandmath.py#L45
It generates this result:
https://github.com/Open-EO/openeo-python-client/blob/274aac31a04b329eedaceb94b104d162579b2df3/tests/evi_graph.json

On the Python side, I was able to make it look very simple. Of course, this relies on the assumption that the backend supports this style of process graph.
The resulting graph is fairly long compared to the input, this also made it hard to debug while developing it. Also on the backend side, I would be fairly lost if I received a 150 line process graph. Some minor optimizations could help in this respect.
Once this functionality in the python client is documented, we may also want to reference it from the process description docs.

@m-mohr
Copy link
Member

m-mohr commented Mar 1, 2019

Oh, that looks awesome in the Python client, but I'm concerned that it bases on too many assumptions and doesn't work generally. I'm interested to learn how that works.

The process graphs are not meant to be consumed by users so I don't see a general problem here. Though, we have the process graph to model converter from the web editor, which we could release as a single app so that you could simply convert a process graph to a human-readable model (and vice-versa). I think this could help with debugging.

Which optimizations?

@m-mohr
Copy link
Member

m-mohr commented Mar 1, 2019

@jdries I just had a look at the process graph, it doesn't have a filter_bands to order the bands, so array_element won't work as expected. You would probably work with the wrong bands.

It would work without filer_bands if we specify that the bands are by default ordered as specified in the STAC metadata. Then you could simply pass the corresponding array index to array_element. Nevertheless, I'm currently not sure where to define such global definitions.

By the way: You don't need to generate "result": false in each node. That's the default behavior.

@jdries
Copy link
Contributor Author

jdries commented Mar 1, 2019

Thanks!
It uses the band metadata from the backend to determine band order. It does assume that the order of the bands in the metadata corresponds to the actual order. Another option would be to have a callback function that retrieves a band by name, instead of array index. It would help a lot for readability if this could be included inline.
In general, it assumes that a reduce on 'spectral_bands' is supported by the backend, and that basic math functions are available. Do you see any other assumptions?
The result:false is indeed a bit of a side effect of my current implementation.

As you know, I also care a lot about developer-friendliness, as an open source project always relies on a healthy and sustainable community of developers. The whole 'my unreadable format is not meant for humans' argument is an OGC classic, and would mean we don't need something like STAC, because we already have Inspire Metadata.
I believe allowing inline declarations could help to some extent, only problem is that the client would also have to be smart enough to construct a better process graph. The other option is simply a dedicated band math function, that parses expressions on the server side.

@m-mohr
Copy link
Member

m-mohr commented Mar 1, 2019

It uses the band metadata from the backend to determine band order. It does assume that the order of the bands in the metadata corresponds to the actual order.

I'd assume the same, but we have not specified it anywhere and I'm not quite sure where to specify it. Maybe load_collection (+ load_results and ...?) would be the appropriate place?

Another option would be to have a callback function that retrieves a band by name, instead of array index. It would help a lot for readability if this could be included inline.

It's not only the bands, but generally labeled data such as temporal or vertical dimensions, e.g. grouped levels or whatever "nominal" data. But in this version we don't have it yet, so I guess we should aim for that after the first version in May.

In general, it assumes that a reduce on 'spectral_bands' is supported by the backend, and that basic math functions are available. Do you see any other assumptions?

Fair enough to expect the processes to be available. The band order is an assumption and maybe more, but I still need to find the actual implementation.

As you know, I also care a lot about developer-friendliness

I'd say: me, too. But maybe I have other priorities here (client users) ;-)

The whole 'my unreadable format is not meant for humans' argument is an OGC classic, and would mean we don't need something like STAC, because we already have Inspire Metadata. [...] The other option is simply a dedicated band math function, that parses expressions on the server side.

Primarily, STAC is not meant to be human-readable either, so I don't buy that argument. STAC is much simpler than INSPIRE in many regards and who wants to mix XML and JSON? The problem with our process graphs is that we are working with JSON here and that always get's quite verbose. If we'd really want to have it human-readable we should remove that constraint and make something like the WCPS language.

Having a band math process would make this use case simpler, but not all the of the others. It would mean we need convenience processes for everything. So why not specify an EVI function similar to the NDVI function? Would make it simpler, too. (I'm not actually mandating here to do so.)

I believe allowing inline declarations could help to some extent, only problem is that the client would also have to be smart enough to construct a better process graph.

Could you give an example? Not quite sure yet what inline declarations are for you.

@jdries
Copy link
Contributor Author

jdries commented Mar 1, 2019

We can indeed clarify in load_collection that the band order corresponds to the metadata, seems like a good place.
As for the code, you can find it here:
https://github.com/Open-EO/openeo-python-client/blob/274aac31a04b329eedaceb94b104d162579b2df3/openeo/rest/imagecollectionclient.py#L197
and
https://github.com/Open-EO/openeo-python-client/blob/274aac31a04b329eedaceb94b104d162579b2df3/openeo/rest/imagecollectionclient.py#L213
I'm not too happy with the readability, because there's quite a few cases to take into account.
I also discovered a big assumption in a TODO: it assumes that you are combining bands that are already part of the same data cube. So you can not yet add two bands with an entirely different lineage, although I believe that might even be possible.

As for the rest of the discussion, it's probably bringing us too far for 0.4.0, so let's get that out of the door first, and hope that other client/backend developers will join in with their experience.

@m-mohr
Copy link
Member

m-mohr commented Mar 1, 2019

Thanks. Some Python magic going on with operator overloading and so on, it seems?!

We can indeed clarify in load_collection that the band order corresponds to the metadata, seems like a good place.

I'll do that:

The bands (and all dimensions that specify nominal dimension values) are expected to be ordered as specified in the metadata.

I also discovered a big assumption in a TODO: it assumes that you are combining bands that are already part of the same data cube. So you can not yet add two bands with an entirely different lineage, although I believe that might even be possible.

I don't think that is allowed by the processes anyway, you'd need to resample and merge beforehand and then you can use all bands (in case the names don't collide - should we have a process to rename dimension values? See #50).

As for the rest of the discussion, it's probably bringing us too far for 0.4.0, so let's get that out of the door first

Would like to, waiting for #44 to be solved ...

m-mohr added a commit that referenced this issue Mar 1, 2019
@m-mohr
Copy link
Member

m-mohr commented Dec 13, 2019

Is this solved? I don't have a clear to do for now...

@m-mohr m-mohr closed this as completed Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants