
merge_cubes: Clarify merging procedure #87

Closed
jdries opened this issue Nov 14, 2019 · 21 comments

@jdries
Contributor

jdries commented Nov 14, 2019

In our use case, we want to run two separate operations on a given datacube, and then combine those two results.
To do this, we either need a reducer that can reduce values coming from different cubes, or a process that first 'joins' the cubes into a single datacube with two bands. The latter is probably the most generic solution and is needed anyway.

@m-mohr
Member

m-mohr commented Nov 14, 2019

There's merge_cubes, which has been defined at the VITO sprint. If that's not working properly, we should probably improve that process instead of adding a new one.

@jdries
Contributor Author

jdries commented Nov 14, 2019

Indeed, I found that one as well. Perhaps we should then clarify how it handles bands?
Or is there a reducer that basically leaves the bands untouched? Like
merge([first_cube_band],[second_cube_band]) = [first_cube_band,second_cube_band]

@m-mohr
Member

m-mohr commented Nov 14, 2019

My understanding of the process is that if you pass two datacubes with dimensions x,y,t,b, where one has bands a,b and the other has bands c,d, the result will be a datacube with dimensions x,y,t,b and bands a,b,c,d. The same applies to t etc. If both cubes have a band with the same name, the overlap_resolver must be used so that the values of both bands are reduced to a single band.
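This band-union behavior can be illustrated with a toy Python sketch. The function name and the dict-based cube model are purely illustrative assumptions, not part of the openEO API:

```python
# Toy model: a cube's band dimension as a dict mapping band name -> list of values.
def merge_band_dimension(cube1, cube2, overlap_resolver=None):
    """Union the band dimensions; same-named bands need an overlap_resolver."""
    merged = dict(cube1)
    for band, values in cube2.items():
        if band in merged:
            if overlap_resolver is None:
                raise ValueError(f"band {band!r} exists in both cubes; an overlap_resolver is required")
            # reduce the two overlapping bands to a single band, value by value
            merged[band] = [overlap_resolver(v1, v2) for v1, v2 in zip(merged[band], values)]
        else:
            merged[band] = values
    return merged

a = {"a": [1, 2], "b": [3, 4]}
b = {"c": [5, 6], "d": [7, 8]}
print(sorted(merge_band_dimension(a, b)))  # bands a, b, c, d
```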

@mkadunc You were pushing that definition at the VITO sprint. Do you agree?

@jdries
Contributor Author

jdries commented Nov 14, 2019

That would make sense; adding that to the documentation would solve the issue.

@m-mohr m-mohr changed the title new process: join datacubes merge_cubes: Clarify merging procedure Nov 14, 2019
@m-mohr m-mohr added this to the v1.0 milestone Nov 14, 2019
jdries added a commit to Open-EO/openeo-python-client that referenced this issue Nov 14, 2019
@mkadunc
Member

mkadunc commented Nov 14, 2019

Yes, that was my understanding.

A caveat for Jeroen's scenario: if the two input cubes do not yet have a band dimension, the merge should fail (merge(A(x,y,t), B(x,y,t)) won't work whenever any cell is present in both cubes, at least not without an overlap resolver). To be able to use the merge process on such cubes, one should extend the cubes first:

    merge(
        add_dimension(A(x,y,t), "bands", "a"),
        add_dimension(B(x,y,t), "bands", "c")
    ) // returns cube with dimensions (x, y, t, bands)
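In a toy Python model (cubes as dicts from coordinate tuples to values; the names are illustrative assumptions, not openEO API), add_dimension amounts to appending a new label to every cell's coordinate, which makes previously colliding cells disjoint:

```python
def add_dimension(cube, label):
    """Append a new dimension value (e.g. a band name) to every cell coordinate."""
    return {coord + (label,): value for coord, value in cube.items()}

a = {(0, 0, "t0"): 1.0}
b = {(0, 0, "t0"): 2.0}
# same (x, y, t) cell in both cubes, but disjoint once each gets its own band label
merged = {**add_dimension(a, "a"), **add_dimension(b, "c")}
print(sorted(merged))  # [(0, 0, 't0', 'a'), (0, 0, 't0', 'c')]
```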

@m-mohr
Member

m-mohr commented Nov 14, 2019

@mkadunc Shouldn't merge(A(x,y,t), B(x,y,t)) still work? I would expect that the temporal dimension could be merged similarly. For example, if A has values for 2017 and B for 2018, why should that fail? It seems like a reasonable use case to me. The same goes for x and y, although resolving overlaps in those cases could be problematic.

@mkadunc
Member

mkadunc commented Nov 14, 2019

Shouldn't merge(A(x,y,t), B(x,y,t)) still work?

Yes, of course - in your example there are no overlapping cells, because the two cubes are separated in time.

I was commenting on Jeroen's scenario, in which the two cubes occupy the same space and time, and the introduction of an extra dimension makes them disjoint again.

@m-mohr
Member

m-mohr commented Nov 14, 2019

Thanks, got it.

@mkadunc
Member

mkadunc commented Nov 14, 2019

Also, the existing merge_cubes with a provided overlap_resolver might already be a sufficient answer to Jeroen's original question: merge is a process "that can reduce values coming from different cubes". E.g., if the combining function is combineAB(a::number, b::number):

    cubeA = customProcessingA(collection/x,y,t/);
    cubeB = customProcessingB(collection/x,y,t/);
    cubeR/x,y,t/ = merge_cubes(
        cube1=cubeA,
        cube2=cubeB,
        overlap_resolver = callback({ return combineAB(data[0], data[1]) })
    );

@m-mohr
Member

m-mohr commented Nov 14, 2019

Yes, indeed. That's basically what I meant in #87 (comment). I think the only ToDo here is to improve the documentation a bit.

@jdries
Contributor Author

jdries commented Nov 15, 2019

I agree with the above.
There's also the case where the bands in the two cubes have the same name. I guess that would mean you need an overlap_resolver?
And related to that: is there a way to rename bands?

@m-mohr
Member

m-mohr commented Nov 17, 2019

Yes, you'd need an overlap resolver.

No, there's no way to rename dimension values at the moment, but there's #50 to track it.

@jdries
Contributor Author

jdries commented Nov 21, 2019

Perhaps related: what about merging vector cubes, either through a spatial join or an attribute join?
Should we use a separate process for that, or extend this one?

@m-mohr
Member

m-mohr commented Nov 21, 2019

@jdries At the moment vector cubes are not supported at all. In the current draft for the 1.0 processes there's no vector-cube data type in any of the processes. There were some references to vector cubes in 0.4, but I removed them as we basically had no real use for them yet, and most processes could not handle them anyway. We need to discuss a useful approach later. Related: #68

@jdries
Contributor Author

jdries commented Nov 28, 2019

My first graph containing merge_cubes is ready, as created by the Python client:

I'm still a bit unsure about how to optimally specify the inputs to the callback. I now use a list of expressions that point back to the cubes to be merged.
I do feel that this is a little awkward. The docs make it fairly clear what the overlap_resolver should work on, but the API seems to give me different ways to specify it, which could lead to incompatibilities.

    "process_id": "merge_cubes",
    "arguments": {
      "cube1": {
        "from_node": "linearscalerange3"
      },
      "cube2": {
        "from_node": "linearscalerange4"
      },
      "overlap_resolver": {
        "callback": {
          "or1": {
            "process_id": "or",
            "arguments": {
              "expressions": [
                {
                  "from_argument": "cube2"
                },
                {
                  "from_argument": "cube2"
                }
              ]
            },
            "result": true
          }
        }
      }
    },

Full graph:
https://github.com/Open-EO/openeo-python-client/blob/master/tests/data/cube_merge_or.json

@mkadunc
Member

mkadunc commented Nov 28, 2019

I think that the overlap resolver would be applied on each pixel within the overlap, so the callback would take pixels, not cubes as arguments. (from_argument can't really access arguments of the merge_cubes process, so you're not pointing back to those; you can only access the parameters that the callback will be called with by the merge_cubes process)

Pseudo-code implementation of merge_cubes should look something like:

merge_cubes(cube1, cube2, overlap_resolver) {
    axesRet = cube1.axes.map(ax1 => Axis.createUnion(ax1, cube2.axes[ax1.name]) )
    cubeRet = createCube(axesRet)

    foreach(pxIndex in cubeRet.pixelIndices) {
        const pxCoord = cubeRet.getCoordinates(pxIndex)

        const val1 = cube1.containsCoord(pxCoord) ? cube1.getValueAt(pxCoord) : null
        const val2 = cube2.containsCoord(pxCoord) ? cube2.getValueAt(pxCoord) : null
        if (val1 == null) {
            if (val2 != null) cubeRet.values[pxIndex] = val2
        } else if (val2 == null) {
            cubeRet.values[pxIndex] = val1
        } else {
            // callback is called with two values, not two cubes
            cubeRet.values[pxIndex] = overlap_resolver(val1, val2)
        }
    }

    return cubeRet
}
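The same logic can be sketched as runnable Python, with cubes modeled as dicts from coordinate tuples to values (a toy model for illustration, not an actual backend implementation):

```python
def merge_cubes(cube1, cube2, overlap_resolver=None):
    """Merge two cubes; cells present in both are combined via overlap_resolver."""
    merged = {}
    for coord in set(cube1) | set(cube2):
        val1 = cube1.get(coord)
        val2 = cube2.get(coord)
        if val1 is None:
            merged[coord] = val2
        elif val2 is None:
            merged[coord] = val1
        else:
            if overlap_resolver is None:
                raise ValueError("overlapping cells require an overlap_resolver")
            # the resolver is called with two values, not two cubes
            merged[coord] = overlap_resolver(val1, val2)
    return merged

# disjoint in time: no resolver needed, both cells survive unchanged
a = {(0, 0, 2017): 1.0}
b = {(0, 0, 2018): 2.0}
print(merge_cubes(a, b))

# fully overlapping: the resolver combines the two values per cell
print(merge_cubes({(0, 0): 1.0}, {(0, 0): 3.0}, overlap_resolver=max))  # {(0, 0): 3.0}
```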

We've recently settled on using x and y as the common names for (mathematical) binary processes, so I'd change your graph into something like this:

{
    "process_id": "merge_cubes",
    "arguments": {
        "cube1": {"from_node": "linearscalerange3"},
        "cube2": {"from_node": "linearscalerange4"},
        "overlap_resolver": {"callback": {
            "_": {
                "process_id": "or",
                "arguments": {"expressions": [{"from_argument": "x"}, {"from_argument": "y"}] },
                "result": true
            }
        }}
    }
}

side-note: having callbacks with the same argument names as mathematical binary processes (x and y) will at some point allow us to simplify the specification of a callback to a simple pointer to a pre-defined process — e.g. the signature of the or process is or(x::number, y::number), which is compatible with overlap_resolver, if it is defined as function(x::number, y::number) - this allows specifying your graph as:

{
    "process_id": "merge_cubes",
    "arguments": {
        "cube1": {"from_node": "linearscalerange3"},
        "cube2": {"from_node": "linearscalerange4"},
        "overlap_resolver": {"process_id": "or"}
    }
}

(I'm not sure how well the possibility of having such callbacks is supported right now, but it should not be too difficult to implement support if we decide something like that would be valuable)

@jdries
Contributor Author

jdries commented Nov 28, 2019

Thanks, I agree with the pseudo code.
Actually, your second example process graph would be my preferred way of supporting this right now in the Python client and our backend. It is simpler, which reduces the opportunity for misinterpretation.
It's also the case that I don't support more complex reducers for merge_cubes right now.

@m-mohr
Member

m-mohr commented Nov 28, 2019

The merge_cubes overlap_resolver works exactly like the reduce function.

Here's the related documentation:
see: http://processes.openeo.org/#merge_cubes

If binary = false: An array with the overlapping pixels is passed in data.
If binary = true: Two values from the overlapping pixels are passed in x and y.
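The two calling conventions can be sketched in Python like this (the dispatcher name and signature are illustrative assumptions, not part of the spec):

```python
def resolve_overlap(pixel1, pixel2, resolver, binary):
    """Call the overlap resolver in either of the two documented styles."""
    if binary:
        # binary = true: the two overlapping values are passed as x and y
        return resolver(pixel1, pixel2)
    # binary = false: an array with the overlapping values is passed as data
    return resolver([pixel1, pixel2])

# "or" as the overlap resolver, expressed both ways
print(resolve_overlap(True, False, lambda x, y: x or y, binary=True))  # True
print(resolve_overlap(True, False, any, binary=False))                 # True
```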

So your initial part of the process graph would look like this:

    "process_id": "merge_cubes",
    "arguments": {
      "cube1": {
        "from_node": "linearscalerange3"
      },
      "cube2": {
        "from_node": "linearscalerange4"
      },
      "overlap_resolver": {
        "callback": {
          "or1": {
            "process_id": "or",
            "arguments": {
              "expressions": {
                "from_argument": "data"
              }
            },
            "result": true
          }
        }
      }
    }

The binary case would basically be what Miha posted.

re @mkadunc's side-note: The implicit mapping of parameters as proposed is currently not possible. I'm not sure whether it's really useful to solve this on the process graph level. I guess clients could simply do that for the user whenever parameters are "compatible" and come up with an "explicit" process graph. I think that would be the cleaner solution.

@jdries Could you clarify where this confusion comes from? For me the docs are quite clear, but I'm biased. ;-) Would help me to improve the docs.

@jdries
Contributor Author

jdries commented Nov 29, 2019

Thanks, I guess it was mostly me not reading the process docs carefully enough. The only thing that would make it even easier for lazy people would be to add the examples from this issue to the process docs.

@m-mohr
Member

m-mohr commented Dec 18, 2019

Clarified merge_cubes in PR #112

m-mohr added a commit that referenced this issue Jan 13, 2020
Clarified merge behavior of merge_cubes #87
@m-mohr m-mohr closed this as completed Jan 13, 2020
@clausmichele
Member

I would like to understand whether x and y of the overlap resolver refer to cube1 and cube2 (x -> cube1, y -> cube2), since this is not explicitly stated in the API. The doubt comes from some inconsistencies I observed when testing with the GEE backend, which I'm not sure are due to the implementation or to the definition of the process itself.
