
Behavior of reduce and apply_dimension #73

Closed · m-mohr opened this issue Sep 2, 2019 · 13 comments

m-mohr (Member) commented Sep 2, 2019

I realized that if you use apply_dimension, there are not many functions that can actually be used as the callback. The callback parameter has the data type array and the callback must return an array, but except for a few functions like cumsum, cummax etc. we don't have many processes that fit. One could for example be interested in applying absolute, but would need to reduce to a single dimension first, then call apply and then merge the cubes again, which seems a bit too complicated.
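To make the type mismatch concrete, here is a plain-Python sketch (not openEO code) of the two callback shapes involved; the function names are made up for illustration:

```python
import numpy as np

# Callback shape expected by apply_dimension: the whole array along one
# dimension goes in, an array comes out.
def cumulative_sum(values: np.ndarray) -> np.ndarray:
    return np.cumsum(values)

# Callback shape of absolute: a single number in, a single number out,
# which is why it fits apply (per value) but not apply_dimension (per array).
def absolute(x: float) -> float:
    return abs(x)

series = np.array([1.0, -2.0, 3.0])
print(cumulative_sum(series))   # [ 1. -1.  2.]
print(absolute(-2.0))           # 2.0
```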

m-mohr added the 'help wanted' and 'waiting' labels Sep 2, 2019
m-mohr added this to the v1.0 milestone Sep 2, 2019
mkadunc (Member) commented Sep 2, 2019

What did you mean by "applying absolute"?

m-mohr (Member, Author) commented Sep 2, 2019

@claxn created a process graph for GEE with apply_dimension and a callback that was just the absolute function, see https://github.com/Open-EO/openeo-earthengine-driver/blob/master/docs/s1.json. That is not valid, as apply_dimension's callback parameter is an array while absolute takes a number as its parameter.

Now I'm looking for a way out.

'Applying absolute' was therefore meant as apply with absolute as the callback, but seemingly just for a single dimension. Now that I'm thinking more about it, I'm not sure what @claxn was actually trying to do. Could you clarify, Claudio? It looks like it should maybe be apply instead?
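For reference, a minimal sketch of what the corrected graph would presumably look like with today's openEO Python client (which postdates this discussion); the back-end URL and collection id are placeholders:

```python
import openeo

connection = openeo.connect("https://example.openeo.test")  # placeholder back-end
cube = connection.load_collection("S1_GRD")                  # placeholder collection id

# apply: the callback receives each value individually, so absolute fits here.
abs_cube = cube.apply("absolute")

# apply_dimension: the callback receives the whole array along one dimension,
# so it needs an array -> array process such as cumsum.
cumsum_cube = cube.apply_dimension(process="cumsum", dimension="t")
```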

m-mohr added the 'question' label Sep 2, 2019
m-mohr (Member, Author) commented Sep 3, 2019

Discussed during a call with @claxn. The S1 example was wrong. The intended process was apply, not apply_dimension.

m-mohr closed this as completed Sep 3, 2019
mkadunc (Member) commented Sep 3, 2019

As for use-cases that need to operate on 1D-array partitions of the data cube, these are some off the top of my head:

  • smoothing along the temporal axis (some types of smoothing can be done with apply_kernel, but not all are representable that way; see the sketch after this list)
  • gap-filling no-data values in a (time) series from other available data
  • computing the (temporal) derivative of values (again, often representable with apply_kernel)
  • sorting values (e.g. to produce a cube ordered by pixel quality instead of time)
  • single-pixel atmospheric correction (fit a model to all bands of an image, then convert TOA to BOA reflectance)
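Two of these, gap-filling and smoothing along time, can be sketched in plain NumPy (not openEO code) to show why they need the full 1D array and keep its cardinality, which is exactly the array -> array shape apply_dimension offers:

```python
import numpy as np

# Toy (t, y, x) cube with one simulated no-data time step.
cube = np.random.rand(10, 4, 4)
cube[3, :, :] = np.nan

def fill_gaps(series):
    """Linearly interpolate no-data values in a 1D time series."""
    t = np.arange(series.size)
    missing = np.isnan(series)
    filled = series.copy()
    filled[missing] = np.interp(t[missing], t[~missing], series[~missing])
    return filled

def smooth(series, width=3):
    """Moving-average smoothing; output has the same length as the input."""
    kernel = np.ones(width) / width
    return np.convolve(series, kernel, mode="same")

filled = np.apply_along_axis(fill_gaps, 0, cube)
smoothed = np.apply_along_axis(smooth, 0, filled)
print(smoothed.shape)   # (10, 4, 4): cardinality preserved
```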

m-mohr reopened this Sep 3, 2019
m-mohr (Member, Author) commented Sep 3, 2019

Those are good use cases that we should check and try out. Maybe we need changes to the processes to make them work... Can anyone come up with example process graphs? I probably won't have time for it until late this year...

lforesta (Contributor) commented

So on the one hand the issue is that we don't have many processes which can be used as callbacks from apply_dimension, and on the other hand we should try to create process graphs for the examples proposed by @mkadunc, right?

About the first point, more processes will come up as we focus more on the defined use cases and other applications.
About the second, I don't have PGs yet, but one comment on the (time) derivative example: since the cardinality of the dimension is reduced by one, strictly speaking we can't use apply_dimension, no? apply_kernel as proposed is more flexible for this example.

mkadunc (Member) commented Sep 20, 2019

Re derivatives: cardinality gets reduced by one if the derivative is computed with a 2-point difference on all pairs of adjacent values, but the same process would also effectively move the positions of the samples on the time axis to the mid-point between the two values (at least that's where the 2-point derivative is most accurate).

Usually, the user would desire to keep the cardinality and positions along the time axis the same, which can be achieved:

  • with a symmetric central difference, using a kernel but extending the border — BTW (@m-mohr): The apply_kernel process description says that "resolution, cardinality and the number of dimensions are the same as for the original data cube"
  • with forward/backward difference formulae on the first and last values, and symmetric differences for 'interior' values (see the NumPy sketch after this list)
  • by fitting a model to the values (e.g. a linear combination of seasonal sine waves) and then replacing the measurements with the value of the fitted function at the same timestamps
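The second bullet can be made concrete with a small NumPy sketch (not openEO code); np.gradient implements exactly this scheme of central differences for interior samples and one-sided differences at the two ends, so the output keeps the input's cardinality and timestamps:

```python
import numpy as np

# Toy time series: values = t^2, so the true derivative is 2*t.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
values = t ** 2

# Central differences inside, forward/backward differences at the ends;
# same length and sample positions as the input.
derivative = np.gradient(values, t)
print(derivative)   # [1. 2. 4. 6. 7.]
```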

lforesta (Contributor) commented

@mkadunc I agree as long as any "extra" operation is declared in the process description.
In general it seems derivatives are mostly covered by apply_kernel (an exception is your third example), as you already pointed out earlier.

m-mohr (Member, Author) commented Oct 11, 2019

  • The apply_kernel process description says that "resolution, cardinality and the number of dimensions are the same as for the original data cube"

Yes, is there an issue with that? If we want to use apply_kernel for more things, we may need to change it, but so far I don't really understand what you actually want to change...

mkadunc (Member) commented Oct 28, 2019

The apply_kernel process description says that "resolution, cardinality and the number of dimensions are the same as for the original data cube"

Yes, is there an issue with that? If we want to use apply_kernel for more things, we may need to change it, but so far I don't really understand what you actually want to change...

The issue is that kernels will inevitably reduce cardinality unless the border is extended - we need to address this in the documentation somehow, e.g. either:

  1. drop the constraint that the cardinality stays the same, or
  2. define what the kernel yields when some part of the window is over the border (e.g. we could say that outside values are input as no_data), or
  3. define what the prescribed border-extension method is, or
  4. define how the user specifies their required border extension method

The data cubes that we're (virtually) dealing with are so huge that the border effect will mostly be negligible, so I'd go with 2 - the part of the window that is outside the (virtual) data cube will have no_data as its input.
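A minimal NumPy sketch of option 2 (not openEO code), treating everything outside the cube as no_data while keeping the cardinality; whether the no_data inputs are ignored (as below) or propagated is exactly the kind of detail the process description would have to spell out:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([0.25, 0.5, 0.25])

# Pad with NaN ("no_data") so the kernel sees no_data outside the cube.
pad = len(kernel) // 2
padded = np.pad(values, pad, constant_values=np.nan)

# Ignore the no_data contributions at the border; the output length
# equals the input length, so cardinality is preserved.
out = np.array([np.nansum(padded[i:i + len(kernel)] * kernel)
                for i in range(len(values))])
print(out)   # [1.   2.   3.   2.75]
```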

m-mohr modified the milestones: v1.0, future Nov 26, 2019
m-mohr (Member, Author) commented Nov 26, 2019

So the point here is that the new reducer as proposed in #89 pretty much makes apply_dimension obsolete, but applying a sort in a reduce function is not very logical.

We discussed during the telco whether

  1. to remove apply_dimension and make reduce even more universal (+ a name change for reduce) or
  2. to limit the functionality of reduce (don't allow multiple return values / remove target_dimension) and make apply_dimension more universal (+ a name change for apply_dimension, could be renamed to apply_along_dimension).

The preferred alternative by most was to go for two separate functions. I'll issue a PR and let you all review.
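A conceptual NumPy sketch (not openEO code) of how the two processes would divide the work under alternative 2: reduce keeps only single-valued reducers and drops the dimension, while apply_dimension takes any array -> array process and keeps it:

```python
import numpy as np

cube = np.random.rand(10, 4, 4)        # (t, y, x)

# reduce: array -> single value per pixel, the dimension is dropped.
reduced = np.mean(cube, axis=0)
print(reduced.shape)                   # (4, 4)

# apply_dimension: array -> array per pixel, the dimension is kept
# (e.g. sorting along time, one of the use cases above).
applied = np.sort(cube, axis=0)
print(applied.shape)                   # (10, 4, 4)
```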

m-mohr added the 'accepted' label and removed the 'help wanted', 'waiting' and 'question' labels Nov 26, 2019
m-mohr modified the milestones: future, v1.0 Nov 26, 2019
m-mohr changed the title from 'apply_dimension callbacks' to 'Behavior of reduce and apply_dimension' Nov 26, 2019
m-mohr added the 'help wanted' and 'accepted' labels and removed the 'accepted' label Nov 26, 2019
m-mohr (Member, Author) commented Dec 13, 2019

Implementing point 2 also aligns better with aggregate_polygon, aggregate_temporal, resample_cube_temporal and merge_cubes, which all define reducers as processes returning a single value. So extrema, quantiles etc. should be used with apply_dimension, which is also where target_dimension will be moved to; this makes reduce simpler and might even help with (avoiding?) #94.
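In NumPy terms (again just a sketch, not openEO code), the split would look like this: quantiles along time yield several values per pixel and thus a new dimension, which is apply_dimension plus target_dimension territory, while a reducer like mean returns exactly one value per pixel:

```python
import numpy as np

cube = np.random.rand(10, 4, 4)                   # (t, y, x)

# apply_dimension with target_dimension: t is replaced by a new "quantiles" axis.
q = np.quantile(cube, [0.25, 0.5, 0.75], axis=0)  # shape (3, 4, 4)
print(q.shape)

# reduce: a single value per pixel, the dimension disappears.
m = np.mean(cube, axis=0)                         # shape (4, 4)
print(m.shape)
```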

m-mohr self-assigned this Dec 13, 2019
m-mohr removed the 'help wanted' label Dec 13, 2019
m-mohr added a commit that referenced this issue Dec 18, 2019
m-mohr (Member, Author) commented Dec 18, 2019

There's a PR up for discussion: #111
