Vector data cubes + processes #68

Closed
m-mohr opened this issue Aug 16, 2019 · 7 comments · Fixed by #382

Comments


m-mohr commented Aug 16, 2019

We have basically ignored vector data cubes and their processes until now and will need to add more of them in 1.0, which will be a major work package!

We also need to update existing processes that currently only support raster-cubes, and add vector-cube support to processes that currently only allow GeoJSON.

m-mohr added the help wanted and new process labels on Aug 16, 2019
m-mohr added this to the v1.0 milestone on Aug 16, 2019
m-mohr modified the milestones: v1.0, future on Sep 13, 2019

m-mohr commented Sep 13, 2019

Conclusion from the 3rd-year planning: this will not be tackled in the near future; we will add vector-related processes once they are required (e.g. filter_point for the Wageningen use case, see #37).

@jdries and @aljacob will explore this further and may define additional processes in the future. Also related is #2.

Currently, the processes always refer to "raster data cubes". Clarify (with @edzer?) whether it would be better to just call them data cubes and handle the "type" of cube internally.


m-mohr commented Nov 26, 2019

Telco: Still not needed at the moment.

I'll need to go through the processes for 1.0 and check whether the vector-cubes as used at the moment make sense. This also depends on #2.

m-mohr added a commit that referenced this issue on Nov 26, 2019
m-mohr modified the milestones: future, v1.0 on Nov 26, 2019
m-mohr removed the help wanted label on Nov 26, 2019
m-mohr added a commit that referenced this issue on Dec 2, 2019

m-mohr commented Dec 18, 2019

The recent comment from @mkadunc in #2 (comment) fits better here:

IMO the only thing we need to do in order to have vector-cubes support is allow objects as dimension labels (currently we only allow number, string, date, date-time and time). Then vector-cube is just a cube with simple-feature-geometry as dimension labels on the single spatial dimension. If we go for this approach, we already support vector cubes in all processes (but we treat the spatial dimension as nothing special).

We could also ignore vector-cubes altogether (for now), returning a raster cube with an ordinal dimension to encode the index of the corresponding polygon. This should be quite intuitive for the user...

What should we go for? At the moment we have the two types raster-cube (in most processes) and vector-cube (in a very limited set of processes), but we don't explain at all what the latter is or how it works.
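
To make the dimension-label idea above more concrete, here is a purely hypothetical sketch (not part of any spec) of cube metadata in the style of the STAC datacube extension, where the single spatial dimension carries simple-feature geometries as its labels. The dimension name "geometry", the use of "other" as its type, and the GeoJSON objects in values are all assumptions for illustration only:

    {
        "cube:dimensions": {
            "geometry": {
                "type": "other",
                "values": [
                    {"type": "Polygon", "coordinates": [[[5.0, 51.0], [5.1, 51.0], [5.1, 51.1], [5.0, 51.0]]]},
                    {"type": "Polygon", "coordinates": [[[5.2, 51.2], [5.3, 51.2], [5.3, 51.3], [5.2, 51.2]]]}
                ]
            },
            "t": {
                "type": "temporal",
                "extent": ["2021-06-01T00:00:00Z", "2021-07-01T00:00:00Z"]
            }
        }
    }

With such metadata, processes like filter or aggregate could address the geometry dimension like any other dimension, which is exactly the "nothing special" treatment described above.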

m-mohr added the help wanted label on Dec 18, 2019

m-mohr commented Jan 20, 2020

We only define vector-cube as part of aggregate_polygon and save_result and treat it as a "black box", so back-ends handle the transition. We'll dig into this again once it is needed.

m-mohr modified the milestones: v1.0-rc1, v1.0, future on Jan 20, 2020

m-mohr commented Jul 8, 2020

Recently, the question came up of how to support vector-cubes as input data for processes. In this case it was aggregate_spatial that needed to load more than GeoJSON. For reference, my answer:

[...]

Fortunately, openEO is extensible and you can add whatever you need. The simplest option is to modify the "geometries" parameter to allow other things to be loaded.

Allowing files is relatively easy. Replace:

        {
            "name": "geometries",
            "description": "Geometries as GeoJSON on which the aggregation will be based.",
            "schema": {
                "type": "object",
                "subtype": "geojson"
            }
        },

with:

        {
            "name": "geometries",
            "description": "Geometries as GeoJSON on which the aggregation will be based.",
            "schema": [
                {
                    "type": "object",
                    "subtype": "geojson"
                },
                {
                    "type": "string",
                    "subtype": "file-path"
                }
            ]
        },

and then it also allows specifying files uploaded to the user workspace. What can be read then depends on your implementation; input file formats should be exposed via GET /file_formats.
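
For reference, a minimal sketch of what a GET /file_formats response advertising vector input formats might look like; the format names, titles and empty parameters are back-end-specific and purely illustrative here:

    {
        "input": {
            "GeoJSON": {
                "title": "GeoJSON",
                "gis_data_types": ["vector"],
                "parameters": {}
            },
            "GPKG": {
                "title": "OGC GeoPackage",
                "gis_data_types": ["vector"],
                "parameters": {}
            }
        },
        "output": {}
    }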

A bit more complex, but probably the way we'd standardize it later, is to use load_uploaded_files. The issue here is that we haven't really thought about how vector-cubes would work, but you could change the return value of the load_uploaded_files process as follows:

    "returns": {
        "description": "A data cube for further processing.",
        "schema": [
            {
                "type": "object",
                "subtype": "raster-cube"
            },
            {
                "type": "object",
                "subtype": "vector-cube"
            }
        ]
    }

Now it supports loading vector data and returns it in a (virtual) vector data cube, which you can then accept in aggregate_spatial with the following definition for the geometries parameter:

        {
            "name": "geometries",
            "description": "Geometries as GeoJSON on which the aggregation will be based.",
            "schema": [
                {
                    "type": "object",
                    "subtype": "geojson"
                },
                {
                    "type": "object",
                    "subtype": "vector-cube"
                }
            ]
        },

Now you need to figure out how to pass the data between the processes, but as not much else can handle vector cubes yet, you can do that however works best internally.
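
As a rough sketch of how the pieces could be wired together, here is a hypothetical process graph that loads an uploaded vector file, loads a raster collection, and feeds both into the extended aggregate_spatial. The collection id, file name, format and band names are placeholders, and back-end support for vector-cube results of load_uploaded_files is assumed, as discussed above:

    {
        "process_graph": {
            "load_vector": {
                "process_id": "load_uploaded_files",
                "arguments": {
                    "paths": ["fields.gpkg"],
                    "format": "GPKG"
                }
            },
            "load_raster": {
                "process_id": "load_collection",
                "arguments": {
                    "id": "SENTINEL2_L2A",
                    "spatial_extent": null,
                    "temporal_extent": ["2021-06-01", "2021-09-01"],
                    "bands": ["B04", "B08"]
                }
            },
            "aggregate": {
                "process_id": "aggregate_spatial",
                "arguments": {
                    "data": {"from_node": "load_raster"},
                    "geometries": {"from_node": "load_vector"},
                    "reducer": {
                        "process_graph": {
                            "mean": {
                                "process_id": "mean",
                                "arguments": {"data": {"from_parameter": "data"}},
                                "result": true
                            }
                        }
                    }
                },
                "result": true
            }
        }
    }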


m-mohr commented Apr 12, 2021

As far as I understand, no use case in openEO Platform requires vector processes directly. Individual processes such as aggregate_spatial may be required and can be considered on a case-by-case basis. Nevertheless, it is listed as a separate requirement in the SoW.


edzer commented Dec 16, 2021

See also the confusion arising at #308.

A vector data cube is an n-D cube where (at least) one of the dimensions is associated with vector geometries (points, lines, polygons, or their multi-part variants). [Example figures for 3-D cubes]

Special, lower-dimensional cases:

  • one-D: if we have a single attribute associated with a set of geometries
  • two-D: if we have a single set of uniform (single-type) attributes associated with the geometries, e.g. NDVI for different times, or different spectral bands for a single moment in time.

A difficulty with this concept is that the vector data file formats we usually work with (those read/written by GDAL: shapefile, GeoPackage, GeoJSON, geodatabase) can only cover the two-D case; to use such formats we need to juggle the third dimension ("flatten" the cube somehow: either in wide form over the attribute space, or in long form by repeating the geometries). A format that can (properly) handle vector data cubes is NetCDF, e.g. this is an example of a multipolygon × time data cube in NetCDF.
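
As a small illustration of the two flattening strategies, the same polygons × time NDVI cube could be serialized either wide (one column per time step) or long (geometries repeated per time step). The geometry ids, dates and NDVI values below are made up, and in a real file the "geometry" field would hold the actual feature geometry rather than an id:

    {
        "wide": [
            {"geometry": "field_1", "NDVI_2021-06-01": 0.61, "NDVI_2021-07-01": 0.72},
            {"geometry": "field_2", "NDVI_2021-06-01": 0.55, "NDVI_2021-07-01": 0.68}
        ],
        "long": [
            {"geometry": "field_1", "time": "2021-06-01", "NDVI": 0.61},
            {"geometry": "field_1", "time": "2021-07-01", "NDVI": 0.72},
            {"geometry": "field_2", "time": "2021-06-01", "NDVI": 0.55},
            {"geometry": "field_2", "time": "2021-07-01", "NDVI": 0.68}
        ]
    }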
