Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tileId pattern matches unwanted tiles #32

Closed
EmileSonneveld opened this issue Jan 8, 2024 · 9 comments · Fixed by #38, Open-EO/openeo-python-driver#282 or Open-EO/openeo-geopyspark-integrationtests#19
Assignees

Comments

@EmileSonneveld
Copy link
Contributor

EmileSonneveld commented Jan 8, 2024

Because we can only use asterisk wildcards on metadata we can get too many tiles passing the filter.
As Marcel says: 26S*H will not only find the needed 26SKH and 26SLH tiles, but also the unneeded 26SHx

We could implement the "?" wildcard to match a single character. For example: 26S?H.
Or maybe allow for more complex matching by using something like filter_labels on product metadata.

Related: #25

@mbuchhorn
Copy link

next to adaptations of the wildcards or properties filters, it would be maybe good that the load_collection process can also handle a list of given tiles.
e.g. sometimes the AOI is between two UTMzones but still only two Sentinel-2 tileIDs would be needed. So instead of a tileId with a wildcard it would be better to just pass a list of the specific tile names.

@JeroenVerstraelen
Copy link
Collaborator

Would be good if this also works on CDSE

@JeroenVerstraelen
Copy link
Collaborator

20x20 processing box, for each we have maximum 4 tile ids (most of the time 1 or 2 tiles). Would be better if we can give it a list of tileids instead of a wildstar.

@mbuchhorn
Copy link

you can find an example for the 20x20km processing boxes with their corresponding Sentinel-2 tileID's here: https://git.vito.be/projects/NCA/repos/extentmapping/browse/src/extentmapping/resources/LAEA-20km_add-info.gpkg

I will implement this extra filter in the properties part of here: https://git.vito.be/projects/NCA/repos/extentmapping/browse/src/extentmapping/openeo/preprocessing.py#288

@jdries
Copy link
Contributor

jdries commented Apr 27, 2024

The key point here is to find an openEO compatible way to specify the property filter.

lambda p: array_contains(data=["26SKH","26SLH"],value = p)

Note that wildcard matching is not part of the spec, but we can do things like:

lambda p: text_begins(data=p,pattern = "26S")

@bossie
Copy link
Contributor

bossie commented May 15, 2024

This should work on Terrascope:

properties = {"tileId": lambda tile_id: array_contains(["31UES", "31UFS"], tile_id)}

data_cube = (connection
             .load_collection("SENTINEL2_L2A", properties=properties)
             .filter_bbox(west=4.4158740490713804, south=51.4204485519121945, east=4.4613941769140322, north=51.4639210615473885)
             .filter_temporal(["2024-04-24", "2024-04-25"])
             .filter_bands(["B04", "B03", "B02"])
             .save_result("GTiff"))

data_cube.execute_batch()

Adding a bbox filter is recommended because filtering by tile ID happens client side.

@bossie
Copy link
Contributor

bossie commented May 15, 2024

There's still some technical debt to be addressed.

  1. the array_contains process is translated to an {"eq": ["31UES", "31UFS"]} criterion;
  2. the criterion's operator "eq" is then ditched (not passed on to Scala);
  3. the array in the criterion's value (["31UES", "31UFS"]) behaves as an OR (tileId == "31UES" || tileId == "31UFS")

This is all very implicit and a better way would be to retain the operator i.e. an "in".

@bossie
Copy link
Contributor

bossie commented May 16, 2024

Some notes re: the technical debt:

  • org.openeo.geotrellissentinelhub.PyramidFactory does support operators in addition to "eq" i.e. "lte" to be able to support filtering by cloud cover (https://jira.vito.be/browse/EP-4102);
  • this difference is apparent in the signature of datacube_seq: metadata_properties: util.Map[String, util.Map[String, Any]] with an operator as opposed to metadata_properties: util.Map[String, Any] without;
  • translating the former to the latter is the responsibility of the flatten_eqs parameter:

https://github.com/Open-EO/openeo-geopyspark-driver/blob/630439c6170c9ec5668e8d815b9fe0818287f430/openeogeotrellis/layercatalog.py#L295-L310

bossie pushed a commit to Open-EO/openeo-geopyspark-integrationtests that referenced this issue May 16, 2024
@JeroenVerstraelen
Copy link
Collaborator

@mbuchhorn Your usecase should work now.

bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 28, 2024
bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 28, 2024
bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 28, 2024
bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 28, 2024
bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 28, 2024
bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 28, 2024
bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 28, 2024
bossie added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue May 29, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment