Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

order: clarify ordering of ties #409

Closed
LukeWeidenwalker opened this issue Feb 27, 2023 · 14 comments
Closed

order: clarify ordering of ties #409

LukeWeidenwalker opened this issue Feb 27, 2023 · 14 comments
Assignees
Labels
Milestone

Comments

@LukeWeidenwalker
Copy link
Contributor

LukeWeidenwalker commented Feb 27, 2023

Process ID: order

Describe the issue:
The simplest implementation to support the asc parameter is to simply flip the input array before running argsort if descending order is desired. However, the order process currently requires that Ties will be left in their original ordering, regardless of whether asc=True or asc=False. This means that we cannot simply flip the input array array with the ordered indices, because ties would be flipped too. This makes the implementation more unwieldy than it has to be.

Proposed solution:
Change this requirement such that ties will be in the original ordering if asc=True and flipped otherwise.

Additional context:
See here for initial discussion

@m-mohr
Copy link
Member

m-mohr commented Feb 27, 2023

@soxofaan @dthiex Do you know how your implementations handle this?

@soxofaan
Copy link
Member

soxofaan commented Feb 27, 2023

we currently use this implementation from the legacy openeo-processes-python repo: https://github.com/Open-EO/openeo-processes-python/blob/f57c663e8708061a8222851a8aec2762ac46b0bb/src/openeo_processes/arrays.py#L745-L817

which behaves like the current spec if I understand correctly:

import numpy as np

data = np.array([1, 2, 2, 3])
print(data)
# ---> [1 2 2 3]

# Ascending
print(numpy.argsort(data, kind="mergesort"))
# ---> [0 1 2 3]

# Descending
print(numpy.argsort(-data, kind="mergesort"))
# ---> [3 1 2 0]

@soxofaan
Copy link
Member

The simplest implementation to support the asc parameter is to simply flip the input array before running argsort if descending order is desired. However, the order process currently requires that Ties will be left in their original ordering, regardless of whether asc=True or asc=False. This means that we cannot simply flip the input array, because ties would be flipped too.

maybe I'm misunderstanding, but if you mean flipping the sign of the input data with "flipping the array", I think the descending mode works just fine with numpy.argsort as illustrated above

@LukeWeidenwalker
Copy link
Contributor Author

LukeWeidenwalker commented Feb 27, 2023

Apologies, typo in my description:
I mean after applying argsort, flipping (aka reversing) the array with the ordered indices with np.flip to get the opposite ordering.

Flipping the sign only works on numerical arrays, but will fail on string/datetime arrays.

@m-mohr
Copy link
Member

m-mohr commented Feb 27, 2023

If it works with argsort in the old implementation, why do we need a change here? Can you give some examples, @LukeWeidenwalker?

@LukeWeidenwalker
Copy link
Contributor Author

LukeWeidenwalker commented Feb 27, 2023

Adding the minus-sign like in the old implementation only works on numerical arrays:
-data

You cannot do this on e.g. string arrays, it will error out. For those other arrays, you can still get the sorting indices with argsort and flip that array if you want it descending. However, then the order of ties is flipped too - hence this issue!

Depending on your answer to this comment: Other than changing this in the spec, we can of course just not support anything other than numerical arrays in our order process and throw descriptive errors on why this isn't possible. Don't think that's the best option for users though, because that will only be reportable at runtime.

@m-mohr
Copy link
Member

m-mohr commented Feb 28, 2023

Order allows number|null|date-time:string|date:string|time:string - All the strings are temporal, so I'd assume internally they could be timestamps (i.e. numerical)? Can you use the - thing on some kind of date/time objects in Python?

Alternatively, just remove the strings from the schema (or whatever you don't support). This is reasonable, I think.

@dthiex
Copy link
Contributor

dthiex commented Feb 28, 2023

Do you know how your implementations handle this?

In our backend we only support number, null or a valid ISO date string as inputs. We then use the javascript sort function which handles ties as described in the process documentation (= Stefaan's example).

@LukeWeidenwalker
Copy link
Contributor Author

Order allows number|null|date-time:string|date:string|time:string - All the strings are temporal, so I'd assume internally they could be timestamps (i.e. numerical)? Can you use the - thing on some kind of date/time objects in Python?

No, just tried again, this doesn't work!

we currently use this implementation from the legacy openeo-processes-python repo

@soxofaan so how does your process handle datetime strings then?

@soxofaan
Copy link
Member

soxofaan commented Mar 2, 2023

with strings we don't do date interpretation, but that's usually not a problem with ISO8601

A string array is eventually passed to np.argsort and:

  • asc=True (default) mode works
  • but asc=False currently fails:
[500] Internal: Server error: TypeError("bad operand type for unary -: 'str'")

A mixed array (strings and numbers) also fails in ascending mode with:

[500] Internal: Server error: TypeError("'<' not supported between instances of 'str' and 'int'")

I don't think we ever had someone complaining about this and requiring proper string support for the order process, so it's not high on our priority list currently

@m-mohr
Copy link
Member

m-mohr commented Mar 2, 2023

It seems it's hard to really come to a conclusion here with the variability of implementations.
There are a couple of options:

  1. Implementation: Throw an error if not supported
  2. Implementation: Remove string-types from the JSON Schema
  3. Implementation: Remove the ask parameter from the JSON Schema
  4. Specification: Drop the "asc" parameter in "order" and add a "array_reverse" process in 2.0. I assume the ties issue is not really a thing in "sort"?
  5. Specification: Clarify that "order" always sorts in ascending order and the reverse is only applied afterwards if the "asc" parameter is set to false. This is likely the most interoperable (but slowest) solution as it's a two-step approach that I assume everyone could follow.

The simpliest part would probably be 5, but this would probably lead to a change in SH, right?

with strings we don't do date interpretation, but that's usually not a problem with ISO8601

@soxofaan fyi: The spec says: "Temporal strings can not be compared based on their string representation due to the time zone/time-offset representations."

@soxofaan
Copy link
Member

soxofaan commented Mar 2, 2023

with strings we don't do date interpretation, but that's usually not a problem with ISO8601
@soxofaan fyi: The spec says: "Temporal strings can not be compared based on their string representation due to the time zone/time-offset representations."

yes I know, I just mean that is practically not a problem in VITO backend because we usually only work at date resolution (and if there is a time component we'd always use UTC)

@m-mohr
Copy link
Member

m-mohr commented Mar 14, 2023

Proposal:
Replace "Ties will be left in their original ordering." with "The ordering of ties is implementation-dependent."

m-mohr added a commit that referenced this issue Mar 14, 2023
@m-mohr
Copy link
Member

m-mohr commented Mar 14, 2023

Changed in dcd3b8a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants