Which quantile type? #296

Closed
m-mohr opened this issue Oct 27, 2021 · 8 comments · Fixed by #303

@m-mohr
Member

m-mohr commented Oct 27, 2021

Related: while looking up example definitions and descriptions I stumbled on this:
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html referencing https://www.amherst.edu/media/view/129116/original/Sample+Quantiles.pdf

there are nine definitions of sample quantiles that commonly appear in statistical software packages.

If we want consistency across backends, we probably have to pick which flavor of "sample quantiles" we want.

At the moment I have no idea which flavor is currently used in the VITO/EODC backends, for example.

Originally posted by @soxofaan in #294 (comment)

@m-mohr
Member Author

m-mohr commented Oct 27, 2021

We need to assess which types are implemented by the back-ends, so I'm assigning a dev from each back-end to look into it and report back here.

@ValentinaHutter

At EODC, the xarray.DataArray.quantile function is used in the implementation of the process. It can be found here: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.quantile.html. Five interpolation methods are provided there; we currently use the default method 'linear', which corresponds to:

For many methods, a fractional quantity is used to determine an interpolation parameter, λ. For the previous example, the fractional quantity is (Np - j) = (6.4 - 6) = 0.4. If you use λ = 0.4, then an estimate of the 64th percentile would be the value 40% of the way between x[6] and x[7].
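
To make the λ step in the quote concrete, here is a minimal sketch in plain Python. The sample values and the helper name `interpolate_at` are made up for illustration; only N = 10, p = 0.64 and the position Np = 6.4 follow the quoted example. How the fractional position is derived from N and p is exactly what differs between the quantile definitions, so the position rule used here is an assumption taken from the quote, not necessarily what a given library does.

```python
import math


def interpolate_at(sorted_values, h):
    """Linearly interpolate a sorted sample at the 1-based fractional position h."""
    n = len(sorted_values)
    h = min(max(h, 1.0), float(n))        # clamp the position to the sample range
    j = math.floor(h)                     # integer part: index of the lower neighbour
    lam = h - j                           # fractional part: the interpolation parameter λ
    lower = sorted_values[j - 1]          # x[j] in 1-based notation
    upper = sorted_values[min(j, n - 1)]  # x[j+1], or x[n] at the upper end
    return lower + lam * (upper - lower)


# Hypothetical sorted sample with N = 10 (not taken from the quoted article).
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p = 0.64
h = len(x) * p               # position Np = 6.4, as in the quoted example
result = interpolate_at(x, h)
print(result)                # roughly 40% of the way between x[6] = 60 and x[7] = 70
```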

@mmacata mmacata assigned metzm and unassigned mmacata Oct 28, 2021
@metzm

metzm commented Oct 28, 2021

The openEO GRASS back-end, i.e. GRASS GIS, also uses the 'linear' method, just as EODC does with the xarray.DataArray.quantile function. The only difference is that in GRASS GIS it is implemented in C.

@dthiex
Contributor

dthiex commented Nov 2, 2021

The Sentinel Hub Statistical API is not yet connected with openEO, but there we use the 'higher' method as described for the numpy.percentile function.

Maybe we can use the same interpolation method by default on all backends, and, where a backend supports multiple interpolation methods, allow the method to be set via an optional parameter?
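
As a rough illustration of that idea, the sketch below implements one default ('linear') plus an optional `method` argument in plain Python. The function name `quantile`, its signature and the sample are hypothetical. It assumes that 'lower' and 'higher' pick the neighbours on either side of the same fractional position that 'linear' interpolates at, which is how I understand the numpy.percentile description.

```python
import math


def quantile(values, p, method="linear"):
    """Sample quantile with a selectable method (illustrative sketch only)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = (len(data) - 1) * p   # 0-based fractional position
    lo = math.floor(pos)        # index of the lower neighbour
    hi = math.ceil(pos)         # index of the upper neighbour
    if method == "lower":
        return data[lo]
    if method == "higher":
        return data[hi]
    if method == "linear":
        frac = pos - lo
        return data[lo] + frac * (data[hi] - data[lo])
    raise ValueError(f"unsupported method: {method!r}")


sample = [1, 2, 3, 4]  # hypothetical values
print(quantile(sample, 0.5))                   # default 'linear' -> 2.5
print(quantile(sample, 0.5, method="lower"))   # -> 2
print(quantile(sample, 0.5, method="higher"))  # -> 3
```

With a single agreed default, results stay comparable across backends, and the optional argument only matters for backends that can actually offer the alternatives.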

@m-mohr
Member Author

m-mohr commented Nov 4, 2021

Thanks all, I'll take this as a baseline for a clarification/improvement. Still interested in the EURAC implementation(s), of course. @clausmichele @aljacob

@m-mohr
Member Author

m-mohr commented Nov 16, 2021

@edzer You probably have the most experience with statistics. Is there any type that you think is the most reasonable (default) choice in our openEO context? Or does it not matter so much? The most widely implemented seems to be type 7, while type 8 is recommended by Hyndman + Fan (1996).

Edit: Surveying the defaults in other environments right now: Python seems pretty set on what it calls linear (type 7), and R defaults to type 7, too, but is configurable. It could indeed be good to give a choice here, but I'm leaving that open for later. The R documentation hints that different domains prefer different types, e.g. hydrology uses type 5. It seems all libraries used in openEO give a choice, except for GRASS.
This is quite a good survey: https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample - R-7 seems to be what most implementations can do and support.
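
For reference, a small sketch of how the interpolating types mentioned here differ. It assumes the position formulas as I read them from the Wikipedia survey and the R documentation (1-based position h = α + p·(N + 1 − 2α), with α = 1 for type 7, 1/2 for type 5 and 1/3 for type 8). The function name `hf_quantile` and the data are made up. Python's `statistics.quantiles(..., method="inclusive")` should agree with type 7 and is used as a cross-check.

```python
import math
import statistics

# Assumed plotting-position parameter per Hyndman-Fan type (see lead-in).
ALPHA = {5: 0.5, 7: 1.0, 8: 1.0 / 3.0}


def hf_quantile(values, p, qtype=7):
    """Interpolating sample quantile for types 5, 7 and 8 (illustrative sketch)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    a = ALPHA[qtype]
    h = a + p * (n + 1 - 2 * a)    # 1-based fractional position
    h = min(max(h, 1.0), float(n)) # clamp to the sample range
    j = math.floor(h)
    g = h - j                      # interpolation weight
    lower = data[j - 1]
    upper = data[min(j, n - 1)]
    return lower + g * (upper - lower)


data = [1, 2, 4, 7, 11, 16]  # hypothetical sample
for qtype in (5, 7, 8):
    print(qtype, hf_quantile(data, 0.25, qtype))

# Cross-check of type 7 against the standard library's quartiles.
quartiles = statistics.quantiles(data, n=4, method="inclusive")
type7 = [hf_quantile(data, p, 7) for p in (0.25, 0.5, 0.75)]
print(quartiles, type7)
```

The median comes out the same for all three types on this sample, while the lower quartile differs, which is where the choice of type becomes visible.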

@m-mohr m-mohr added this to the 1.2.0 milestone Nov 16, 2021
m-mohr added a commit that referenced this issue Nov 16, 2021
@m-mohr m-mohr linked a pull request Nov 16, 2021 that will close this issue
@m-mohr
Member Author

m-mohr commented Nov 16, 2021

PR #303 is up for review.

m-mohr added a commit that referenced this issue Dec 1, 2021
* Use type 7 for quantiles #296
@m-mohr m-mohr closed this as completed Dec 1, 2021