Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cos-lat averaging #125

Merged
merged 19 commits into from
Mar 10, 2023
Merged

cos-lat averaging #125

merged 19 commits into from
Mar 10, 2023

Conversation

RondeauG
Copy link
Contributor

@RondeauG RondeauG commented Dec 6, 2022

Pull Request Checklist:

  • pre-commit hooks are installed/active in my local clone ($ pre-commit install)
  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • HISTORY.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • Added cos-lat method to aggregate.spatial_mean()
  • Deprecated mean and added warnings to xesmf
  • Reworked the warning messages in _subset_file_coverage to be more explicit
  • Fixed a bug in compute_indicators when less than 3 timesteps were produced.

Does this PR introduce a breaking change?

  • No, but deprecates mean and changes the name of interp_coord to interp_centroid (still supports interp_coord with a deprecation warning)

Other information:

@RondeauG
Copy link
Contributor Author

RondeauG commented Dec 6, 2022

This is a partial fix to #94. Fixing xesmf.SpatialAverager is harder than expected and might require a PR in xESMF itself, but I didn't want to wait that long to implement cos-lat averaging.

Copy link
Collaborator

@aulemahal aulemahal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. I don't think the "cos-lat" think is compatible with curvilinear grids, but I'm not sure why it shouldn't be.

xscen/aggregate.py Outdated Show resolved Hide resolved
Comment on lines 131 to 142
if len(out.time) < 3:
freq = (
ind.injected_parameters["freq"]
if "freq" in ind.injected_parameters
else ind.parameters["freq"]["default"]
if "freq" in ind.parameters
else ind.src_freq
)
else:
freq = xr.infer_freq(out.time)
else:
freq = "fx"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think ii would be better to wrap this in a private function instead of having it twice?

Copy link
Collaborator

@mccrayc mccrayc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two key points for the cos-lat averaging:

  • No need to normalize the weights by their sum, I'm pretty sure xarray does this already.
  • In agreement with what Pascal said, I believe cos-lat weighting should work fine with 2D curvilinear grids with a couple minor modifications (see comments).

xscen/aggregate.py Outdated Show resolved Hide resolved
xscen/aggregate.py Outdated Show resolved Hide resolved
@aulemahal
Copy link
Collaborator

Je me reprends, cos-lat as implemented is indeed not usable with curvilinear grids. The thing is, cos(lat) is a weight that varies in latitude and thus assumes all cells have the same area in °² (degrees squared). Which is not true for rotated pole for example.

I found this blog : https://towardsdatascience.com/the-correct-way-to-average-the-globe-92ceecd172b7
It shows a method approximating the cell area with a lat-dependent earth radius. It however uses a cell area approximation that only works for non-curvilinear grids. You could use cf_xarray 0.7.6 to compute 2D bounds and then cell areas (in °).

@RondeauG your code was indeed correct. But I think curvilinear grid support is essential as one of the principal use case is with MRCC5 data.

@aulemahal
Copy link
Collaborator

@RondeauG I found the challenge interesting and spent a bit too much time on this, but here are 3 ways of computing cell areas for the weighted average:
https://gist.github.com/aulemahal/13c907cf2f7ccd3a84be34a7a009516c

We might not have pygeos already, but it's an optional dependency of clisops, so it might be worth adding it explicitly, it is much faster!

However, there's another issue with cf-xarray... Computing the 2D bounds yields strange results when the cell crosses the first lon meridian. In the MRCC5 case, the domain corner crosses the +180 / -180 line and cell areas are super large because the corners are wrong.

See:
image

# Conflicts:
#	HISTORY.rst
#	xscen/aggregate.py
@RondeauG
Copy link
Contributor Author

As a reference to compare to, I implemented the cos-lat method for 2D grids using ds.cf['latitude'] instead of ds.cf['X']. Here's an example for RegCM4 / NAM-22:

image

@RondeauG
Copy link
Contributor Author

I'll look into the links provided, though I feel like this might be another method entirely? As in, we'd have cos-lat or SpatialAverager or pygeos/ESMF/whatever?

@RondeauG
Copy link
Contributor Author

Also, with the changes, get_spatial_dims(ds) is pretty useless. I could restore it within method=="mean" as it was before, but I'm tempted to just get rid of mean entirely since it does the wrong thing to begin with.

@aulemahal
Copy link
Collaborator

aulemahal commented Dec 22, 2022

this might be another method entirely?

My "shapely" and "pygeos" functions are both based on the cos-lat approximation, so I would argue it's just a curvilinear-compatible version of cos-lat.

The ESMF version is indeed another technique because it computes the cell area using other approximations.

Note for shapely: the newest version (2.0) now merges the functionality of pygeos so there might be a way to code this idea without adding a new dependency.

@aulemahal
Copy link
Collaborator

I updated my gist with a function that works with shapely 2.0. It is indeed the exact same as with pygeos and it's as fast.

@RondeauG
Copy link
Contributor Author

RondeauG commented Mar 7, 2023

@aulemahal @mccrayc
This might be a lack of comprehension from my part on rotated grids, but let's say that I have this region:

image

The first part of Pascal's code (using the pygeos version for example) produces this:
xr.DataArray(pygeos.area(pygeos.polygons(pygeos.linearrings(ds.lon_bounds, ds.lat_bounds))), dims=ds.lon.dims, coords=ds.lon.coords).plot()

image

np.cos(np.deg2rad(ds.lat)) creates more or less the opposite:

image

Meaning that when you multiply both, you get:

image

...which makes no intuitive sense to me? I would have expected weights to be curved, not linear like that?

@aulemahal
Copy link
Collaborator

No I think this is ok! You are in a Oblique Mercator projection, which projects the globe unto a cylinder. The projection being centered at y = 0, in the center of your plot, it makes sense to me that the cell area decreases with abs(y).

https://en.wikipedia.org/wiki/Oblique_Mercator_projection

Also, I found a way to compute areas (in meters) with pyproj from lat and lon bounds. This was added to the gist, and gives the following:
image

# Conflicts:
#	HISTORY.rst
#	environment-dev.yml
#	environment.yml
Copy link
Collaborator

@aulemahal aulemahal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!

Semi-related:
As we currently have no bulletproof method, the choice makes sense. However, it seems to me that there's no reason to have such an array of methods for such a simple task. I guess that when we have a simple, light area-weighted mean method, we could remove everything else? Is there a use for a collection of imperfect algos ?

This would mean :

  • fix bounds generation when crossing meridians in cf_xarray
  • generate areacella in m² from those bounds (using pyproj and shapely)

xscen/aggregate.py Outdated Show resolved Hide resolved
xscen/aggregate.py Outdated Show resolved Hide resolved
Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>
@RondeauG
Copy link
Contributor Author

RondeauG commented Mar 9, 2023

Semi-related: As we currently have no bulletproof method, the choice makes sense. However, it seems to me that there's no reason to have such an array of methods for such a simple task. I guess that when we have a simple, light area-weighted mean method, we could remove everything else? Is there a use for a collection of imperfect algos ?

That's why I deprecate mean, since cos-lat is a superior version of the same thing. I could see an argument to drop interp_centroid in favor of always using xesmf too.

However, I think that cos-lat and xesmf (SpatialAverager) fundamentaly do different things. cos-lat keeps the grid cells as is, while SpatialAverager will break pixels into smaller parts to closely fit to a given polygon.

@aulemahal
Copy link
Collaborator

@mccrayc When you have time, a second review is needed since you "requested changes".

Copy link
Collaborator

@mccrayc mccrayc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

xscen/aggregate.py Outdated Show resolved Hide resolved
xscen/indicators.py Outdated Show resolved Hide resolved
xscen/aggregate.py Outdated Show resolved Hide resolved
Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>
@RondeauG RondeauG merged commit eaff2df into main Mar 10, 2023
@RondeauG RondeauG deleted the fix-94 branch March 10, 2023 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants