Merge pull request #221 from Ouranosinc/generalize_rlatrlon_latlon
generalize dims as X/Y when lat/lon or rlat/rlon
RondeauG committed Jul 27, 2023
2 parents f1eb39e + 7b93897 commit 267ed0b
Showing 7 changed files with 262 additions and 51 deletions.
5 changes: 5 additions & 0 deletions .zenodo.json
@@ -34,6 +34,11 @@
"name": "Caron, Louis-Philippe",
"affiliation": "Ouranos",
"orcid": "0000-0001-5221-0147"
},
{
"name": "Braun, Marco",
"affiliation": "Ouranos",
"orcid": "0000-0001-5061-3217"
}
],
"keywords": [
1 change: 1 addition & 0 deletions AUTHORS.rst
@@ -21,3 +21,4 @@ Contributors
* Louis-Philippe Caron <caron.louis-philippe@ouranos.ca>
* Sarah Gammon <gammon.sarah@ouranos.ca> `@SarahG-579462 <https://github.com/SarahG-579462>`_
* Yannick Rousseau
* Marco Braun <Braun.Marco@ouranos.ca> `@vindelico <https://github.com/vindelico>`_
3 changes: 2 additions & 1 deletion HISTORY.rst
@@ -4,7 +4,7 @@ History

v0.7.0 (unreleased)
-------------------
Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`), Trevor James Smith (:user:`Zeitsperre`), Juliette Lavoie (:user:`juliettelavoie`).
Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`), Trevor James Smith (:user:`Zeitsperre`), Juliette Lavoie (:user:`juliettelavoie`), Marco Braun (:user:`vindelico`).

Announcements
^^^^^^^^^^^^^
@@ -16,6 +16,7 @@ New features and enhancements
* New function `get_warming_level` to search within the IPCC CMIP global temperatures CSV without requiring data. (:issue:`208`, :pull:`210`).
* File restructuring from catalogs with ``xscen.catutils.build_path``. (:pull:`205`).
* New scripting functions `save_and_update` and `move_and_delete`. (:pull:`214`).
* Spatial dimensions can be generalized as X/Y when rechunking and will be mapped to rlon/rlat or lon/lat accordingly. (:pull:`221`).

Breaking changes
^^^^^^^^^^^^^^^^
39 changes: 21 additions & 18 deletions docs/notebooks/2_getting_started.ipynb
@@ -61,7 +61,7 @@
"### Searching a subset of datasets within *DataCatalogs*\n",
"\n",
"<div class=\"alert alert-info\"> <b>INFO</b>\n",
" \n",
"\n",
"At this stage, the search criteria should be for variables that will be **bias corrected**, not necessarily the variables required for the final product. For example, if `sfcWindfromdir` is the final product, then `uas` and `vas` should be searched for since these are the variables that will be bias corrected.\n",
"</div>\n",
"\n",
@@ -130,8 +130,8 @@
"source": [
"## Extracting data\n",
"\n",
"<div class=\"alert alert-warning\"> <b>WARNING</b> \n",
" \n",
"<div class=\"alert alert-warning\"> <b>WARNING</b>\n",
"\n",
"It is heavily recommended to stop and analyse the results of `search_data_catalogs` before proceeding to the extraction function.\n",
"</div>\n",
"\n",
@@ -221,8 +221,8 @@
"\n",
"**NOTE:** Calling the extraction function without passing by `search_data_catalogs` beforehand is possible, but will not support *DerivedVariables*.\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"`extract_dataset` currently only accepts a single unique ID at a time.\n",
"</div>"
]
@@ -259,11 +259,14 @@
"`xscen` has two functions for the purpose of saving data: `save_to_netcdf` and `save_to_zarr`. If possible for a given project, it is strongly recommended to use Zarr files since these are often orders of magnitude faster to read and create compared to NetCDF. They do have a few quirks, however:\n",
"\n",
"- Chunk size must separate the dataset in *exactly* equal chunks (with the exception of the last). While it is recommended to calculate ideal chunking and provide them explicitely to the function, `io.estimate_chunks()` can be used to automatically estimate a decent chunking. In a similar way, `io.subset_maxsize()` can be used to roughly cut a dataset along the *time* dimension into subsets of a given size (in Gb), which is especially useful for saving NetCDF files.\n",
" Chunk sizes can be passed to the two saving functions in a dictionary. Spatial dimensions can be generalized as `'X'` and `'Y'`, which will be mapped to the *xarray.Dataset*'s actual grid type's dimension names.\n",
"\n",
"\n",
"- Default behaviour for a Zarr is to act like a directory, with each new variable being assigned a subfolder. This is great when all required variables have the same dimensions and frequency, but will crash otherwise. If you have daily *tasmax* and subdaily *pr*, for example, they need to be assigned different paths.\n",
"\n",
"\n",
"\n",
"\n",
"#### Updating the catalog\n",
"\n",
"`intake-esm` will automatically copy the catalog's entry in the dataset's metadata, with a `cat:attr` format. Where appropriate, `xscen` updates that information to keep the metadata up to date with the manipulations. `ProjectCatalog.update_from_ds` will in turn read the metadata of a Dataset and fill in the information into a new catalog entry.\n",
@@ -376,7 +379,7 @@
"source": [
"## Regridding data\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"Regridding in `xscen` is built upon `xESMF`. For more information on basic usage and available regridding methods, [please consult their documentation](https://xesmf.readthedocs.io/en/latest/). Their [masking and extrapolation tutorial](https://xesmf.readthedocs.io/en/latest/notebooks/Masking.html) is of particular interest.\n",
"\n",
@@ -481,8 +484,8 @@
"source": [
"### Preparing arguments for xESMF.Regridder\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"xESMF's API appears to be broken on their ReadTheDocs. For a list of available arguments and options in `Regridder()`, please consult their [Github page](https://github.com/pangeo-data/xESMF/blob/master/xesmf/frontend.py) directly.\n",
"</div>\n",
"\n",
@@ -522,7 +525,7 @@
"Other options exist in `ESMF/ESMPy`, but not `xESMF`. As they get implemented, they should automatically get supported by `xscen`.\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
" \n",
"\n",
"Some utilities that exist in `xESMF` have not yet been explicitely added to `xscen`. If *conservative* regridding is desired, for instance, some additional scripts might be required on the User's side to create the lon/lat boundaries\n",
"</div>"
]
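The note above says conservative regridding may require the user to build the lon/lat boundaries themselves. A minimal sketch of that step, assuming regular 1D cell centres (`centers_to_bounds` is a hypothetical helper, not part of xESMF or xscen):

```python
# Hypothetical helper, not part of xESMF or xscen: derive the n+1 cell
# boundaries that conservative regridding needs from n 1D cell centres.
def centers_to_bounds(centers):
    # Midpoints between adjacent centres form the interior boundaries.
    interior = [(a + b) / 2 for a, b in zip(centers[:-1], centers[1:])]
    # Extrapolate half a step outward at each end.
    first = centers[0] - (interior[0] - centers[0])
    last = centers[-1] + (centers[-1] - interior[-1])
    return [first] + interior + [last]

print(centers_to_bounds([0.0, 1.0, 2.0]))  # four boundaries for three centres
```

The boundary arrays would then be attached to the dataset (e.g. as `lon_b`/`lat_b`) before building the `Regridder`.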
@@ -642,8 +645,8 @@
"source": [
"## Bias adjusting data\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"Bias adjustment in `xscen` is built upon `xclim.sdba`. For more information on basic usage and available methods, [please consult their documentation](https://xclim.readthedocs.io/en/stable/sdba.html).\n",
"</div>\n",
"\n",
@@ -694,9 +697,9 @@
"- `periods` defines the period(s) to bias adjust.\n",
"- `xclim_adjust_kwargs` is described above.\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"These functions currently do not support multiple variables due to the fact that train and adjust arguments might vary. The function must be called separately for every variable. \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"These functions currently do not support multiple variables due to the fact that train and adjust arguments might vary. The function must be called separately for every variable.\n",
"</div>"
]
},
@@ -804,8 +807,8 @@
"source": [
"## Computing indicators\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"`xscen` relies heavily on `xclim`'s YAML support for calculating indicators. For more information on how to build the YAML file, consult [this Notebook](https://xclim.readthedocs.io/en/latest/notebooks/extendxclim.html?highlight=yaml#YAML-file).\n",
"</div>\n",
"\n",
@@ -1077,7 +1080,7 @@
"\n",
"Usually, we would create an ensemble out of different models, but for this toy example, it will be made out of the 2 experiments of the same model that we have available.\n",
"\n",
"<div class=\"alert alert-warning\"> <b>WARNING</b> \n",
"<div class=\"alert alert-warning\"> <b>WARNING</b>\n",
"\n",
"If given a set of paths, `xclim.ensembles.create_ensemble` will ignore the chunking on disk and open the datasets with only a chunking along the new `realization` dimension. Thus, for large datasets, `create_kwargs` should be used to explicitely specify chunks.\n",
"</div>"
@@ -1246,7 +1249,7 @@
"\n",
"print(\"\")\n",
"print(\"Inspect final attributes\")\n",
"display(ds_clean)"
"display(ds_clean)\n"
]
}
],
120 changes: 120 additions & 0 deletions tests/conftest.py
Expand Up @@ -2,7 +2,11 @@
import shutil
from pathlib import Path

import numpy as np
import pandas as pd
import pytest
import xarray as xr
from xclim.testing.helpers import test_timeseries as timeseries

import xscen as xs

@@ -43,3 +47,119 @@ def samplecat():
xr_open_kwargs={"engine": "h5netcdf"},
)
return xs.DataCatalog({"esmcat": xs.catalog.esm_col_data, "df": df})


@pytest.fixture
def datablock_3d():
def _datablock_3d(
values,
variable,
x,
x_start,
y,
y_start,
x_step=0.1,
y_step=0.1,
start="7/1/2000",
freq="D",
units=None,
as_dataset=False,
):
"""
Create a generic timeseries object based on pre-defined dictionaries of existing variables.

Parameters
----------
values : np.ndarray
The values to be assigned to the variable. Dimensions are interpreted [T, Y, X].
variable : str
The variable name.
x : str
The name of the x coordinate.
x_start : float
The starting value of the x coordinate.
y : str
The name of the y coordinate.
y_start : float
The starting value of the y coordinate.
x_step : float
The step between x values.
y_step : float
The step between y values.
start : str
The starting date of the time coordinate.
freq : str
The frequency of the time coordinate.
units : str
The units of the variable.
as_dataset : bool
If True, return a Dataset, else a DataArray.
"""
attrs = {
"lat": {
"units": "degrees_north",
"description": "Latitude.",
"standard_name": "latitude",
},
"lon": {
"units": "degrees_east",
"description": "Longitude.",
"standard_name": "longitude",
},
"rlon": {
"units": "degrees",
"description": "Rotated longitude.",
"standard_name": "grid_longitude",
},
"rlat": {
"units": "degrees",
"description": "Rotated latitude.",
"standard_name": "grid_latitude",
},
"x": {
"units": "m",
"description": "Projection x coordinate.",
"standard_name": "projection_x_coordinate",
},
"y": {
"units": "m",
"description": "Projection y coordinate.",
"standard_name": "projection_y_coordinate",
},
}

dims = {
"time": xr.DataArray(
pd.date_range(start, periods=values.shape[0], freq=freq), dims="time"
),
y: xr.DataArray(
np.arange(y_start, y_start + values.shape[1] * y_step, y_step),
dims=y,
attrs=attrs[y],
),
x: xr.DataArray(
np.arange(x_start, x_start + values.shape[2] * x_step, x_step),
dims=x,
attrs=attrs[x],
),
}

# Get the attributes using xclim, then create the DataArray
ts_1d = timeseries(values[:, 0, 0], variable, start, units=units, freq=freq)
da = xr.DataArray(
values, coords=dims, dims=dims, name=variable, attrs=ts_1d.attrs
)

# Add the axis information
da[x].attrs["axis"] = "X"
da[y].attrs["axis"] = "Y"
da["time"].attrs["axis"] = "T"

# TODO: Fully support rotated grids (2D coordinates + grid_mapping)

if as_dataset:
return da.to_dataset()
else:
return da

return _datablock_3d
68 changes: 68 additions & 0 deletions tests/test_io.py
@@ -0,0 +1,68 @@
import numpy as np
import pytest

import xscen as xs


class TestRechunkForSaving:
@pytest.mark.parametrize(
"dims, xy",
[
(["lon", "lat"], True),
(["lon", "lat"], False),
(["rlon", "rlat"], True),
(["rlon", "rlat"], False),
],
)
def test_options(self, datablock_3d, dims, xy):
ds = datablock_3d(
np.random.random((30, 30, 50)),
variable="tas",
x=dims[0],
x_start=-70 if dims[0] == "lon" else 0,
y=dims[1],
y_start=45 if dims[1] == "lat" else 0,
as_dataset=True,
)
x = "X" if xy else dims[0]
y = "Y" if xy else dims[1]
new_chunks = {y: 10, x: 10, "time": 20}
ds_ch = xs.io.rechunk_for_saving(ds, new_chunks)
for dim, chunks in ds_ch.chunks.items():
dim = (
dim
if (xy is False or dim == "time")
else "X"
if dim == dims[0]
else "Y"
)
assert chunks[0] == new_chunks[dim]

def test_variables(self, datablock_3d):
ds = datablock_3d(
np.random.random((30, 30, 50)),
variable="tas",
x="lon",
x_start=-70,
y="lat",
y_start=45,
as_dataset=True,
)
ds["pr"] = datablock_3d(
np.random.random((30, 30, 50)),
variable="pr",
x="lon",
x_start=-70,
y="lat",
y_start=45,
as_dataset=False,
)

new_chunks = {
"tas": {"time": 20, "lon": 10, "lat": 7},
"pr": {"time": 10, "lon": 5, "lat": 3},
}
ds_ch = xs.io.rechunk_for_saving(ds, new_chunks)
for v in ds_ch.data_vars:
for dim, chunks in zip(list(ds.dims), ds_ch[v].chunks):
assert chunks[0] == new_chunks[v][dim]
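The behaviour these tests exercise — generalized `'X'`/`'Y'` chunk keys resolved to the dataset's real dimension names — can be sketched as follows (an illustrative re-implementation, not xscen's actual code; the dimension-name pairs are assumed from the tests above):

```python
# Illustrative sketch, not xscen's implementation: resolve generalized
# 'X'/'Y' chunk keys to a dataset's actual spatial dimension names,
# assuming one of the usual pairs (rlon/rlat, lon/lat or x/y) is present.
def translate_xy_chunks(chunks, dims):
    x_name = next((d for d in ("rlon", "lon", "x") if d in dims), None)
    y_name = next((d for d in ("rlat", "lat", "y") if d in dims), None)
    translated = {}
    for key, size in chunks.items():
        if key == "X" and x_name is not None:
            translated[x_name] = size
        elif key == "Y" and y_name is not None:
            translated[y_name] = size
        else:
            # 'time' and already-explicit dimension names pass through.
            translated[key] = size
    return translated
```

With `dims=("time", "rlat", "rlon")`, the spec `{"time": 20, "Y": 10, "X": 10}` resolves to `{"time": 20, "rlat": 10, "rlon": 10}`, matching what `test_options` checks for the rotated grid.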
