generalize dims as X/Y when lat/lon or rlat/rlon #221

Merged · 21 commits · Jul 27, 2023
bf7795c
generalize dims as X/Y when lat/lon or rlat/rlon
vindelico Jul 5, 2023
7611f83
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 6, 2023
ee8e507
updated AUTHORS.rst and HISTORY.rst
vindelico Jul 6, 2023
a739554
put code changes and author/history changes together
vindelico Jul 6, 2023
b2509b7
update .zenodo.json
vindelico Jul 6, 2023
1ed3f32
Update xscen/io.py
vindelico Jul 10, 2023
c63b005
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 10, 2023
82ab63a
Update xscen/io.py
vindelico Jul 10, 2023
4fe31a4
Update HISTORY.rst
vindelico Jul 10, 2023
aea398e
Update HISTORY.rst
vindelico Jul 10, 2023
9b28425
Merge remote-tracking branch 'origin/main' into generalize_rlatrlon_l…
vindelico Jul 10, 2023
fafe3ec
documentation and tests for generalize_rlatrlon_latlon
vindelico Jul 24, 2023
412309f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 24, 2023
ba67c10
Update docs/notebooks/2_getting_started.ipynb
vindelico Jul 25, 2023
ee324d2
Update xscen/io.py
vindelico Jul 25, 2023
ab3c08f
repairs following github discussion
vindelico Jul 25, 2023
f9be133
repairs following github discussion 2
vindelico Jul 25, 2023
112abc8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 25, 2023
633469d
Update HISTORY.rst
vindelico Jul 25, 2023
bf9d906
reformatted tests
RondeauG Jul 25, 2023
7b93897
Merge remote-tracking branch 'origin/main' into generalize_rlatrlon_l…
RondeauG Jul 25, 2023
5 changes: 5 additions & 0 deletions .zenodo.json
@@ -34,6 +34,11 @@
"name": "Caron, Louis-Philippe",
"affiliation": "Ouranos",
"orcid": "0000-0001-5221-0147"
},
{
"name": "Braun, Marco",
"affiliation": "Ouranos",
"orcid": "0000-0001-5061-3217"
}
],
"keywords": [
1 change: 1 addition & 0 deletions AUTHORS.rst
@@ -21,3 +21,4 @@ Contributors
* Louis-Philippe Caron <caron.louis-philippe@ouranos.ca>
* Sarah Gammon <gammon.sarah@ouranos.ca> `@SarahG-579462 <https://github.com/SarahG-579462>`_
* Yannick Rousseau
* Marco Braun <Braun.Marco@ouranos.ca> `@vindelico <https://github.com/vindelico>`_
3 changes: 2 additions & 1 deletion HISTORY.rst
@@ -4,7 +4,7 @@ History

v0.7.0 (unreleased)
-------------------
Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`), Trevor James Smith (:user:`Zeitsperre`), Juliette Lavoie (:user:`juliettelavoie`).
Contributors to this version: Gabriel Rondeau-Genesse (:user:`RondeauG`), Pascal Bourgault (:user:`aulemahal`), Trevor James Smith (:user:`Zeitsperre`), Juliette Lavoie (:user:`juliettelavoie`), Marco Braun (:user:`vindelico`).

Announcements
^^^^^^^^^^^^^
@@ -16,6 +16,7 @@ New features and enhancements
* New function `get_warming_level` to search within the IPCC CMIP global temperatures CSV without requiring data. (:issue:`208`, :pull:`210`).
* File re-structuration from catalogs with ``xscen.catutils.build_path``. (:pull:`205`).
* New scripting functions `save_and_update` and `move_and_delete`. (:pull:`214`).
* Spatial dimensions can be generalized as X/Y when rechunking and will be mapped to rlon/rlat or lon/lat accordingly. (:pull:`221`).

Breaking changes
^^^^^^^^^^^^^^^^
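The HISTORY.rst entry above describes the feature this PR adds: chunk dictionaries passed to the saving functions may use generic `'X'`/`'Y'` keys, which get resolved to the dataset's actual spatial dimensions (`rlon`/`rlat` on a rotated grid, `lon`/`lat` otherwise). A minimal pure-Python sketch of that translation step is shown below; the helper name `translate_xy_chunks` and the `dim_axes` argument are hypothetical, and the real logic lives in `xscen.io.rechunk_for_saving` (exercised by the new tests at the end of this diff).

```python
def translate_xy_chunks(chunks, dim_axes):
    """Resolve generic 'X'/'Y' chunk keys to a dataset's real dimension names.

    chunks   : dict mapping dimension names (or generic 'X'/'Y') to chunk sizes
    dim_axes : dict mapping each real dimension name to its CF 'axis' attribute,
               e.g. {'rlon': 'X', 'rlat': 'Y', 'time': 'T'}
    """
    # Invert the axis mapping so we can look up a dimension by its axis:
    # {'rlon': 'X', 'rlat': 'Y'} -> {'X': 'rlon', 'Y': 'rlat'}.
    axis_to_dim = {axis: dim for dim, axis in dim_axes.items()}
    out = {}
    for key, size in chunks.items():
        # Generic keys ('X', 'Y') are translated; real names pass through.
        out[axis_to_dim.get(key, key)] = size
    return out


# Rotated-pole grid: 'X'/'Y' resolve to rlon/rlat.
print(translate_xy_chunks({"X": 10, "Y": 10, "time": 20},
                          {"rlon": "X", "rlat": "Y", "time": "T"}))
# -> {'rlon': 10, 'rlat': 10, 'time': 20}

# Regular lat/lon grid: the same generic spec resolves to lon/lat.
print(translate_xy_chunks({"X": 10, "Y": 10, "time": 20},
                          {"lon": "X", "lat": "Y", "time": "T"}))
# -> {'lon': 10, 'lat': 10, 'time': 20}
```

This is why the `conftest.py` fixture below sets an `axis` attribute (`"X"`, `"Y"`, `"T"`) on every coordinate: the generalization is driven by CF axis metadata rather than hard-coded dimension names.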
39 changes: 21 additions & 18 deletions docs/notebooks/2_getting_started.ipynb
@@ -61,7 +61,7 @@
"### Searching a subset of datasets within *DataCatalogs*\n",
"\n",
"<div class=\"alert alert-info\"> <b>INFO</b>\n",
" \n",
"\n",
"At this stage, the search criteria should be for variables that will be **bias corrected**, not necessarily the variables required for the final product. For example, if `sfcWindfromdir` is the final product, then `uas` and `vas` should be searched for since these are the variables that will be bias corrected.\n",
"</div>\n",
"\n",
Expand Down Expand Up @@ -130,8 +130,8 @@
"source": [
"## Extracting data\n",
"\n",
"<div class=\"alert alert-warning\"> <b>WARNING</b> \n",
" \n",
"<div class=\"alert alert-warning\"> <b>WARNING</b>\n",
"\n",
"It is heavily recommended to stop and analyse the results of `search_data_catalogs` before proceeding to the extraction function.\n",
"</div>\n",
"\n",
Expand Down Expand Up @@ -221,8 +221,8 @@
"\n",
"**NOTE:** Calling the extraction function without passing by `search_data_catalogs` beforehand is possible, but will not support *DerivedVariables*.\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"`extract_dataset` currently only accepts a single unique ID at a time.\n",
"</div>"
]
@@ -259,11 +259,14 @@
"`xscen` has two functions for the purpose of saving data: `save_to_netcdf` and `save_to_zarr`. If possible for a given project, it is strongly recommended to use Zarr files since these are often orders of magnitude faster to read and create compared to NetCDF. They do have a few quirks, however:\n",
"\n",
"- Chunk size must separate the dataset in *exactly* equal chunks (with the exception of the last). While it is recommended to calculate ideal chunking and provide them explicitely to the function, `io.estimate_chunks()` can be used to automatically estimate a decent chunking. In a similar way, `io.subset_maxsize()` can be used to roughly cut a dataset along the *time* dimension into subsets of a given size (in Gb), which is especially useful for saving NetCDF files.\n",
" Chunk sizes can be passed to the two saving functions in a dictionary. Spatial dimensions can be generalized as `'X'` and `'Y'`, which will be mapped to the *xarray.Dataset*'s actual grid type's dimension names.\n",
"\n",
"\n",
"- Default behaviour for a Zarr is to act like a directory, with each new variable being assigned a subfolder. This is great when all required variables have the same dimensions and frequency, but will crash otherwise. If you have daily *tasmax* and subdaily *pr*, for example, they need to be assigned different paths.\n",
"\n",
"\n",
"\n",
"\n",
"#### Updating the catalog\n",
"\n",
"`intake-esm` will automatically copy the catalog's entry in the dataset's metadata, with a `cat:attr` format. Where appropriate, `xscen` updates that information to keep the metadata up to date with the manipulations. `ProjectCatalog.update_from_ds` will in turn read the metadata of a Dataset and fill in the information into a new catalog entry.\n",
@@ -376,7 +379,7 @@
"source": [
"## Regridding data\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"Regridding in `xscen` is built upon `xESMF`. For more information on basic usage and available regridding methods, [please consult their documentation](https://xesmf.readthedocs.io/en/latest/). Their [masking and extrapolation tutorial](https://xesmf.readthedocs.io/en/latest/notebooks/Masking.html) is of particular interest.\n",
"\n",
@@ -481,8 +484,8 @@
"source": [
"### Preparing arguments for xESMF.Regridder\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"xESMF's API appears to be broken on their ReadTheDocs. For a list of available arguments and options in `Regridder()`, please consult their [Github page](https://github.com/pangeo-data/xESMF/blob/master/xesmf/frontend.py) directly.\n",
"</div>\n",
"\n",
@@ -522,7 +525,7 @@
"Other options exist in `ESMF/ESMPy`, but not `xESMF`. As they get implemented, they should automatically get supported by `xscen`.\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
" \n",
"\n",
"Some utilities that exist in `xESMF` have not yet been explicitely added to `xscen`. If *conservative* regridding is desired, for instance, some additional scripts might be required on the User's side to create the lon/lat boundaries\n",
"</div>"
]
@@ -642,8 +645,8 @@
"source": [
"## Bias adjusting data\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"Bias adjustment in `xscen` is built upon `xclim.sdba`. For more information on basic usage and available methods, [please consult their documentation](https://xclim.readthedocs.io/en/stable/sdba.html).\n",
"</div>\n",
"\n",
@@ -694,9 +697,9 @@
"- `periods` defines the period(s) to bias adjust.\n",
"- `xclim_adjust_kwargs` is described above.\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"These functions currently do not support multiple variables due to the fact that train and adjust arguments might vary. The function must be called separately for every variable. \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"These functions currently do not support multiple variables due to the fact that train and adjust arguments might vary. The function must be called separately for every variable.\n",
"</div>"
]
},
@@ -804,8 +807,8 @@
"source": [
"## Computing indicators\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b> \n",
" \n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
"\n",
"`xscen` relies heavily on `xclim`'s YAML support for calculating indicators. For more information on how to build the YAML file, consult [this Notebook](https://xclim.readthedocs.io/en/latest/notebooks/extendxclim.html?highlight=yaml#YAML-file).\n",
"</div>\n",
"\n",
@@ -1077,7 +1080,7 @@
"\n",
"Usually, we would create an ensemble out of different models, but for this toy example, it will be made out of the 2 experiments of the same model that we have available.\n",
"\n",
"<div class=\"alert alert-warning\"> <b>WARNING</b> \n",
"<div class=\"alert alert-warning\"> <b>WARNING</b>\n",
"\n",
"If given a set of paths, `xclim.ensembles.create_ensemble` will ignore the chunking on disk and open the datasets with only a chunking along the new `realization` dimension. Thus, for large datasets, `create_kwargs` should be used to explicitely specify chunks.\n",
"</div>"
@@ -1246,7 +1249,7 @@
"\n",
"print(\"\")\n",
"print(\"Inspect final attributes\")\n",
"display(ds_clean)"
"display(ds_clean)\n"
]
}
],
120 changes: 120 additions & 0 deletions tests/conftest.py
@@ -2,7 +2,11 @@
import shutil
from pathlib import Path

import numpy as np
import pandas as pd
import pytest
import xarray as xr
from xclim.testing.helpers import test_timeseries as timeseries

import xscen as xs

@@ -43,3 +47,119 @@ def samplecat():
xr_open_kwargs={"engine": "h5netcdf"},
)
return xs.DataCatalog({"esmcat": xs.catalog.esm_col_data, "df": df})


@pytest.fixture
def datablock_3d():
def _datablock_3d(
values,
variable,
x,
x_start,
y,
y_start,
x_step=0.1,
y_step=0.1,
start="7/1/2000",
freq="D",
units=None,
as_dataset=False,
):
"""
Create a generic timeseries object based on pre-defined dictionaries of existing variables.

Parameters
----------
values : np.ndarray
The values to be assigned to the variable. Dimensions are interpreted [T, Y, X].
variable : str
The variable name.
x : str
The name of the x coordinate.
x_start : float
The starting value of the x coordinate.
y : str
The name of the y coordinate.
y_start : float
The starting value of the y coordinate.
x_step : float
The step between x values.
y_step : float
The step between y values.
start : str
The starting date of the time coordinate.
freq : str
The frequency of the time coordinate.
units : str
The units of the variable.
as_dataset : bool
If True, return a Dataset, else a DataArray.
"""
attrs = {
"lat": {
"units": "degrees_north",
"description": "Latitude.",
"standard_name": "latitude",
},
"lon": {
"units": "degrees_east",
"description": "Longitude.",
"standard_name": "longitude",
},
"rlon": {
"units": "degrees",
"description": "Rotated longitude.",
"standard_name": "grid_longitude",
},
"rlat": {
"units": "degrees",
"description": "Rotated latitude.",
"standard_name": "grid_latitude",
},
"x": {
"units": "m",
"description": "Projection x coordinate.",
"standard_name": "projection_x_coordinate",
},
"y": {
"units": "m",
"description": "Projection y coordinate.",
"standard_name": "projection_y_coordinate",
},
}

dims = {
"time": xr.DataArray(
pd.date_range(start, periods=values.shape[0], freq=freq), dims="time"
),
y: xr.DataArray(
np.arange(y_start, y_start + values.shape[1] * y_step, y_step),
dims=y,
attrs=attrs[y],
),
x: xr.DataArray(
np.arange(x_start, x_start + values.shape[2] * x_step, x_step),
dims=x,
attrs=attrs[x],
),
}

# Get the attributes using xclim, then create the DataArray
ts_1d = timeseries(values[:, 0, 0], variable, start, units=units, freq=freq)
da = xr.DataArray(
values, coords=dims, dims=dims, name=variable, attrs=ts_1d.attrs
)

# Add the axis information
da[x].attrs["axis"] = "X"
da[y].attrs["axis"] = "Y"
da["time"].attrs["axis"] = "T"

# TODO: Fully support rotated grids (2D coordinates + grid_mapping)

if as_dataset:
return da.to_dataset()
else:
return da

return _datablock_3d
68 changes: 68 additions & 0 deletions tests/test_io.py
@@ -0,0 +1,68 @@
import numpy as np
import pytest

import xscen as xs


class TestRechunkForSaving:
@pytest.mark.parametrize(
"dims, xy",
[
(["lon", "lat"], True),
(["lon", "lat"], False),
(["rlon", "rlat"], True),
(["rlon", "rlat"], False),
],
)
def test_options(self, datablock_3d, dims, xy):
ds = datablock_3d(
np.random.random((30, 30, 50)),
variable="tas",
x=dims[0],
x_start=-70 if dims[0] == "lon" else 0,
y=dims[1],
y_start=45 if dims[1] == "lat" else 0,
as_dataset=True,
)
x = "X" if xy else dims[0]
y = "Y" if xy else dims[1]
new_chunks = {y: 10, x: 10, "time": 20}
ds_ch = xs.io.rechunk_for_saving(ds, new_chunks)
for dim, chunks in ds_ch.chunks.items():
dim = (
dim
if (xy is False or dim == "time")
else "X"
if dim == dims[0]
else "Y"
)
assert chunks[0] == new_chunks[dim]

def test_variables(self, datablock_3d):
ds = datablock_3d(
np.random.random((30, 30, 50)),
variable="tas",
x="lon",
x_start=-70,
y="lat",
y_start=45,
as_dataset=True,
)
ds["pr"] = datablock_3d(
np.random.random((30, 30, 50)),
variable="pr",
x="lon",
x_start=-70,
y="lat",
y_start=45,
as_dataset=False,
)

new_chunks = {
"tas": {"time": 20, "lon": 10, "lat": 7},
"pr": {"time": 10, "lon": 5, "lat": 3},
}
ds_ch = xs.io.rechunk_for_saving(ds, new_chunks)
for v in ds_ch.data_vars:
for dim, chunks in zip(list(ds.dims), ds_ch[v].chunks):
assert chunks[0] == new_chunks[v][dim]
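The `test_variables` test above exercises a second calling convention: the chunks dictionary may be keyed either by dimension (one spec for all variables) or by variable name (one spec per variable, as in `{"tas": {...}, "pr": {...}}`). A small sketch of how such an input could be normalized so every variable ends up with its own chunk mapping — `normalize_chunks` is a hypothetical helper for illustration, not part of the `xscen` API:

```python
def normalize_chunks(chunks, variables):
    """Expand a chunks dict so every variable gets its own dim->size mapping.

    chunks    : either {'time': 20, 'lon': 10, ...}           (global spec)
                or     {'tas': {'time': 20, ...}, 'pr': {...}} (per-variable)
    variables : iterable of variable names in the dataset
    """
    # Per-variable form: every value is itself a dict of chunk sizes.
    if chunks and all(isinstance(v, dict) for v in chunks.values()):
        # Variables without an entry get an empty spec (left unchunked here).
        return {var: dict(chunks.get(var, {})) for var in variables}
    # Global form: reuse the same spec for every variable.
    return {var: dict(chunks) for var in variables}


print(normalize_chunks({"time": 20, "lon": 10}, ["tas", "pr"]))
# -> {'tas': {'time': 20, 'lon': 10}, 'pr': {'time': 20, 'lon': 10}}

print(normalize_chunks({"tas": {"time": 20, "lon": 10},
                        "pr": {"time": 10, "lon": 5}}, ["tas", "pr"]))
# -> {'tas': {'time': 20, 'lon': 10}, 'pr': {'time': 10, 'lon': 5}}
```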