Ensemble reduction and changes to Ensembles #63

Merged
merged 43 commits into from
Sep 15, 2022
Changes from 39 commits
Commits
939d930
reduce.py and improvements to ensembles
RondeauG Sep 1, 2022
725220e
Merge branch 'main' into reduce
RondeauG Sep 1, 2022
1ca0b44
Docstrings
RondeauG Sep 1, 2022
9988a9d
bugfixes
RondeauG Sep 7, 2022
f4e74d2
Merge branch 'main' into reduce
RondeauG Sep 7, 2022
61b8e9b
Merge branch 'main' into reduce
Zeitsperre Sep 8, 2022
3683435
Update xscen/utils.py
RondeauG Sep 12, 2022
55d8693
bugfixes and suggestions from code review
RondeauG Sep 12, 2022
21545e6
fixed docstrings
RondeauG Sep 12, 2022
98795bc
Update xscen/ensembles.py
RondeauG Sep 12, 2022
14a42d0
Update xscen/ensembles.py
RondeauG Sep 12, 2022
1e12a4d
Update xscen/ensembles.py
RondeauG Sep 12, 2022
633255e
Update xscen/ensembles.py
RondeauG Sep 12, 2022
1c95246
suggestions from code review
RondeauG Sep 12, 2022
ddb86c5
fixed imports
RondeauG Sep 12, 2022
31698e8
rename xrkwargs to common_attrs_open_kwargs
RondeauG Sep 12, 2022
acaffba
Merge branch 'main' into reduce
RondeauG Sep 12, 2022
708e819
upd HISTORY
RondeauG Sep 12, 2022
dd0c9c0
id generation in common_attrs_only
RondeauG Sep 12, 2022
92e4808
Update xscen/utils.py
RondeauG Sep 13, 2022
5f1831d
Update xscen/ensembles.py
RondeauG Sep 15, 2022
30aba84
Update xscen/ensembles.py
RondeauG Sep 15, 2022
4866e73
Update xscen/ensembles.py
RondeauG Sep 15, 2022
e59e02c
Update xscen/ensembles.py
RondeauG Sep 15, 2022
db72851
Update xscen/reduce.py
RondeauG Sep 15, 2022
aa9282b
Merge branch 'main' into reduce
RondeauG Sep 15, 2022
656637e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 15, 2022
d3f62c6
Update xscen/ensembles.py
RondeauG Sep 15, 2022
b8f19a7
apply suggestions from code review
RondeauG Sep 15, 2022
4289708
Merge branch 'reduce' of github.com:Ouranosinc/xscen into reduce
RondeauG Sep 15, 2022
6902b1e
Update xscen/ensembles.py
RondeauG Sep 15, 2022
53593cb
Merge branch 'reduce' of github.com:Ouranosinc/xscen into reduce
RondeauG Sep 15, 2022
281791b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 15, 2022
23f7bff
fixed imports
RondeauG Sep 15, 2022
090f94b
Merge branch 'reduce' of github.com:Ouranosinc/xscen into reduce
RondeauG Sep 15, 2022
06d1b44
small fix
RondeauG Sep 15, 2022
ebbddd9
small fix
RondeauG Sep 15, 2022
63b1e69
upd notebooks and docs
RondeauG Sep 15, 2022
173db33
default typing for clusters and fig_data
RondeauG Sep 15, 2022
375fd96
Update xscen/utils.py
RondeauG Sep 15, 2022
7a9543f
removed redundant function
RondeauG Sep 15, 2022
ca976d5
prefix in get_cat_attrs
RondeauG Sep 15, 2022
9e7a387
Merge branch 'main' into reduce
RondeauG Sep 15, 2022
10 changes: 9 additions & 1 deletion HISTORY.rst
@@ -25,10 +25,15 @@ New features and enhancements
* Do not fail for any grid mapping problem, including if a grid_mapping attribute mentions a variable that doesn't exist.
* Default email sent to the local user. (:pull:`68`).
* Special accelerated pathway for parsing catalogs with all dates within the datetime64[ns] range (:pull:`75`).
* New functions ``reduce_ensemble`` and ``build_reduction_data`` to support kkz and kmeans clustering (:issue:`4`, :pull:`63`)
* `ensemble_stats` can now loop through multiple statistics, supports functions located in `xclim.ensembles._robustness`, and supports weighted realizations (:pull:`63`).
* New function `ensemble_stats.generate_weights` that estimates weights based on simulation metadata (:pull:`63`).
* New function `catalog.unstack_id` to reverse-engineer IDs (:pull:`63`).
* `generate_id` now accepts Datasets (:pull:`63`).

Breaking changes
^^^^^^^^^^^^^^^^
* N/A
* In `ensemble_stats`, the `statistics` argument has been changed and `stats_kwargs` has been eliminated (:pull:`63`).

Bug fixes
^^^^^^^^^
@@ -41,6 +46,9 @@ Internal changes
* Default method of `xs.extract.resample` now depends on frequency. (:issue:`57`, :pull:`58`).
* Bugfix for `_restrict_by_resolution` with CMIP6 datasets (:pull:`71`).
* More complete check of coverage in ``_subset_file_coverage`` (:issue:`70`, :pull:`72`)
* The code that performs `common_attrs_only` in `ensemble_stats` has been moved to `clean_up` (:pull:`63`).
* Removed the default `to_level` in `clean_up` (:pull:`63`).


v0.3.0 (2022-08-23)
-------------------
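The changelog entry above names kkz clustering without describing it. As a rough illustration only, the sketch below implements a KKZ-style selection in pure Python, assuming the variant that seeds with the point closest to the centroid and then repeatedly adds the point farthest from everything already selected. The function name `kkz_select` and the sample points are hypothetical and are not taken from `xscen.reduce`, whose actual signature is not shown in this diff.

```python
import math


def kkz_select(points, n_select):
    """Return the indices of `n_select` points chosen by a KKZ-style max-min rule.

    Hypothetical helper for illustration; not the xscen or xclim implementation.
    """
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and the number of points")

    # Centroid of all points, computed dimension by dimension.
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]

    # Assumed seeding rule: start from the point closest to the centroid.
    selected = [min(range(len(points)), key=lambda i: math.dist(points[i], centroid))]

    while len(selected) < n_select:
        remaining = [i for i in range(len(points)) if i not in selected]
        # For each candidate, take its distance to the nearest selected point,
        # then keep the candidate for which that distance is largest.
        farthest = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[j]) for j in selected),
        )
        selected.append(farthest)

    return selected


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (3.0, 3.0)]
chosen = kkz_select(points, 3)
```

Because each new pick maximizes the distance to the nearest already-selected point, the selection tends to span the spread of the ensemble rather than its densest region.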
7 changes: 7 additions & 0 deletions docs/api.rst
@@ -29,6 +29,13 @@ Controlled Vocabulary and Mappings
   :members:
   :noindex:

Reduction
----------

.. automodule:: xscen.reduce
   :members:
   :noindex:

Regridding
----------

1 change: 1 addition & 0 deletions docs/index.rst
@@ -31,6 +31,7 @@ Features
   notebooks/getting_started
   notebooks/config_usage
   notebooks/diagnostics
   notebooks/ensemble_reduction
   columns
   api
   contributing
2 changes: 1 addition & 1 deletion docs/notebooks
Submodule notebooks updated from 1a181b to dad3e4
2 changes: 2 additions & 0 deletions xscen/__init__.py
@@ -12,6 +12,7 @@
    extract,
    indicators,
    io,
    reduce,
    regrid,
    scripting,
    utils,
@@ -27,6 +28,7 @@
from .extract import extract_dataset, search_data_catalogs # noqa
from .indicators import compute_indicators # noqa
from .io import save_to_netcdf, save_to_zarr # noqa
from .reduce import build_reduction_data, reduce_ensemble
from .regrid import *
from .scripting import (
    TimeoutException,
56 changes: 54 additions & 2 deletions xscen/catalog.py
@@ -44,6 +44,7 @@
    "generate_id",
    "parse_directory",
    "parse_from_ds",
    "unstack_id",
]


@@ -1233,20 +1234,71 @@ def _parse_date(date, fmts):
    return date


def generate_id(df: pd.DataFrame, id_columns: Optional[list] = None):
def generate_id(df: Union[pd.DataFrame, xr.Dataset], id_columns: Optional[list] = None):
    """Utility to create an ID from column entries.
    Parameters
    ----------
    df: pd.DataFrame
    df: pd.DataFrame, xr.Dataset
        Data for which to create an ID.
    id_columns : list
        List of column names on which to base the dataset definition. Empty columns will be skipped.
        If None (default), uses :py:data:`ID_COLUMNS`.
    """
    if isinstance(df, xr.Dataset):
        df = pd.DataFrame.from_dict(
            {
                key[4:]: [value]
                for key, value in df.attrs.items()
                if key.startswith("cat:")
            }
        )

    id_columns = [x for x in (id_columns or ID_COLUMNS) if x in df.columns]

    return df[id_columns].apply(
        lambda row: "_".join(map(str, filter(pd.notna, row.values))), axis=1
    )
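
The Dataset branch of `generate_id` above keeps only the attributes prefixed with `cat:`, strips that prefix, and joins the non-empty values with underscores. The sketch below mirrors that logic in pure Python, without pandas or xarray, to show the expected shape of the result. `EXAMPLE_ID_COLUMNS`, `id_from_attrs`, and the attribute values are hypothetical; the real `ID_COLUMNS` list is not shown in this diff.

```python
import math

# Hypothetical stand-in for xscen's ID_COLUMNS, which is not visible in this diff.
EXAMPLE_ID_COLUMNS = ["source", "experiment", "member", "domain"]


def id_from_attrs(attrs, id_columns=None):
    """Build an ID string from "cat:"-prefixed attributes (illustrative only)."""
    # Keep only catalog attributes and drop the 4-character "cat:" prefix.
    row = {key[4:]: value for key, value in attrs.items() if key.startswith("cat:")}

    # Only use the ID columns that are actually present, in the configured order.
    columns = [col for col in (id_columns or EXAMPLE_ID_COLUMNS) if col in row]

    def is_present(value):
        # Approximates pd.notna for scalars: drop None and NaN.
        return value is not None and not (isinstance(value, float) and math.isnan(value))

    return "_".join(str(row[col]) for col in columns if is_present(row[col]))


attrs = {
    "cat:source": "CanESM5",
    "cat:experiment": "ssp245",
    "cat:member": None,  # empty entries are skipped
    "cat:domain": "QC",
    "history": "not a catalog attribute",
}
example_id = id_from_attrs(attrs)
```

The order of the parts follows the order of the ID column list, not the order in which the attributes appear on the dataset.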


def unstack_id(df: Union[pd.DataFrame, ProjectCatalog, DataCatalog]) -> dict:
    """
    Utility that reverse-engineers an ID using catalog entries.
    Parameters
    ----------
    df: Union[pd.DataFrame, ProjectCatalog, DataCatalog]
        Either a Project/DataCatalog or the pandas DataFrame.
    Returns
    -------
    dict
        Dictionary with one entry per unique ID, which are themselves dictionaries of all the individual parts of the ID.
    """

    if isinstance(df, (ProjectCatalog, DataCatalog)):
        df = df.df

    out = {}
    for ids in pd.unique(df["id"]):
        subset = df[df["id"] == ids]

        # Only keep relevant columns
        subset = subset[
            [
                col
                for col in subset.columns
                if bool(re.search(f"((_)|(^)){str(subset[col].iloc[0])}((_)|($))", ids))
            ]
        ].drop("id", axis=1)

        # Make sure that all elements are the same, if there are multiple lines
        if len(subset) > 1:
            if not all([subset[col].is_unique for col in subset.columns]):
                raise ValueError(
                    "Not all elements of the columns are the same for a given ID!"
                )

        out[ids] = {attr: subset[attr].iloc[0] for attr in subset.columns}

    return out
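
The matching rule in `unstack_id` can be sketched in pure Python: a column belongs to an ID when its value appears in the ID delimited by underscores or by the string boundaries. Two details of the diffed code look worth a second check. `pandas.Series.is_unique` is true when all values are distinct, so the consistency check appears to raise precisely when the rows of a given ID agree. The column value is also interpolated into the regular expression unescaped. The sketch below is an assumption-laden rewrite, not the merged code: `unstack_ids_sketch` is a hypothetical name, rows are plain dictionaries, values are escaped with `re.escape`, and agreement is checked by direct comparison.

```python
import re


def unstack_ids_sketch(rows):
    """Map each ID to the columns whose values are parts of it (illustrative only)."""
    out = {}
    for row in rows:
        id_ = row["id"]
        parts = {}
        for col, value in row.items():
            if col == "id" or value is None:
                continue
            # The value must be a whole underscore-delimited part of the ID.
            pattern = f"(_|^){re.escape(str(value))}(_|$)"
            if re.search(pattern, id_):
                parts[col] = value

        # Rows sharing an ID must yield the same parts.
        if id_ in out and out[id_] != parts:
            raise ValueError(f"Rows with ID {id_!r} do not agree on its parts.")
        out[id_] = parts
    return out


rows = [
    {"id": "CanESM5_ssp245", "source": "CanESM5", "experiment": "ssp245", "variable": "tas"},
    {"id": "CanESM5_ssp245", "source": "CanESM5", "experiment": "ssp245", "variable": "pr"},
]
unstacked = unstack_ids_sketch(rows)
```

In the DataFrame version, a check that every column holds a single value per ID could use `subset[col].nunique() == 1` in place of `is_unique`.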