Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resample to seasons - exclude seasons with missing months #80

Closed
mccrayc opened this issue Sep 20, 2022 · 10 comments · Fixed by #265
Closed

Resample to seasons - exclude seasons with missing months #80

mccrayc opened this issue Sep 20, 2022 · 10 comments · Fixed by #265
Labels
enhancement New feature or request

Comments

@mccrayc
Copy link
Collaborator

mccrayc commented Sep 20, 2022

Addressing a Problem?

Currently, resampling data to seasons (e.g., with frequency = 'QS-DEC') can result in partial seasons being counted the same as full seasons. For example, with data for Jan 1980-Dec 1981, Jan 1980 and Dec 1981 will each count as one DJF period even though they are only one month. The DJF mean will then = [Dec 1980 + (Dec 1980-Feb 1981) + Dec 1981]/3. For climatological calculations, Jan 1980-Dec 1981 should probably only give one DJF period (Dec 1980-Feb 1981) but two periods for the other seasons. Having different numbers of periods for each of the seasons can complicate parameter selection for xs.climatological_mean().

Potential Solution

  • My current workaround in this case is to select period = ['1980-03-01', '1981-11-30'] and then set min_periods = window-1
    ds_clim = xs.climatological_mean(ds=ds_dict['QS-DEC'], window=2, interval=2, min_periods = 1, to_level="2yr-climatology")
  • Ideally, any resample to QS-DEC during xs.extract_dataset() would have a "min_periods" type of parameter that refers to a minimum number of months rather than years, or a True/False parameter to only include complete seasons in calculations.
  • xs.climatological_mean() could also be modified to facilitate seasonal calculations since these are fairly common in climatology.

Additional context

One challenge is that the metadata generated is largely based on the number if years (e.g., "2yr-climatology") which might be a bit misleading when there is 1 fewer DJF period than the other seasons. Perhaps the metadata can be adapted to take this into account for seasonal calculations.

@mccrayc mccrayc added the enhancement New feature or request label Sep 20, 2022
@RondeauG
Copy link
Contributor

RondeauG commented Sep 20, 2022

Does xclim give you a value for those cases where you only have a single month? I think there's an option for it to give you a NaN, in which case you would now have [NaN + (Dec 1980-Feb 1981) + NaN]... which should work and be divided by 1 instead of 3?

Edit: I ask, because xs.climatological_mean doesn't really have a way to know how many months were used for the computation of the indicator, only whether it's a NaN or not.

@juliettelavoie
Copy link
Contributor

juliettelavoie commented Sep 20, 2022

@RondeauG, Chris doesn't use the compute_indicators fonction. He does the search-> extract -> climatological_mean.
It is the resample of xscen that should put Nan on the cases where you only have 1 month.

https://stackoverflow.com/c/ouranos/questions/62/

@aulemahal
Copy link
Collaborator

Je suis d'avis que extract_dataset wasn't meant for this kind of resampling. You're talking about indicator-level resampling here. But I do understand how it's kinda easier for you than to setup a second step of indicator calculation. One technical issue here is that the resample in extract_dataset is independent of the input frequency, but a "min period" check would need that information. Xclim indicators are input-frequency dependent and thus can manage this.

So, @mccrayc how much more work would it be for you to add an indicator-computation workflow? If the answer is "more than enough" I think we could indeed add a completeness check in the resample step of the extraction.

@juliettelavoie
Copy link
Contributor

juliettelavoie commented Sep 20, 2022

@mccrayc The other way to do it would be to extract the daily files and let compute_indicators do the resampling (with freq= QS-DEC). This will put Nans in the uncomplete periods. Then, you can use climatological_mean. This is more a typical workflow for xscen.

The issue here ( and why I didn't bring it up sooner in SO) is that xclim doesn't have a pr_mean indicator...

@aulemahal
Copy link
Collaborator

aulemahal commented Sep 20, 2022

Xclim does have a "pr_mean" indicator, but it has another name since it exposes more functionality:
https://xclim.readthedocs.io/en/stable/xclim.indicators.atmos.html#xclim.indicators.atmos._precip.daily_pr_intensity

It's the daily average intensity ON WET DAYS, which is the nuance. But you could pass thresh="-1 mm/d" and it becomes a pr_mean:

seasonal_pr_mean:
  base: sdii
  cf_attrs:
    var_name: pr_mean
  parameters:
    thresh: -1 mm/d
    freq: QS-DEC

@mccrayc
Copy link
Collaborator Author

mccrayc commented Sep 20, 2022

In my case I am using the monthly statistics that are already archived to save time on calculations (e.g., the files in /tank/climato/arch/cau/statmens). I haven't yet taken the time to fully explore the indicator functionality of xclim, but at first glance it seems that most of the indicators are based on daily values? Since my data are just monthly means of tas, tasmax, tasmin and pr it seemed much simpler to do the averages this way. But maybe it's simpler than I am understanding to extend the daily indicators for use with monthly data?

@aulemahal
Copy link
Collaborator

aulemahal commented Sep 20, 2022

Hum indeed xclim is more oriented towards daily inputs.

I mean, it's a simple problem, so it may be easier to just do it (instead of waiting for a fix in xscen):

(to be inserted after the extract and before the climatological mean)

  1. Brutal
ds_qs = ds_ms.resample(time='QS-DEC').mean().isel(time=slice(1, -1))
  1. Fancy - xclim
ds_qs = ds_ms.resample(time='QS-DEC').mean()
missing = xc.core.missing.missing_any(ds, 'QS-DEC', src_timestep='M')
ds_qs = ds_qs.where(~missing)

But I agree with implementing some fail-safe option in xscen.extract.resample.

@RondeauG
Copy link
Contributor

xscen.extract.resample was designed with climate scenarios in mind, which means going from subdaily data to daily. However, it's becoming clear that we need to improve it and more explicitely support going to coarser temporal resolutions such as MS and QS.

@RondeauG
Copy link
Contributor

RondeauG commented Oct 26, 2022

Coming back to this. If we want to support coarser temporal resolutions in xscen.extract.resample, would it be possible to:

  1. Aggregate to daily data if required, using what is already implemented.
  2. Once the data is daily, use compute_indicators from within xscen.extract.resample to go to MS/QS/YS/whatever.

The potential issue I see with this is if the data is already coarser than daily (e.g. MS --> QS), but that's easy to implement, and maybe once again a special case for uas/vas.

@RondeauG RondeauG added this to the 2023 Development Plan milestone Feb 9, 2023
@RondeauG
Copy link
Contributor

There doesn't seem to be any pretty way to apply a weighted resample in xarray. The best that I could come up with is something like:

(da * da.time.dt.daysinmonth).groupby("time.year").sum(dim="time") / da.time.dt.daysinmonth.groupby("time.year").sum(dim="time")

For MS --> YS. Supporting MS --> QS-DEC would be even uglier.

@aulemahal aulemahal mentioned this issue Oct 2, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants