-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor fixes following ESPO-R5's diagnostics #106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Minor comments.
@@ -244,8 +244,6 @@ def properties_and_measures( | |||
# to be able to save in zarr, convert object to string | |||
if "season" in ds1: | |||
ds1["season"] = ds1.season.astype("str") | |||
if "month" in ds1: | |||
ds1["month"] = ds1.month.astype("str") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliettelavoie Would that break the change you are implementing in your PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment below
@@ -244,8 +244,6 @@ def properties_and_measures( | |||
# to be able to save in zarr, convert object to string | |||
if "season" in ds1: | |||
ds1["season"] = ds1.season.astype("str") | |||
if "month" in ds1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you removed month but not season ?
Also, I am not 100% sure, but I think the new code I added in save_to_zarr
in PR#100 might fix the problem I was trying to fix here.
The bit of code:
# to address this issue https://github.com/pydata/xarray/issues/3476
for v in list(ds.coords.keys()):
if ds.coords[v].dtype == object:
ds[v].encoding.clear()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, on my side month
is an int! This line would make my script fail, because I was creating the ref properties with a string month coordinate, than passing it to compute measures of sim or scen. It was raising an error above, when trying to compare both datasets, because at that point, sim's properties are still using integers.
If the new code fixes it, it looks less invasive, that's good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok! mais oui je pense que ça va être correct avec le nouveau code.
return ds.to_zarr( | ||
filename, compute=compute, mode="a", encoding=encoding, **zarr_kwargs | ||
) | ||
except TimeoutException: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind keeping that but it doesn't seem that useful to me, if you use a catalog..
If you follow the order [not exist_in_catlog, xscen function, save_to_zarr, update_catalog], it doesn't matter that the file is half written, it will be overwritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we use timeout to get out of dask being frozen. Will it even be able to do the removal or will it keep being frozen ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my case the usefulness comes from the fact that I am appending to the zarr dataset. I think I am indeed not using the catalog to it's full potential. But my workflow creates the full dataset than writes it to zarr and it skips all variables already existing. This timeout cleanup ensures partially written variables are being rewritten on the second pass.
AFAIU, within the except TimeoutException:
the fact that dask hangs is not important. The shutil
calls do not call dask, so the files are removed even if any dask object are still "linked" to them. My usecase looks like:
for dataset in catalog:
with Timeout():
with dask.Client():
out = do_things(dataset)
save_to_zarr(out)
Thus, on Timeout files are first removed, then the Client is killed and its object are garbage collected and then we go to the next iteration and we restart the counter and recreate a new client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small advantage of timeout_cleanup
in your case is that before you re-run your workflow to patch the partially written files, the zarr are still openable. Without the cleanup, you would get broken datasets.
This again in the case you are appending to an existing dataset or when you are writing by iterating over each variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okok I understand that this way makes sens when you write a lot of indicators in one zarr.
Pull Request Checklist:
pre-commit
hooks are installed/active in my local clone ($ pre-commit install
)number
) and pull request (:pull:number
) has been addedWhat kind of change does this PR introduce?
save_to_zarr
, which removes variables that were in the process of being written when receiving aTimeoutException
. (:pull:106
).scripting.skippable
context, allowing the use of CTRL-C to skip code sections. (:pull:106
)properties_and_measures
no longer casts month coordinates to string (:pull:106
).parse_config
. There is a very special option that logs what args were parsed from the config. The log calls have been fixed,Does this PR introduce a breaking change?
No.
Other information:
These changes are the ones I used to help me run my "properties and diagnostics" script for ESPO-R5. Forget the branch name it is meaningless.