Missing ZARR files on S3 #122

JarrodBWong · 2021-02-18T15:19:15Z

Hello, we have been trying to access CMIP6 data based on the file locations listed in cmip6-pds/cmip6.csv and cmip6-pds/pangeo-cmip6.csv but have been running into issues recently with some of the directories being empty.

FileNotFoundError: cmip6-pds/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Omon/thetao/gn/.zmetadata

Last year we were following the same access pattern successfully, but either the file paths on S3 or the CSV seems to have changed.

Is this directory currently being restructured or should we be using another one of the CSV files listed?
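For anyone hitting the same `FileNotFoundError`, one way to narrow it down is to check each catalog entry for its `.zmetadata` file before trying to open it. This is only a minimal sketch: `missing_stores` and its `exists` argument are hypothetical names, not part of any Pangeo tooling, and the check assumes every store was written with consolidated metadata.

```python
def missing_stores(zstores, exists, marker=".zmetadata"):
    """Return the zstore paths whose consolidated-metadata file is absent.

    `exists` is any callable that takes a path and returns True or False
    (hypothetical hook; pass in whatever filesystem check you use).
    """
    missing = []
    for zstore in zstores:
        # Normalize the trailing slash so both "a/b" and "a/b/" work.
        if not exists(zstore.rstrip("/") + "/" + marker):
            missing.append(zstore)
    return missing


# Stand-in for a real filesystem: a set of object paths known to exist.
present = {"cmip6-pds/a/.zmetadata"}
print(missing_stores(["cmip6-pds/a/", "cmip6-pds/b"], present.__contains__))
```

With `s3fs` installed, `s3fs.S3FileSystem(anon=True).exists` could be passed as `exists`, assuming the bucket allows anonymous reads.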


rabernat · 2021-02-18T15:25:42Z

Hi @LewisJarrod, thanks for bringing this to our attention. You can find up-to-date documentation for the CMIP6 data here: https://pangeo-data.github.io/pangeo-cmip6-cloud/

Some files have recently been renamed, but if you are going through the csv catalog, theoretically everything should be there.

@naomi-henderson may have some guidance.

clittleaer · 2021-02-18T15:33:11Z

hi @rabernat,

just to clarify, we were actually using the file named: cmip6-zarr-consolidated-stores.csv

when we query the .csv/dataframe using either this .csv or the pangeo-cmip6.csv, it gives zstore paths that reference directories that aren't on S3.

this started happening maybe 3 weeks ago.

thanks in advance,
Chris

naomi-henderson · 2021-02-18T15:57:02Z

@clittleaer, sorry about this - the AWS collection is a clone of the GC collection. I did a massive re-organization of the naming scheme for the zarr stores on GC. Unfortunately for the rest of us, @charlesbluca, who set up the process to clone from GC to AWS, has left us for a position at NVIDIA. Since ALL of the zarr stores need to be deleted and recopied, the cloning process to AWS is taking quite a while.

In addition, the CSV files need to be updated with the gs urls changed to s3 urls.

@charlesbluca and/or I will try to give an update of when to expect these changes to properly migrate to AWS.
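As a rough illustration of the URL change described above (not the actual cloning scripts, which are not shown in this thread), rewriting the `zstore` column of a catalog CSV might look like the sketch below. The two bucket prefixes and the helper names `gs_to_s3` and `rewrite_catalog` are assumptions made for the example.

```python
import csv
import io

GS_PREFIX = "gs://cmip6/"      # assumed Google Cloud prefix
S3_PREFIX = "s3://cmip6-pds/"  # assumed S3 prefix


def gs_to_s3(url, gs_prefix=GS_PREFIX, s3_prefix=S3_PREFIX):
    """Swap the bucket prefix; URLs that do not start with gs_prefix pass through."""
    if not url.startswith(gs_prefix):
        return url
    return s3_prefix + url[len(gs_prefix):]


def rewrite_catalog(csv_text, column="zstore"):
    """Return the CSV text with every value in `column` converted to an S3 URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = gs_to_s3(row[column])
        writer.writerow(row)
    return out.getvalue()


print(rewrite_catalog("variable_id,zstore\nthetao,gs://cmip6/CMIP/x/\n"))
```

A pure prefix swap like this only produces valid URLs once the stores themselves have been copied under the same relative paths.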

rabernat · 2021-02-18T16:02:36Z

Thanks for the update Naomi!

naomi-henderson · 2021-02-19T11:01:03Z

@clittleaer, the GitHub Actions cloning scripts needed to be updated. Thanks again for opening this issue; there was indeed a problem. Everything is working now, but it may take a day or two to finish the whole process. I will try to remember to post a note here when everything is back to normal.

clittleaer · 2021-02-19T11:05:47Z

that would be great @naomi-henderson p.s. thanks @rabernat and others for making this resource available!

naomi-henderson · 2021-02-21T16:13:12Z

We now generate the S3 CMIP6 catalog directly from crawling the S3 collection. This means that, even though the GC and S3 collections might be temporarily out of sync, the catalog for each lists only existing ZARR files. So, @clittleaer, even though the newly restructured data has not been completely copied, the S3 catalog should now represent the current state. Please open an issue here if there are any more problems.
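To give a sense of what building a catalog from a crawl can look like, here is a minimal sketch, not the actual catalog generator: given a flat listing of object keys, it keeps only the directories that contain a `.zmetadata` file. The function name `stores_from_keys` and the sample keys are made up for illustration.

```python
def stores_from_keys(keys, marker=".zmetadata"):
    """Return sorted store prefixes for every key that ends in `/<marker>`."""
    stores = set()
    for key in keys:
        head, sep, tail = key.rpartition("/")
        # Require a separator so a bare top-level marker is not counted.
        if sep and tail == marker:
            stores.add(head + "/")
    return sorted(stores)


# Hypothetical listing: only a/b/ and a/d/ have consolidated metadata.
listing = ["a/b/.zmetadata", "a/b/.zgroup", "a/c/.zgroup", "a/d/.zmetadata"]
print(stores_from_keys(listing))
```

Because the catalog is derived from what is actually in the bucket, a store that has not finished copying (no `.zmetadata` yet) simply does not appear.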

naomi-henderson · 2021-02-24T20:36:52Z

Restructuring on S3 is now complete, @clittleaer, and the S3 and GCS buckets and their catalogs should have the same datasets. If you find any discrepancies or have suggestions, please open an issue here: pangeo-cmip6-cloud
