Missing ZARR files on S3 #122
Comments
Hi @LewisJarrod, thanks for bringing this to our attention. You can find up-to-date documentation for the CMIP6 data here: https://pangeo-data.github.io/pangeo-cmip6-cloud/ Some files have recently been renamed, but if you are going through the CSV catalog, everything should in theory be there. @naomi-henderson may have some guidance.
Hi @rabernat, just to clarify, we were actually using the file named cmip6-zarr-consolidated-stores.csv. When we query the CSV/dataframe using either this file or pangeo-cmip6.csv, it gives zstore paths that reference directories that aren't on S3. This started happening maybe 3 weeks ago. Thanks in advance,
@clittleaer, sorry about this. The AWS collection is a clone of the GC collection, and I did a massive reorganization of the naming scheme for the Zarr stores on GC. Unfortunately for the rest of us, @charlesbluca, who set up the process to clone from GC to AWS, has left us for a position at NVIDIA. Since ALL of the Zarr stores need to be deleted and recopied, the cloning process to AWS is taking quite a while. In addition, the CSV files need to be updated with the gs URLs changed to s3 URLs. @charlesbluca and/or I will try to give an update on when to expect these changes to properly migrate to AWS.
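For readers following along, the URL rewrite mentioned above (gs URLs changed to s3 URLs in the CSV files) could look roughly like the sketch below. This is not the maintainers' actual script. It assumes the catalog has a `zstore` column (the thread mentions zstore paths), that the S3 bucket is `cmip6-pds` (taken from the paths quoted in this issue), that the GC bucket is named `cmip6`, and that the path below the bucket name is identical in both collections. The last two points are assumptions.

```python
import csv
import io


def gs_to_s3(zstore: str, gs_bucket: str = "cmip6", s3_bucket: str = "cmip6-pds") -> str:
    """Rewrite a gs:// store URL to its s3:// counterpart.

    Bucket names are assumptions; URLs that do not start with the
    expected gs:// prefix are returned unchanged.
    """
    prefix = f"gs://{gs_bucket}/"
    if not zstore.startswith(prefix):
        return zstore
    return f"s3://{s3_bucket}/" + zstore[len(prefix):]


def rewrite_catalog(csv_text: str, column: str = "zstore") -> str:
    """Return a copy of the catalog CSV text with every URL in `column` rewritten."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = gs_to_s3(row[column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical one-row catalog; the store path is made up for illustration.
example = "source_id,zstore\nMODEL-A,gs://cmip6/CMIP6/demo/store/\n"
print(rewrite_catalog(example))
```

A rewrite like this only changes the text of the catalog. It does not guarantee that the rewritten path exists on S3 yet, which is the mismatch reported in this issue.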
Thanks for the update, Naomi!
@clittleaer, the GitHub Actions cloning scripts needed to be updated. Thanks again for opening this issue; there was indeed a problem. All is working now, but it may take a day or two to finish the whole process. I will try to remember to make a note here when everything is back to normal.
That would be great, @naomi-henderson. P.S. Thanks @rabernat and others for making this resource available!
We now generate the S3 CMIP6 catalog directly by crawling the S3 collection. This means that, even though the GC and S3 collections might be temporarily out of sync, the catalog for each lists only existing ZARR files. So, @clittleaer, even though the newly restructured data has not been completely copied, the S3 catalog should now represent the current state. Please open an issue here if there are any more problems.
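The idea of building the catalog from what actually exists in the bucket can be illustrated with a small offline sketch. It is not the crawler used for the real catalog. It assumes you already have a flat listing of object keys (however obtained), that the bucket is `cmip6-pds`, and that each consolidated Zarr store has a `.zmetadata` object at its root, which is how Zarr v2 consolidated metadata is laid out. The function names and example keys are hypothetical.

```python
def stores_from_keys(keys, bucket="cmip6-pds"):
    """Derive store URLs from object keys by locating each store's root .zmetadata."""
    marker = "/.zmetadata"
    roots = sorted({key[: -len(marker)] for key in keys if key.endswith(marker)})
    return [f"s3://{bucket}/{root}/" for root in roots]


def missing_from_listing(catalog_zstores, keys, bucket="cmip6-pds"):
    """Return catalog entries whose store was not found in the key listing."""
    existing = set(stores_from_keys(keys, bucket))
    # Normalize to exactly one trailing slash before comparing.
    return [z for z in catalog_zstores if z.rstrip("/") + "/" not in existing]


# Hypothetical listing: store "a" is complete, store "b" has no .zmetadata.
keys = ["CMIP6/a/.zmetadata", "CMIP6/a/tas/0.0.0", "CMIP6/b/.zattrs"]
catalog = ["s3://cmip6-pds/CMIP6/a/", "s3://cmip6-pds/CMIP6/b/"]

print(stores_from_keys(keys))
print(missing_from_listing(catalog, keys))
```

Because the catalog is derived from the listing rather than copied from the GC catalog, an entry can only appear once its store is present, which is why the two collections can be out of sync without either catalog pointing at empty directories.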
Restructuring on S3 is now complete, @clittleaer, and the S3 and GCS buckets and their catalogs should have the same datasets. If you find any discrepancies or have suggestions, please open an issue here: pangeo-cmip6-cloud
Hello, we have been trying to access CMIP6 data based on the file locations listed in cmip6-pds/cmip6.csv and cmip6-pds/pangeo-cmip6.csv but have been running into issues recently with some of the directories being empty. Last year we were following the same access pattern successfully, but either the file paths on S3 or the CSV seems to have changed.
Is this directory currently being restructured or should we be using another one of the CSV files listed?