Dataset from url not found #1043

Closed
aaronspring opened this issue Sep 23, 2020 · 15 comments

Comments

@aaronspring

What happened:

I tried to open a remote URL and got an OSError, even though !wget url works.

What you expected to happen:

open the remote netcdf file

Minimal Complete Verifiable Example:

from netCDF4 import Dataset

import netCDF4
netCDF4.__version__


url='https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'
# working_url='https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/GFS_Global_0p5deg_20200923_0000.grib2'

Dataset(url)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-14-265839034cee> in <module>
----> 1 Dataset(url)

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -90] NetCDF: file not found: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-754.29.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2

xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.0
cfgrib: 0.9.7.6
iris: 2.2.0
bottleneck: 1.3.1
dask: 2.15.0
distributed: 2.20.0
matplotlib: 3.1.2
cartopy: 0.17.0
seaborn: 0.10.1
numbagg: None
pint: 0.11
setuptools: 47.1.1.post20200529
pip: 20.2.3
conda: None
pytest: 5.3.5
IPython: 7.15.0
sphinx: None

@rabernat

Is url an opendap endpoint or just a netCDF file on an http server?

Until recently, the netCDF4 C library (which underlies netcdf4-python) could not open the latter. I seem to remember @dopplershift posting that this capability has since been added, but you need to add a special query string to the URL, which I can't remember or find in the docs.
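One rough way to answer the question above is to read the first few bytes the server returns and compare them with the well-known file signatures: classic netCDF files start with CDF followed by a version byte, and netCDF-4 files normally start with the HDF5 signature. The sketch below is only an illustration; read_prefix and classify_magic are hypothetical helper names, and it assumes the HDF5 signature sits at offset 0, which is the usual case but not guaranteed.

```python
import urllib.request


def read_prefix(url: str, n: int = 8) -> bytes:
    """Fetch the first n bytes of url, asking for a byte range if the server supports it."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=0-{n - 1}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        # If the server ignores the Range header we still read only n bytes.
        return response.read(n)


def classify_magic(prefix: bytes) -> str:
    """Guess the container format from the leading bytes of a file."""
    if prefix.startswith(b"\x89HDF\r\n\x1a\n"):
        return "netCDF-4/HDF5"
    # Classic format, 64-bit offset format, and CDF-5 use version bytes 1, 2, and 5.
    if prefix[:3] == b"CDF" and prefix[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netCDF classic"
    return "unknown"
```

A result of "unknown" suggests the URL is not serving a raw netCDF file (for example, it may be an OPeNDAP endpoint or an HTML page), so calling classify_magic(read_prefix(url)) is a quick sanity check before trying other approaches.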

@aaronspring
Author

Solution: xr.open_dataset('https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc#mode=bytes')

thanks @rabernat and @dopplershift

I am absolutely convinced (but cannot prove it) that a month ago and way before, xr.open_dataset('https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc') used to work without adding #mode=bytes.
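For anyone who wants to apply this workaround in a script, here is a minimal sketch. with_bytes_mode and open_with_bytes_mode are hypothetical helper names, not part of netCDF4 or xarray, and the open call only works when the underlying netCDF C library was built with HTTP byte-range support.

```python
from urllib.parse import urldefrag


def with_bytes_mode(url: str) -> str:
    """Append the #mode=bytes fragment unless the URL already carries a fragment."""
    base, fragment = urldefrag(url)
    if fragment:
        # The caller already chose a mode; do not guess how to merge fragments.
        return url
    return base + "#mode=bytes"


def open_with_bytes_mode(url: str):
    """Open a plain HTTP(S) netCDF file through the C library's byte-range reader."""
    import netCDF4  # imported lazily so with_bytes_mode works without it

    return netCDF4.Dataset(with_bytes_mode(url))
```

The same rewritten URL can be passed to xr.open_dataset, as in the solution above.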

@rabernat

The only way I have ever made this work in the past was with h5netcdf (which supports opening file-like objects) + fsspec (which provides file-like objects for remote URLs).

@aaronspring
Author

you are right @rabernat, I used intake-xarray and fsspec (not specifying directly h5netcdf but probably internally somehow) to open remote files: https://gist.github.com/aaronspring/30c904009cfffecb15ab54b247e0472a

@rabernat

rabernat commented Sep 23, 2020

One thing I don't like about intake is that it sometimes hides too much. This is more of a problem for developers than for pure "users", although I think that distinction is not super useful.

@rsignell-usgs

rsignell-usgs commented Sep 23, 2020

@aaronspring, it's worth noting that prepending simplecache to the URL you pass to fsspec causes the whole file to be downloaded, while using fsspec without simplecache or using the new netcdf4 #mode=bytes does not.

I tried these out with the URL you provided, but accessing the data on this server from AWS us-west-2 was very slow. So I downloaded the 21MB file using wget (which took 15 minutes!) and put it on S3.

On S3 using these two methods I get these results:
[Screenshot 2020-09-23_10-09-36: results for the two methods; image not preserved]
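The fsspec route discussed in this thread, with and without simplecache, might look roughly like the sketch below. fsspec_url and open_remote are hypothetical helper names; the sketch assumes fsspec, xarray, and h5netcdf are installed and that the remote file is an HDF5-based netCDF-4 file, since h5netcdf cannot read the classic format.

```python
def fsspec_url(url: str, cache_whole_file: bool = False) -> str:
    """Return the URL to hand to fsspec, optionally chained through simplecache."""
    prefix = "simplecache::"
    if cache_whole_file and not url.startswith(prefix):
        # URL chaining: simplecache downloads the whole file to local disk first.
        return prefix + url
    return url


def open_remote(url: str, cache_whole_file: bool = False):
    """Open a remote netCDF-4 file via fsspec and the h5netcdf engine."""
    # Imported lazily so fsspec_url can be used without these packages.
    import fsspec
    import xarray as xr

    with fsspec.open(fsspec_url(url, cache_whole_file), mode="rb") as f:
        ds = xr.open_dataset(f, engine="h5netcdf")
        # Read everything into memory before the file object is closed.
        return ds.load()
```

With cache_whole_file=False only the byte ranges that are actually read get transferred; with cache_whole_file=True the whole file is downloaded once, which matches the behavior described above.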

@dopplershift
Member

Well that's just silly. I've opened the netCDF-c performance issue at Unidata/netcdf-c#1848.

@barronh
Contributor

barronh commented Nov 4, 2021

Does netcdf4-python work with fsspec? This would be an amazing feature!

@dopplershift
Member

@barronh No it does not. Since netcdf4-python is a thin wrapper around the netCDF C library, it only supports I/O methods supported there. That list has grown to include things like S3, Zarr, opendap, and HTTP byte-range requests, though.

@barronh
Contributor

barronh commented Nov 4, 2021

@dopplershift - Thanks. That makes sense. The S3 bucket was my primary interest, so it sounds like the limitation for me is the backend library. If I updated libnetcdf and linked netcdf4-python against it, I would already have S3 support.

Thanks!

@rabernat

rabernat commented Nov 4, 2021

What would be great would be a C (or Rust?) implementation of fsspec that could be shared among many C libraries (e.g. NetCDF + GDAL) that need flexible cloud-style I/O. Kind of like GDAL VSI but decoupled from GDAL.

@guigrpa

guigrpa commented Feb 27, 2023

Is #mode=bytes still supported? On netcdf4-python v1.6.2, it returns an error:

>>> ds = netCDF4.Dataset('https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc#mode=bytes')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/netCDF4/_netCDF4.pyx", line 2463, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2026, in netCDF4._netCDF4._ensure_nc_success
FileNotFoundError: [Errno 2] No such file or directory: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc#mode=bytes'

Tried with netCDF4 v1.5.8 and v1.6.2.

@dopplershift
Member

@guigrpa I've opened an issue upstream in the C library to dig into why that URL isn't working (it should be).

@khouakhi

I get
OSError: [Errno -67] NetCDF: libcurl failure: 'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc#mode=bytes'
using recent netCDF4 1.6.3 and xarray 2023.3.0

@dopplershift
Member

@khouakhi This should be fixed with netCDF-c 4.9.2. The latest netcdf4 packages on conda-forge should pull it in; I'm not sure what, if anything, needs to be done for the PyPI packages.
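To see which C library version a given netCDF4 install is linked against, the module exposes it as netCDF4.__netcdf4libversion__. The sketch below compares that string with 4.9.2; parse_version, has_fixed_bytes_mode, and installed_library_is_fixed are hypothetical helper names, and the parser assumes the string begins with three dot-separated integers.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_fixed_bytes_mode(libversion: str, minimum: tuple = (4, 9, 2)) -> bool:
    """Return True if the netCDF C library version is at least the given minimum."""
    return parse_version(libversion) >= minimum


def installed_library_is_fixed() -> bool:
    """Check the C library that the installed netCDF4 package is linked against."""
    import netCDF4  # imported lazily so the helpers above work without it

    return has_fixed_bytes_mode(netCDF4.__netcdf4libversion__)
```

Comparing integer tuples rather than raw strings keeps versions such as 4.10.0 ordered correctly after 4.9.2.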
