Why does ncdump work for this URL in older versions of the netcdf-c, but not for 4.7.x #1832

Closed
mktippett opened this issue Sep 5, 2020 · 10 comments


@mktippett

mktippett commented Sep 5, 2020

This is a follow-up to Unidata/netcdf4-python#1041

For 4.7.x, ncdump -h http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly/T/%28Jan%201979%29/VALUE/dods fails

See Unidata/netcdf4-python#1041 (comment)

For older versions, there is no problem.

ncdump -h http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly/T/%28Jan%201979%29/VALUE/dods
netcdf dods {
dimensions:
	T = 1 ;
variables:
	float T(T) ;
		T:standard_name = "time" ;
		T:pointwidth = 1.f ;
		T:expires = 1601942400 ;
		T:calendar = "360" ;
		T:gridtype = 0 ;
		T:units = "months since 1960-01-01" ;
	float anomaly(T) ;
		anomaly:file_missing_value = -999.9f ;
		anomaly:standard_name = "air_pressure_anomaly" ;
		anomaly:history = "Indices Tahiti slp c8110 standardized\n",
			"Indices Tahiti slp c8110 anomaly\n",
			"Averaged over T2[1981, 2010] minimum 0.0% data present\n",
			"sqrt [   total ( { Indices Tahiti slp c8110 anomaly } squared )   / 360. ]\n",
			"  total [ ( Indices Tahiti slp c8110 anomaly ) squared ]   / 360.\n",
			"Averaged over T2[1981, 2010] minimum 0.0% data present\n",
			"Averaged over T[Jan 1981, Dec 2010] minimum 0.0% data present\n",
			"Indices Darwin slp c8110 standardized\n",
			"Indices Darwin slp c8110 anomaly\n",
			"Averaged over T2[1981, 2010] minimum 0.0% data present\n",
			"sqrt [   total ( { Indices Darwin slp c8110 anomaly } squared )   / 360. ]\n",
			"  total [ ( Indices Darwin slp c8110 anomaly ) squared ]   / 360.\n",
			"Averaged over T2[1981, 2010] minimum 0.0% data present\n",
			"Averaged over T[Jan 1981, Dec 2010] minimum 0.0% data present" ;
		anomaly:units = "unitless" ;
		anomaly:expires = 1601942400 ;
		anomaly:long_name = "sea level pressure anomaly" ;
		anomaly:missing_value = -999.9f ;

// global attributes:
		:Conventions = "IRIDL" ;
}
@DennisHeimbigner
Collaborator

This has to do with url encoding. If you use this URL, it should work (if quoted):

'http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly/T/(Jan 1979)/VALUE/dods'
The problem is that netcdf assumes that the specified URL is not URL encoded,
so in this case it is doubly encoding the URL, and the server cannot deal with that.
If I recall, this changed because of the way Apache started enforcing proper encoding.
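
To make the double-encoding problem concrete, here is a minimal Python sketch. It is not the netcdf-c code, only an illustration of what happens when an already-encoded string is percent-encoded a second time; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Path segment as a user would type it, with no percent-encoding.
raw = "(Jan 1979)"

# One round of encoding: the form that appears in the original URL.
once = quote(raw, safe="")

# Encoding text that is already encoded escapes every '%' as '%25'.
twice = quote(once, safe="")

print(once)
print(twice)

# A server that decodes once gets the encoded text back, not "(Jan 1979)".
print(unquote(twice) == once)
```

A server that decodes the doubly encoded form only once ends up looking for a literal `%28Jan%201979%29`, which is presumably why the request fails.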

@mktippett
Author

I think that I understand. It would seem that having the URL encoded should always be ok.

This one without URL encoding does not work

ncdump -h "http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly/T/(Jan 1979)/VALUE/T/(days since 1960-01-01)streamgridunitconvert/dods"

but maybe it's server side. I don't know what URL is actually being sent to the server.

@DennisHeimbigner
Collaborator

It appears that this most recent problem has to do with handling of
blanks. I was replacing blanks with '+', which works ok with most servers.
Apparently that is not the case here. So I changed the code to use
%20 and that seems to fix it. I am testing now to see if it causes
other problems. I also need to add some notes to NUG/DAP2.dox about this.
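
The difference between the two ways of encoding a blank can be sketched in Python as below. This is an illustration only: it assumes, without confirmation from the thread, that the server decodes percent escapes but does not treat '+' as a blank in the path.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

segment = "days since 1960-01-01"

# Form-style encoding: blanks become '+'.
plus_form = quote_plus(segment)

# Percent encoding: blanks become '%20'.
pct_form = quote(segment, safe="")

print(plus_form)
print(pct_form)

# A decoder that only understands percent escapes leaves '+' as a literal plus.
print(unquote(plus_form))
# Only a form-style decoder maps '+' back to a blank.
print(unquote_plus(plus_form))
# '%20' decodes to a blank either way.
print(unquote(pct_form))
```

Because `%20` is understood by both kinds of decoder, it is the safer choice for blanks in a URL path.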

@mktippett
Author

Thanks! I admit to being a little confused about who is doing the URL encoding and who should be doing it. I tend to use the URL encoded strings since Ingrid provides them in the webpage (e.g., http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly/T/%28Jan%201979%29/VALUE/T/%28days%20since%201960-01-01%29streamgridunitconvert/dods) and they work (usually) across applications.

If you identify an issue with the Ingrid server (iridl), I can contact the developers.

@lesserwhirls
Contributor

I think in general, the client should by default encode the portions of the URL that it programmatically constructs (say, when preparing to make a REST API call), and the server should decode before passing information related to the request to the server-side application. However, the backend Ingrid service does not appear to be able to handle encoded URLs properly (specifically the query portion).

Unidata/thredds#1144 (comment)

I can't find much information about Ingrid, but my guess is that they've implemented their own web server and it does not fully handle the rules in RFC 3986. That guess is based on language in this entry on the "software that uses netCDF" page:

Ingrid is currently running as a WWW daemon that can be accessed through [...]

The headers indicate they have a Squid caching proxy set up, so decoding issues might lie above the Ingrid service (note: it's a dangerously old version of Squid, and I cannot find much information about how it handles encoded URLs).

The best approach I've found with the iridl.ldeo.columbia.edu site is to turn off any URL encoding done by the client, leave anything I've added to the query unencoded, and encode the path as best I can by hand. For example, somewhere along the communication chain one or more servers know enough about percent encoding to handle %20 (' '), %28 ('('), and %29 (')') in the path of the URL:

curl -G -v http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly/T/%28Jan%201979%29/VALUE/T/%28days%20since%201960-01-01%29streamgridunitconvert/dods.dds
*   Trying 129.236.110.35...
* TCP_NODELAY set
* Connected to iridl.ldeo.columbia.edu (129.236.110.35) port 80 (#0)
> GET /SOURCES/.Indices/.soi/.c8110/.anomaly/T/%28Jan%201979%29/VALUE/T/%28days%20since%201960-01-01%29streamgridunitconvert/dods.dds HTTP/1.1
> Host: iridl.ldeo.columbia.edu
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 07 Sep 2020 13:15:17 GMT
< Expires: Tue, 06 Oct 2020 00:19:00 GMT
< XDODS-Server: Ingrid/2.15
< Server: Ingrid 0.9
< Mime-Version: 1.0
< Access-Control-Allow-Origin: *
< Cache-Control: public
< Content-Description: dods-dds
< Content-Type: text/plain
< Last-Modified: Sun, 06 Sep 2020 06:49:02 GMT
< Content-Length: 150
< X-Cache: MISS from gfs2mon2.ldeo.columbia.edu
< X-Cache-Lookup: MISS from gfs2mon2.ldeo.columbia.edu:3128
< X-Origin-Date: Mon, 07 Sep 2020 12:56:17 GMT
< X-Origin-Expires: Tue, 06 Oct 2020 00:00:00 GMT
< X-Cache-Age: 1140
< X-Cache: HIT from iridls2.ldeo.columbia.edu
< X-Cache-Lookup: HIT from iridls2.ldeo.columbia.edu:80
< Via: 1.0 gfs2mon2.ldeo.columbia.edu:3128 (squid/2.7.STABLE9-20110824), 1.0 iridls2.ldeo.columbia.edu:80 (squid/2.7.STABLE9)
<
Dataset {
    Float32 T[T = 1];
    Grid {
     ARRAY:
        Float32 anomaly[T = 1];
     MAPS:
        Float32 T[T = 1];
    } anomaly;
} anomaly;

but as pointed out in the GitHub issue I linked to above, there is a chance the request will fail if you encode the query portion of the URL. I've had mixed results with this server's handling of percent encoding in the query (sometimes it works, sometimes it does not), so my default when interacting with this particular site is to leave the query unencoded.
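
The manual approach described above (encode the path, leave the query untouched) could be sketched in Python roughly as follows. This is only an illustration: `encode_path_only` is a hypothetical helper, not part of netCDF-C or any library, and the `anomaly[0:0]` constraint is an illustrative DAP2-style query, not one taken from this thread.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path_only(url: str) -> str:
    """Percent-encode the path of an unencoded URL; leave the query as given."""
    parts = urlsplit(url)
    # Keep '/' as the segment separator and escape blanks, parentheses, etc.
    path = quote(parts.path, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Hypothetical input: unencoded path plus an unencoded constraint in the query.
url = (
    "http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly"
    "/T/(Jan 1979)/VALUE/dods.dds?anomaly[0:0]"
)
result = encode_path_only(url)
print(result)
```

Note that the helper assumes its input path is not yet encoded; passing an already-encoded path would escape each '%' again and double-encode it.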

@DennisHeimbigner - is there a way to tell netCDF-C to not handle URL encoding? Perhaps an option in the .dodsrc file, or even better, a runtime option?

@DennisHeimbigner
Collaborator

Currently there is no way to tell the DAP2 client library code not to encode something.
My rule has been that all URL encoding should be handled by the netcdf-c DAP2
code, primarily because the URL being sent is a mix of the user's URL plus
internally generated constraints that minimize the amount of data requested from the server.
I already have some hacked exceptions in the DAP2 code to handle other Columbia
server foibles. I suppose we could add yet another.
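
One possible way for a client that owns all the encoding to accept both encoded and unencoded user input is to decode first and then encode exactly once. The Python sketch below illustrates that idea only; it is not how netcdf-c behaves, and `normalize_path` is a hypothetical name.

```python
from urllib.parse import quote, unquote


def normalize_path(path: str) -> str:
    """Decode any existing percent escapes, then encode exactly once."""
    return quote(unquote(path), safe="/")


encoded = "/T/%28Jan%201979%29/VALUE/dods"
plain = "/T/(Jan 1979)/VALUE/dods"

# Both spellings of the same path normalize to the same encoded form.
print(normalize_path(encoded))
print(normalize_path(plain))
```

The trade-off is ambiguity: an unencoded path that genuinely contains a '%' followed by two hex digits would be misread as an escape, so this approach is a heuristic, not a general fix.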

@DennisHeimbigner
Collaborator

I have attached a zip file containing a new version of libdispatch/ncuri.c.
Please try it and see if it solves the problem.
fixblanks.zip

@mktippett
Author

Thanks! I don't know exactly how to do that but I will try compiling it later. I got here from an error in xarray, and I'm a little over my head.

@DennisHeimbigner
Collaborator

If you installed netcdf using a package manager (apt, yum, conda, etc) then
you cannot test the change. So you may have to wait a while for a release
containing this fix. Sorry.

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Sep 8, 2020
re: Github issue Unidata#1832
and Github issue Unidata/netcdf4-python#1041

Handling of URL escape sequences for some servers
(e.g. http://iridl.ldeo.columbia.edu) appears to be somewhat
non-standard.
In particular, certain characters need escaping that other servers
do not. Fortunately, the changes should also work with other existing servers.
@WardF
Member

WardF commented Sep 9, 2020

Fixed in #1835

@WardF WardF closed this as completed Sep 9, 2020