-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
socket module calls with long host names can fail with idna codec error #77139
Comments
While working on a custom conda channel with authentication, I ran into the following UnicodeError: Traceback (most recent call last):
File "/Users/ablack/miniconda3/lib/python3.6/site-packages/conda/core/repodata.py", line 402, in fetch_repodata_remote_request
timeout=timeout)
File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
return self.request('GET', url, **kwargs)
File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 499, in request
prep.url, proxies, stream, verify, cert
File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 672, in merge_environment_settings
env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/utils.py", line 692, in get_environ_proxies
if should_bypass_proxies(url, no_proxy=no_proxy):
File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/utils.py", line 676, in should_bypass_proxies
bypass = proxy_bypass(netloc)
File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2612, in proxy_bypass
return proxy_bypass_macosx_sysconf(host)
File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2589, in proxy_bypass_macosx_sysconf
return _proxy_bypass_macosx_sysconf(host, proxy_settings)
File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2562, in _proxy_bypass_macosx_sysconf
hostIP = socket.gethostbyname(hostonly)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long) The error can be consistently reproduced when the first substring of the url hostname is greater than 64 characters long, as in "0123456789012345678901234567890123456789012345678901234567890123.example.com". This wouldn't be a problem, except that it doesn't seem to separate out credentials from the first substring of the hostname so the entire "[user]:[secret]@xxx" section must be less than 65 characters long. This is problematic for services that use longer API keys and expect their submission over basic auth. |
Thanks for the report. The behavior you see can be further isolated to socket.gethostbyname: >>> import socket
>>> h = "0123456789012345678901234567890123456789012345678901234567890123.example.com"
>>> socket.gethostbyname(h)
Traceback (most recent call last):
File "/usr/lib/python3.6/encodings/idna.py", line 165, in encode
raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long) Other socket module calls accepting host names fail similarly, such as getaddrinfo. |
Just to be clear, I don't know if the socket needs to support 64 character long host name sections, so here's an example url that is at the root of my problem that I'm pretty sure it should support: >>> import socket
>>> h = "username:long_api_key0123456789012345678901234567890123456789@www.example.com"
>>> socket.gethostbyname(h)
Traceback (most recent call last):
File "/Users/ablack/miniconda3/lib/python3.6/encodings/idna.py", line 165, in encode
raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long) |
Using Ubuntu 16.04 with the 3.6.0 tag I was also able to reproduce the same error reported: import socket
h = "0123456789012345678901234567890123456789012345678901234567890123.example.com"
socket.gethostbyname(h) Traceback (most recent call last):
File "/home/agnosticdev/Documents/code/python/python-dev/cpython-3_6_0/Lib/encodings/idna.py", line 165, in encode
raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "host_test.py", line 8, in <module>
socket.gethostbyname(h)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long) It looks like the hostname being 64 characters long is the issue in that it cannot be encoded. Thus falling into the UnicodeError being raised in idna.py: I did some work on this to try and resolve this, but ultimately it was not worth committing so I wanted to report my findings. |
When will this issue be fixed? Thanks! |
According to the DNS standard, hostnames with more than 63 characters per label (the sections between .) are not allowed [https://tools.ietf.org/html/rfc1035#section-2.3.1]. That said, enforcing that at the codec level might be the wrong choice. I threw together a quick patch moving the limits up to 250, and nothing blew up. It's unclear what the general usefulness of such a change would be, since DNS servers probably couldn't handle those requests anyway. As for the original issue, if anybody is still doing something like that, could they provide a full example URL? I was unable to reproduce on HTTP (failed in a different place), or FTP. |
joseph.hackman I don't think that the 63 character limit on a label is the problem specifically, merely it's application. The crux of my issue was that credentials passed with the url in a basic-authy fashion (as some services require) count against the label length. For example, this would trigger the error: h = "https://ablack:very_long_api_key_0123456789012345678901234567890123456789012345678901234567890123@www.example.com" Since the first label would be treated as: My specific issue goes away if any text up to / including an "@" in the first label section is not included in the label validation. I don't know off hand if that information is supposed to be included per the label in the DNS spec though. |
It seems reasonable to fail on hostnames that are too long -- but it feels like the weirdness is that it is categorized as a UnicodeError, and not as, say, a ValueError. Would a re-categorization as ValueError seem like a reasonable adjustment here? |
[I'm coming here from https://github.com/tornadoweb/tornado/pull/3010) UnicodeError is a subclass of ValueError, so I don't see what value that change would provide. The thing that's surprising to me is that it's not a I do think that in the special case of I'd like to at least see better documentation about what errors are possible from this family of functions. |
ablack: the basic auth username:password@ part of the string is not part of a hostname. What code are you seeing that is trying to send that to a name resolver rather than stripping the obviously private info up through the @ sign? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: