socket module calls with long host names can fail with idna codec error #77139

Open
ablack mannequin opened this issue Feb 26, 2018 · 10 comments
Labels
3.7 (EOL), 3.8 (only security fixes), stdlib

Comments

@ablack
Mannequin

ablack mannequin commented Feb 26, 2018

BPO 32958
Nosy @gpshead, @bitdancer, @bdarnell, @jhasapp, @agnosticdev, @alexmv

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-02-26.19:52:35.417>
labels = ['3.8', '3.7', 'library']
title = 'socket module calls with long host names can fail with idna codec error'
updated_at = <Date 2022-01-25.00:36:47.690>
user = 'https://bugs.python.org/ablack'

bugs.python.org fields:

activity = <Date 2022-01-25.00:36:47.690>
actor = 'gregory.p.smith'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2018-02-26.19:52:35.417>
creator = 'ablack'
dependencies = []
files = []
hgrepos = []
issue_num = 32958
keywords = []
message_count = 10.0
messages = ['312947', '313163', '313164', '313323', '372865', '373006', '374207', '391990', '393323', '411539']
nosy_count = 9.0
nosy_names = ['gregory.p.smith', 'r.david.murray', 'Ben.Darnell', 'joseph.hackman', 'agnosticdev', 'ablack', 'sdbowman', 'midopa', 'alexmv']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue32958'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

@ablack
Mannequin Author

ablack mannequin commented Feb 26, 2018

While working on a custom conda channel with authentication, I ran into the following UnicodeError:

Traceback (most recent call last):
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/conda/core/repodata.py", line 402, in fetch_repodata_remote_request
    timeout=timeout)
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 499, in request
    prep.url, proxies, stream, verify, cert
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 672, in merge_environment_settings
    env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/utils.py", line 692, in get_environ_proxies
    if should_bypass_proxies(url, no_proxy=no_proxy):
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/utils.py", line 676, in should_bypass_proxies
    bypass = proxy_bypass(netloc)
  File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2612, in proxy_bypass
    return proxy_bypass_macosx_sysconf(host)
  File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2589, in proxy_bypass_macosx_sysconf
    return _proxy_bypass_macosx_sysconf(host, proxy_settings)
  File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2562, in _proxy_bypass_macosx_sysconf
    hostIP = socket.gethostbyname(hostonly)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

The error can be consistently reproduced when the first label of the URL hostname is 64 or more characters long, as in "0123456789012345678901234567890123456789012345678901234567890123.example.com". This wouldn't be a problem, except that the credentials are not separated from the first label of the hostname, so the entire "[user]:[secret]@xxx" section must be shorter than 64 characters. This is problematic for services that use longer API keys and expect them to be submitted over basic auth.

@ablack ablack mannequin added the type-crash and stdlib labels Feb 26, 2018
@ned-deily
Member

Thanks for the report. The behavior you see can be further isolated to socket.gethostbyname:

>>> import socket
>>> h = "0123456789012345678901234567890123456789012345678901234567890123.example.com"
>>> socket.gethostbyname(h)
Traceback (most recent call last):
  File "/usr/lib/python3.6/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

Other socket module calls accepting host names fail similarly, such as getaddrinfo.
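
As a minimal sketch of that claim (not part of the original report), the snippet below passes the same 64-character label to both gethostbyname and getaddrinfo and records which exception class each raises. The helper name failure_type is hypothetical. It assumes CPython, where the host name is encoded with the idna codec before any lookup is attempted, so no network access should be needed to hit the error.

```python
import socket

# 64-character first label, one more than the 63-character limit.
host = "0123456789" * 6 + "0123" + ".example.com"


def failure_type(call):
    """Run call() and return the exception class it raised, or None."""
    try:
        call()
    except Exception as exc:
        return type(exc)
    return None


# Both calls are expected to fail while encoding the name, before resolving it.
for name, call in [
    ("gethostbyname", lambda: socket.gethostbyname(host)),
    ("getaddrinfo", lambda: socket.getaddrinfo(host, 80)),
]:
    exc_type = failure_type(call)
    print(name, exc_type.__name__ if exc_type else "no error")
```

The exact class name printed may differ between Python versions, but in each case it should be UnicodeError or a subclass of it rather than socket.gaierror.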

@ned-deily ned-deily added the 3.7 (EOL) and 3.8 labels Mar 2, 2018
@ned-deily ned-deily changed the title from "Urllib proxy_bypass crashes for urls containing long basic auth strings" to "socket module calls with long host names can fail with idna codec error" Mar 2, 2018
@ned-deily ned-deily removed the type-crash label Mar 2, 2018
@ablack
Mannequin Author

ablack mannequin commented Mar 2, 2018

Just to be clear, I don't know whether the socket module needs to support host name labels that are 64 characters long, so here is an example URL at the root of my problem that I'm pretty sure it should support:

>>> import socket
>>> h = "username:long_api_key0123456789012345678901234567890123456789@www.example.com"
>>> socket.gethostbyname(h)
Traceback (most recent call last):
  File "/Users/ablack/miniconda3/lib/python3.6/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

@agnosticdev
Mannequin

agnosticdev mannequin commented Mar 6, 2018

Using Ubuntu 16.04 with the 3.6.0 tag I was also able to reproduce the same error reported:

import socket

h = "0123456789012345678901234567890123456789012345678901234567890123.example.com"
socket.gethostbyname(h)
Traceback (most recent call last):
  File "/home/agnosticdev/Documents/code/python/python-dev/cpython-3_6_0/Lib/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "host_test.py", line 8, in <module>
    socket.gethostbyname(h)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

It looks like the issue is that the first label of the hostname is 64 characters long, so it cannot be encoded and falls into the UnicodeError raised in idna.py:
    # ASCII name: fast path
    labels = result.split(b'.')
    for label in labels[:-1]:
        if not (0 < len(label) < 64):
            raise UnicodeError("label empty or too long")
    if len(labels[-1]) >= 64:
        raise UnicodeError("label too long")
    return result, len(input)
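
To make the boundary in that check concrete, here is a small sketch (not from the thread) that calls the standard library's idna codec directly on ASCII host names. The helper name idna_accepts is hypothetical; the behavior it probes is the fast path quoted above.

```python
def idna_accepts(hostname):
    """Return True if the stdlib idna codec encodes hostname without error."""
    try:
        hostname.encode("idna")
    except UnicodeError:
        return False
    return True


print(idna_accepts("a" * 63 + ".example.com"))  # 63-character label
print(idna_accepts("a" * 64 + ".example.com"))  # 64-character label
print(idna_accepts("www..example.com"))         # empty label in the middle
print(idna_accepts("www.example.com."))         # trailing dot, empty last label
```

Only the last label is allowed to be empty, which is why a trailing dot passes while an empty label in the middle does not.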

I did some work on this to try and resolve this, but ultimately it was not worth committing so I wanted to report my findings.

@sdbowman
Mannequin

sdbowman mannequin commented Jul 2, 2020

When will this issue be fixed? Thanks!

@jhasapp
Mannequin

jhasapp mannequin commented Jul 5, 2020

According to the DNS standard, hostnames with more than 63 characters per label (the sections between .) are not allowed [https://tools.ietf.org/html/rfc1035#section-2.3.1].

That said, enforcing that at the codec level might be the wrong choice. I threw together a quick patch moving the limits up to 250, and nothing blew up. It's unclear what the general usefulness of such a change would be, since DNS servers probably couldn't handle those requests anyway.

As for the original issue, if anybody is still doing something like that, could they provide a full example URL? I was unable to reproduce on HTTP (failed in a different place), or FTP.

@ablack
Mannequin Author

ablack mannequin commented Jul 24, 2020

joseph.hackman

I don't think that the 63 character limit on a label is the problem specifically, merely its application.

The crux of my issue was that credentials passed with the url in a basic-authy fashion (as some services require) count against the label length. For example, this would trigger the error:

h = "https://ablack:very_long_api_key_0123456789012345678901234567890123456789012345678901234567890123@www.example.com"

Since the first label would be treated as:
"ablack:very_long_api_key_0123456789012345678901234567890123456789012345678901234567890123@www"

My specific issue goes away if any text up to / including an "@" in the first label section is not included in the label validation. I don't know off hand if that information is supposed to be included per the label in the DNS spec though.
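
For illustration only, a minimal sketch of separating the credentials from the host before any label validation, using urllib.parse.urlsplit from the standard library. This is an assumption about how a caller could work around the problem, not a description of what urllib.request or requests actually do internally; the API key is the made-up one from the example above.

```python
from urllib.parse import urlsplit

api_key = "very_long_api_key_" + "0123456789" * 6 + "0123"
url = f"https://ablack:{api_key}@www.example.com"

parts = urlsplit(url)

# netloc still carries "user:secret@", so its first "label" is far too long.
netloc_first_label = parts.netloc.split(".")[0]
# hostname has the userinfo (and any port) stripped off.
host_first_label = parts.hostname.split(".")[0]

print(len(netloc_first_label), len(host_first_label))

# Encoding only the hostname stays within the 63-character label limit.
encoded = parts.hostname.encode("idna")
print(encoded)
```

Passing parts.hostname rather than parts.netloc to a resolver also keeps the credentials out of the lookup entirely.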

@alexmv
Mannequin

alexmv mannequin commented Apr 26, 2021

It seems reasonable to fail on hostnames that are too long -- but it feels like the weirdness is that it is categorized as a UnicodeError, and not as, say, a ValueError.

Would a re-categorization as ValueError seem like a reasonable adjustment here?

@bdarnell
Mannequin

bdarnell mannequin commented May 9, 2021

(I'm coming here from https://github.com/tornadoweb/tornado/pull/3010)

UnicodeError is a subclass of ValueError, so I don't see what value that change would provide. The thing that's surprising to me is that it's not a socket.herror (or gaierror for socket.getaddrinfo). I guess the docs don't formally say that herror/gaierror is the only possible error from these functions, but gaierror was the only error I was catching so the unexpected UnicodeError escaped the layer that was intended to handle it.

I do think that in the special case of getaddrinfo with the AI_NUMERICHOST flag it should be handled differently: in that mode there is no network access necessary and it's reasonable to assume that the only possible error is a gaierror with EAI_NONAME.

I'd like to at least see better documentation about what errors are possible from this family of functions.
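
Until the behavior or the documentation changes, a caller that only wants to handle gaierror could wrap the call along these lines. This is a hedged sketch: getaddrinfo_strict is a hypothetical helper, not an existing socket API, and mapping the failure to EAI_NONAME is an assumption based on the suggestion above.

```python
import socket


def getaddrinfo_strict(host, port, *args, **kwargs):
    """Hypothetical wrapper: report un-encodable host names as gaierror."""
    try:
        return socket.getaddrinfo(host, port, *args, **kwargs)
    except UnicodeError as exc:
        # The idna codec rejected the name before any lookup happened.
        raise socket.gaierror(
            socket.EAI_NONAME, f"invalid host name {host!r}: {exc}"
        ) from exc


try:
    getaddrinfo_strict("a" * 64 + ".example.com", 80)
except socket.gaierror as exc:
    print("lookup failed:", exc)
```

The original UnicodeError stays available as the `__cause__` of the gaierror, so no diagnostic detail is lost.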

@gpshead
Member

gpshead commented Jan 25, 2022

ablack: the basic auth username:password@ part of the string is not part of a hostname. What code are you seeing that is trying to send that to a name resolver rather than stripping the obviously private info up through the @ sign?
