Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_valid_ip: Do not raise exceptions on too-long input. #3010

Merged
merged 1 commit into from
May 15, 2021

Conversation

alexmv
Copy link
Contributor

@alexmv alexmv commented Apr 7, 2021

Calling socket.getaddrinfo with a potential hostname attempts to
encode the "hostname" with the idna encoding. Per RFC, each part of
the hostname cannot be longer than 63 characters; and getaddrinfo
enforces this by raising a UnicodeError if the length is too long.

Catch these UnicodeErrors and return false for too-long names, rather
than raising an exception.

@bdarnell
Copy link
Member

Thanks for the contribution! This looks good to me, but I'd like to have a comment in the code explaining why UnicodeError might be thrown.

I also think this may be a bug in the standard library - I don't think it should be trying to do the idna encoding if AI_NUMERICHOST is passed.

@ploxiln
Copy link
Contributor

ploxiln commented Apr 10, 2021

hmm yeah weird:

>>> socket.getaddrinfo("a" * 32, 0, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_NUMERICHOST)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

>>> socket.getaddrinfo("a" * 70, 0, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_NUMERICHOST)
Traceback (most recent call last):
  File "/usr/lib/python3.7/encodings/idna.py", line 167, in encode
    raise UnicodeError("label too long")
UnicodeError: label too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label too long)

@alexmv
Copy link
Contributor Author

alexmv commented Apr 26, 2021

Yeah, I think I agree that this is really a python bug -- it looks like this was already filed as https://bugs.python.org/issue32958. Up to you if it's worth papering over in Tornado!

Happy to fix this up with a longer comment pointing to the core Python bug if you're interested in taking it.

@bdarnell
Copy link
Member

bdarnell commented May 9, 2021

I think this is a good change since the upstream fix will be rolled out slowly. As I said, it just needs a comment about what's going on, and perhaps a test case.

is_valid_ip calls `socket.getaddrinfo` with `socket.AI_NUMERICHOST` on
the potential "ip"; even though IP addresses are not hostnames and do
not use the `idna` encoding, `socket.getaddrinfo` will raise
UnicodeError if the potential "ip" is longer than 63 characters long,
the RFC-mandated max hostname length.

Catch these UnicodeErrors and return false for too-long IPs, rather
than raising an exception.
@alexmv
Copy link
Contributor Author

alexmv commented May 10, 2021

I already had a simple test case (that too-long values don't raise, but return false) but I've added a comment, and fixed up the commit message to also point to the Python bug report.

Happy to add more tests, but AFAICT this gives coverage of the relevant codepath.

@bdarnell
Copy link
Member

Thanks!

@bdarnell bdarnell merged commit da4130b into tornadoweb:master May 15, 2021
@alexmv alexmv deleted the is_valid_ip branch November 12, 2021 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants