
What's the best way to re-export socket/ssl constants? #699

Closed
njsmith opened this issue Sep 29, 2018 · 20 comments

@njsmith
Member

njsmith commented Sep 29, 2018

The socket and ssl modules both define a bunch of UPPERCASE constants, like AF_INET and CERT_REQUIRED. The trio.socket and trio.ssl modules want to re-export them. How should we do this?

This is challenging because the exact set of constants varies depending on platform (e.g. not every platform has socket.AF_UNIX) and Python version. It will probably keep changing in the future. We want our re-exports to be accurate despite this. We don't want to maintain a list of which constants are exported on which platform, and we don't want our re-exports to get stale and out-of-date when the stdlib changes.

This motivates our current approach, which is to dynamically query the list of UPPERCASE constants every time trio is imported, and then re-export whatever we find:

trio/trio/socket.py

Lines 50 to 55 in c6dcfdd

globals().update(
    {
        _name: getattr(_stdlib_socket, _name)
        for _name in _stdlib_socket.__dict__ if _name.isupper()
    }
)

This is simple, fast, accurate, it's great! Except... as we discovered in #542, there's another problem: while Python is allegedly a dynamic language, it turns out that static analysis is actually pretty handy and people get grumpy when their static analysis tools don't work. And static analysis tools have absolutely no idea what to make of the code above. So that means pylint and mypy raise errors on code that uses trio.socket.AF_INET, jedi and pycharm can't tab-complete trio.ssl.CERT_REQUIRED, and so on. How can we fix this?
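To make the failure mode concrete, here is a hypothetical user snippet (the diagnostics vary by tool, so the comments below paraphrase rather than quote them):

import trio.socket

# At runtime this works fine, because the constant really is copied over dynamically.
family = trio.socket.AF_INET

# But a static analyzer that only reads the source of trio/socket.py has no way to
# know that AF_INET exists, so tools like mypy/pylint tend to flag the attribute
# access, and editors like pycharm can't offer it in tab completion.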

Crude way: redundant re-exports

We could keep the current export loop for comprehensiveness, and also manually, redundantly re-export the most-used constants, just so static analysis tools can "see" that these are re-exported:

# trio/socket.py
globals().update({n: v for (n, v) in socket.__dict__.items() if n.isupper()})
from socket import AF_INET, AF_INET6, AF_UNIX, SOCK_STREAM, SOCK_DGRAM, IPPROTO_TCP

I guess the criterion for maintaining this list would be "add stuff whenever someone complains"?

Possible refinement: wrap the manual re-exports in an if False:, so that we don't have to guard against ones that are missing?
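A rough sketch of that refinement, assuming static analysis tools record the names from an if False: block without evaluating the condition (something we'd need to check per tool):

# trio/socket.py (sketch)
import socket

globals().update({n: v for (n, v) in socket.__dict__.items() if n.isupper()})

if False:
    # Never executed, so a platform that lacks one of these (e.g. AF_UNIX on
    # Windows) doesn't raise ImportError; the block exists purely so static
    # analysis tools can "see" the names.
    from socket import AF_INET, AF_INET6, AF_UNIX, SOCK_STREAM, SOCK_DGRAM, IPPROTO_TCP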

Clever way: import * + ???

@jmfrank63 suggested another approach: we could from socket import *, and then manually delete all the lowercase exports that we don't want to re-export:

# trio/socket.py
from socket import *
del create_connection
del getfqdn
del SocketType
del setdefaulttimeout
...

from ._socket import (socket, getaddrinfo, ...)

The advantage of this is that if we do it right, then static analysis tools may be able to get a complete and accurate picture of our namespace. The downside is that we have to maintain a list of every non-uppercase symbol in socket, and make sure to keep it up-to-date across versions.

Questions:

  • Can actual existing static analysis tools understand from socket import *? This isn't a given... the stdlib socket module itself uses dynamic hacks to construct its export list:

    https://github.com/python/cpython/blob/4b430e5f6954ef4b248e95bfb4087635dcdefc6d/Lib/socket.py#L65

    We should check this before committing to anything.

  • How ugly would trio/socket.py actually end up looking? I guess I can mostly visualize it, but it'd still be good to see what it actually looks like. (E.g. do we end up needing to wrap each del in a try/except?)

  • If we accidentally leave anything off of our big list of dels, then the failure mode is that Trio starts accidentally exporting some function, potentially a blocking one, and potentially we'll have trouble cleaning it up again later if people start using it. So, we want to be very confident that our list is accurate, and stays accurate as the stdlib evolves. How can we do this? Some sort of test? What exactly would this look like? (One possible shape is sketched below.)
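For illustration only, here is one shape such a test could take, with a hand-maintained allowlist (the names in APPROVED_LOWERCASE are made up for the sketch, not Trio's actual export list):

# test_socket_exports.py (hypothetical)
import trio.socket

# Lowercase names we deliberately export because trio wraps or re-implements them.
# This set would have to be maintained by hand.
APPROVED_LOWERCASE = {"socket", "getaddrinfo", "getnameinfo", "fromfd", "socketpair", "gaierror", "herror"}

def test_every_lowercase_export_is_intentional():
    exported = {
        name for name in dir(trio.socket)
        if not name.startswith("_") and not name.isupper()
    }
    leaked = exported - APPROVED_LOWERCASE
    # If a del gets missed, the stray name shows up here and the test fails,
    # forcing an explicit decision before anything blocking leaks into the API.
    assert not leaked, f"unreviewed lowercase exports: {sorted(leaked)}"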

Any other ideas?

We could... make static analysis tools smarter somehow? Sounds hard though, there are too many of them. Any other ideas?

@njsmith
Member Author

njsmith commented Sep 29, 2018

CC @Zac-HD

@Zac-HD
Member

Zac-HD commented Sep 29, 2018

I started wondering how Mypy copes with the stdlib socket module.

The import *-and-delete approach feels way too fragile to me, so my preferred option is actually a variation on the crude re-export:

# Do this for every known CONSTANT, so we minimise reliance on user reports.
# We can *write* this with a trivial script (and keep the script),
# but the point is to have it on disk for static analysis tools.
import contextlib
import socket
import warnings

with contextlib.suppress(ImportError):
    from socket import AF_UNIX
with contextlib.suppress(ImportError):
    from socket import AF_INET
with contextlib.suppress(ImportError):
    from socket import AF_INET6
...  # (all the others)

for n in dir(socket):
    if n == n.upper() and n not in dir():
        # Catches any leftovers, fails in our tests with -Werror, asks for reports
        warnings.warn(f'Please report that socket.{n} is not reexported by Trio')
        globals()[n] = getattr(socket, n)

This is obviously rather verbose but it has the exact current behaviour at runtime, minimises risk of developer error, and should be handled well by all static analysis tools.

@jmfrank63
Contributor

jmfrank63 commented Sep 29, 2018

My original idea of deleting the attributes and names we do not want seems now rather awkward to me.
I'd rather overwrite the stuff we do not want:

from socket import *

async def create_connection():
    raise NotImplementedError

for every attribute I do not want to expose. This way a user can just do

import trio.socket as socket

adapt all his method calls into coroutines, and the trio socket module will be a drop-in replacement for the standard library socket.

Should we have to implement additional methods in the future (for whatever reason), this will not change anything user-facing.

@jmfrank63
Contributor

@Zac-HD
The import error suppression will probably work, but this is really very verbose, and certainly not DRY.
Also the test to find new symbols will then have to rely on a static list of similar size.
You will end up with about 400 lines of code and possibly the same number of lines in the test.

I'd like to add that our current solution is a working one; it was just vscode that did not pick up the upper case symbols, so this might be a vscode problem. In the interactive shell configured with jedi the upper case names seemed to have worked.

@njsmith
Member Author

njsmith commented Sep 29, 2018

In the interactive shell configured with jedi the upper case names seemed to have worked.

I think in this case jedi or the shell may be cheating and using dynamic runtime data.

@Zac-HD As a variation on that idea, what do you think of something like:

# trio/_generated/socket_constants.py
from socket import (
  AF_INET,
  AF_INET6,
  ...
)

# trio/socket.py
# Fake re-exports for static analysis
if TYPE_CHECKING:
    from ._generated.socket_constants import *
# Real re-exports for runtime usage
globals().update({name: value for (name, value) in socket.__dict__.items() if name.isupper()})

# test_socket.py
def test_socket_constant_generator():
    real_constants = {name for name in socket.__dict__ if name.isupper()}
    our_list = set(open(".../socket-constants.txt").read().split())
    assert our_list.issuperset(real_constants)

@njsmith
Member Author

njsmith commented Sep 29, 2018

async def create_connection():
    raise NotImplementedError

Wouldn't it be kind of annoying for users that trio.socket.<TAB> suggests create_connection, but then if they try to actually use it, it errors out? If we're not actually going to provide a feature, then surely it's better to del it rather than leave a fake replacement?

@Zac-HD
Member

Zac-HD commented Sep 29, 2018

Variation

Yep, that looks good to me. We might need to define __all__ in _generated/socket_constants.py, but that's a matter for a quick experiment. I'd also add a test that running our module generator returns exactly the same module, to ensure that we don't get manual edits sneaking in and eventually getting overwritten.
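A rough sketch of what that generator-plus-idempotency test could look like; the paths and helper names here are invented for illustration:

# gen_socket_constants.py (hypothetical)
import socket
from pathlib import Path

GENERATED = Path("trio/_generated/socket_constants.py")

def generate() -> str:
    # Note: this only sees the constants available on the platform/Python version
    # the generator runs on, so it would need to be run (or at least checked) on
    # each supported configuration.
    names = sorted(name for name in dir(socket) if name.isupper())
    joined = ",\n    ".join(names)
    return "# Auto-generated; do not edit by hand.\nfrom socket import (\n    " + joined + ",\n)\n"

if __name__ == "__main__":
    GENERATED.write_text(generate())

# In the test suite: regenerating must be a no-op, so manual edits can't sneak in
# and later get silently overwritten.
def test_generated_module_is_up_to_date():
    assert GENERATED.read_text() == generate()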

If we're not actually going to provide a feature then surely it's better to del it rather than leave a fake replacement?

As a general principle of API design I don't like this at all, but in the specific case of sockets consistency is more important. Users expect to use hasattr to check if socket functions are usable, and we shouldn't break that without specific compelling reasons.

@jmfrank63
Contributor

I surely will not oppose any of your suggestions; I just find these complicated structures a bit of overkill for a straightforward approach. The NotImplementedError could be mentioned in the docstring and would show up in code completion. Also the docs would give this information.
Besides, why not just implement them all? Just wrap them as async calls.

@njsmith
Member Author

njsmith commented Sep 30, 2018

I just don't understand how implementing create_connection (and testing it, etc.), could be more straightforward than writing del create_connection.

@belm0
Member

belm0 commented Sep 30, 2018

My vote is for a generated, brute-force re-export plus a test to catch missing items. I assume we're running tests on each of the relevant platforms.

@jmfrank63
Contributor

jmfrank63 commented Sep 30, 2018

I just come from the perspective of someone porting his socket code to trio. Let's assume he uses socket.create_connection somewhere. I understand why there is now a function open_tcp_stream because it is superior, but why don't we just call open_tcp_stream from create_connection?
If someone wants to port his code and has used one of the convenience functions, he has to replace the missing functions, which is very painful.
That's why I think wrapping is a neat solution.

@jmfrank63
Contributor

I see generating seems to get the upper hand, but before the decision is made I want to throw in my 50 cents for the star import again.

The huge advantage of this one is it already takes care of any OS/Version dependencies, and static analysis tools pick it up. Also you don't have to deal with the zillions of uppercase names, except for the trio implemented ones.

The downside is it imports too much, and exposes everything including blocking functions.

This is the only thing we must take care of, but it is not a lot of effort.

Deletion is one way.
There is the danger of missing a deletion unless we write a very sophisticated test. Very likely we will end up with a static list of deletions, which is not pretty.

Which brings up the second alternative I think we should consider (still my favorite):

Implementing (overwriting) all of the attributes.
This of course means rethinking why they have been omitted (they are obsolete or broken), but the decision to use such a function or attribute should be up to the implementer.

Of course gethostbyname is obsolete because getaddrinfo now does a better job.
So why not just implement gethostbyname by calling getaddrinfo, giving backwards compatibility to code that still uses the old function?

The test is easy to write: just compare the trio.socket attributes with the socket attributes, and fail if socket contains a lowercase attribute that trio.socket is missing.

We therefore have 100% compatibility, and any code using socket can easily be adapted by alias importing:

import trio.socket as socket

Why is this compatibility good?

For example, open_tcp_stream is superior to create_connection, no doubt. But the first one implements the happy eyeballs algorithm, which the latter doesn't. This, however, prevents wrapping it in a with statement.
So we could provide create_connection by calling open_tcp_stream and have the happy_eyeballs_delay set to infinity. Any legacy code using this function would instantly work.
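If we ever did want such a shim, a minimal sketch might look like this (assuming open_tcp_stream's happy_eyeballs_delay argument, and noting that it returns a trio stream rather than a blocking socket object, so it still wouldn't be a perfect drop-in):

import math
import trio

async def create_connection(address, *, happy_eyeballs_delay=math.inf):
    # Hypothetical compatibility shim: accept a (host, port) tuple like
    # socket.create_connection, but return a trio.SocketStream instead of a
    # blocking socket. Setting happy_eyeballs_delay to infinity approximates the
    # stdlib's sequential connection attempts.
    host, port = address
    return await trio.open_tcp_stream(host, port, happy_eyeballs_delay=happy_eyeballs_delay)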

This latter approach of implementing all the attributes would basically be something like module inheritance.
It is easy to understand, as opposed to the generated approach.
Things that are easy to understand are also easy to maintain and use.

@njsmith
Member Author

njsmith commented Oct 5, 2018

@jmfrank63 Let's split apart the discussion about how we export the socket module API from what it should be. These aren't really related, and generally in my experience, any time you can split apart a discussion into two pieces, you should :-).

Regarding implementing the complete socket API: I see your point about porting existing socket code (though of course you need to go through and add async/await everywhere at the least). I'm not sure how common this will actually be, and am somewhat inclined to wait until it actually causes problems. (Most of the missing functions are actually broken; the exception is create_connection, but anyone who has the kind of simple problem where create_connection is useful should ideally stop using the socket module entirely and switch to our high-level stream API, which takes care of a bunch of tricky details. Of course we'd rather have imperfect ports than no ports, but we can wait to see if it's a problem before adding code.)

But if you want to open an issue to talk about it in more detail, then please do. We can always adjust our export code to match whatever public API we decide on.

@Zac-HD I just had a random idea. I actually have no idea whether this would work, but maybe you do :-)

What would happen if we made an empty _socket_constants.py file, did from _socket_constants import *, and then made a _socket_constants.pyi that was just a copy of socket.pyi that we'd cleaned up by deleting all the non-constants?

I guess a downside would be that not every static analysis tool knows about .pyi files (in particular, I don't think jedi or pylint do – though apparently they have open issues: davidhalter/jedi#839, pylint-dev/astroid#254).

We'd still have to resync from upstream occasionally, and there wouldn't be any test to prod us to do so.

An advantage, particularly for trio.ssl, is that we wouldn't need hacks to deal with the python-version-dependent constants in the ssl module. (mypy knows that if you try to from ssl import PROTOCOL_TLS_CLIENT on python 3.5 then that's an error!)
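For context, this is the kind of per-version guard the .pyi approach would let us skip writing by hand (PROTOCOL_TLS_CLIENT being a 3.6+ constant, per the parenthetical above):

import sys

if sys.version_info >= (3, 6):
    # Without stubs, every version-dependent ssl constant needs a guard like this
    # (or an ImportError suppression) to keep the runtime import working on 3.5.
    from ssl import PROTOCOL_TLS_CLIENT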

@Zac-HD
Member

Zac-HD commented Oct 5, 2018

Also unsure it would work, but purely for aesthetic reasons I'd prefer to write analysis-friendly code rather than pair opaque code with extra information. It just seems more robust, as well as nicer for new tools, but I'm well past my actual knowledge now and into taste and speculation.

@njsmith
Member Author

njsmith commented Oct 6, 2018

You know, maybe the code generation and .pyi stuff and all is making it more complicated than it needs to be. Here's an even simpler version of the explicit import idea:

trio/socket.py:

# TYPE_CHECKING is False at runtime, True during static analysis
if TYPE_CHECKING:  # pragma: no cover
    # Actual code we maintain as needed (that just happens to never execute).
    # If we need to add Python version guards to some imports or something, we can do that
    from socket import (
        AF_INET,
        AF_INET6,
        ...
    )

# Real re-exports for runtime usage
globals().update({name: value for (name, value) in socket.__dict__.items() if name.isupper()})

...and then in test_exports.py, use our "standard" check that jedi/pylint can see all the exported names. That covers everything, while being DRY-compliant, no generated code, no .pyi files, etc., right?
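One possible shape for that check, sketched here with ast instead of driving jedi/pylint directly (the real test_exports.py machinery may differ, and the trio/socket.py path and function name below are assumptions):

# test_exports.py (hypothetical sketch)
import ast
import socket as _stdlib_socket
from pathlib import Path

def _statically_visible_socket_names(path="trio/socket.py"):
    # Collect every name imported via "from socket import ..." anywhere in the
    # file; ast.walk sees the TYPE_CHECKING block even though it never runs.
    tree = ast.parse(Path(path).read_text())
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == "socket":
            names.update(alias.name for alias in node.names)
    return names

def test_static_constant_list_covers_runtime_constants():
    runtime_constants = {n for n in dir(_stdlib_socket) if n.isupper()}
    missing = runtime_constants - _statically_visible_socket_names()
    assert not missing, f"add these to the TYPE_CHECKING block: {sorted(missing)}"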

@Zac-HD
Member

Zac-HD commented Oct 6, 2018

It's an improvement on the status quo!

Let's just do that, and see how it works 😄

@njsmith
Member Author

njsmith commented Oct 6, 2018

In theory I guess we should do from typing import TYPE_CHECKING as _TYPE_CHECKING. But it looks like mypy is not actually clever enough to track the aliasing here, and needs the variable to actually be called TYPE_CHECKING. Oh well. OTOH both mypy and pylint seem to be OK with any random variable called TYPE_CHECKING; they don't care whether it came from typing. (Mostly pylint doesn't care about if statements at all though, it's just in one check that it even bothers trying to figure out TYPE_CHECKING.) Oh well. We can always do from typing import TYPE_CHECKING and then del TYPE_CHECKING.


So it sounds like for each of the two major approaches, we're now pretty clear on the best way to do it.

Explicit, static-only imports

We maintain an explicit list of all the constants that can be exported under any configuration, with a guard so that the imports don't execute at runtime.

At runtime, we continue to do dynamic import of uppercase names.

To keep the static list of uppercase symbols up to date, we need a test that checks that static analysis tools like jedi/pylint agree with the runtime symbol list.

If the static list gets out of date, then the failure mode is that static analysis tools might miss some symbols, but runtime behavior will continue to be correct.

Tools like jedi/pylint should give accurate analyses, except that they might suggest/accept symbols that don't exist on the current platform, but do exist on some other platform (e.g., jedi on Windows will suggest Unix symbols). This seems fine.

For the socket module specifically, we might also want a test to check for new lowercase names appearing, just to give us a heads up in case we want to wrap them. If so, then we'll have to maintain a list of all the lowercase stuff in socket that we're intentionally not re-exporting. This is probably not very important though – the BSD socket API doesn't change much, and if someone needs something exotic then they can (a) submit a feature request like for any other exotic need, or (b) use the stdlib socket module while they're waiting.

Import * plus static cleanups

We maintain an explicit list of all the lowercase names that we don't want to import, e.g. as lines like del <name> in trio/socket.py.

The same code executes at runtime and in static analysis.

To keep the static list of lowercase symbols up to date, we need a test that compares our runtime-exported names to a... separately-maintained list of lowercase names we don't want to import, I guess?

If the static list gets out of date, then the failure mode is that at runtime we start exporting symbols we didn't mean to.

Tools like jedi/pylint should give accurate analyses, to the same extent that they understand the socket and ssl modules. It's not clear how much that is, but from a quick check they do seem to know about lots of the constants, at least. Mypy at least should be OK thanks to typeshed. Which symbols they support will depend on how they understand the stdlib (e.g. mypy doesn't know which platforms have AF_UNIX). This seems fine.

If new socket module names appear, our test will notice, so we'll have a chance to think about whether to wrap them.

Verdict

Both of these sound doable, but the static import list seems to be preferred by several folks here, and the failure mode for import * makes me uncomfortable. So let's go with the static import list.

I think the todo items then are:

  • Modify trio/socket.py to add a block like:

    from typing import TYPE_CHECKING
    if TYPE_CHECKING:
        from socket import (
            ...
        )
    del TYPE_CHECKING
  • Extend the jedi/pylint tests in test_exports.py to apply to trio.socket as well. (They should also be applied to trio.testing and trio._core at some point, btw.)

@jmfrank63
Contributor

Any suggestions on the priority relative to the other issues in #542?
I could start three modules in parallel of course ;-)

@njsmith
Member Author

njsmith commented Oct 7, 2018

Up to you, I think :-)

@njsmith
Member Author

njsmith commented Oct 27, 2018

Fixed in #730.

@njsmith njsmith closed this as completed Oct 27, 2018