Should we have a conventional way to "spawn a process into a nursery"? #1104

Closed
njsmith opened this issue Jun 12, 2019 · 15 comments · Fixed by #1568

Comments

@njsmith
Member

njsmith commented Jun 12, 2019

We have run_process, which is a convenient way to run a process like it's a regular subroutine, with error checking (check=), cancellation support, automatic waiting, etc. And we have trio.Process, which is a fully general way to spawn a process and then do whatever with it.
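For concreteness, a rough sketch of both existing APIs as they stand at the time of this issue (the commands are placeholders; in later trio versions you'd use trio.open_process rather than constructing Process directly, so treat this as illustrative):

import trio
from subprocess import PIPE

async def main():
    # run_process: subroutine-style -- waits for the child, checks the return
    # code (check=True by default), supports cancellation, can capture output
    result = await trio.run_process(["ls", "-l"], capture_stdout=True)
    print(result.stdout)

    # Process: fully general -- you create it and then do whatever with it
    async with trio.Process(["cat"], stdin=PIPE) as proc:
        await proc.stdin.send_all(b"hello\n")
        await proc.stdin.aclose()
        await proc.wait()

trio.run(main)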

In the discussion in #872, @Badg brought up some dissatisfaction with both of these options. I think I mostly get the issue now, so I'll try to rephrase in my own words:

In some ways, when a task spawns a subprocess it's very similar to spawning a new task: the subprocess is a chunk of code that runs simultaneously with the Trio task, and they can potentially interact.

Trio has some strong and convenient conventions for spawning tasks:

  • trio always makes sure that both tasks are finished before exiting the nursery block
  • if the new task crashes, the exception propagates until caught, and automatically cancels sibling tasks
  • if the original task crashes, the new task is automatically cancelled too
  • if the surrounding context is cancelled, then this automatically cancels both tasks
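For concreteness, a minimal sketch of those conventions in action:

import trio

async def crasher():
    await trio.sleep(1)
    raise RuntimeError("oops")

async def sibling():
    await trio.sleep(60)  # gets cancelled when crasher() raises

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(crasher)
        nursery.start_soon(sibling)
    # not reached: the nursery cancels sibling(), waits for both tasks to
    # finish, and then lets the RuntimeError propagate out of the block

trio.run(main)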

It would be nice to be able to get this combination of features for subprocesses too. But we don't currently have any simple way to do that.

The closest thing we have is to use async with on a Process object. But that's stuck between two conflicting sets of conventions. The Process API mostly treats processes as independent objects, and our async with process mostly follows the general conventions for async with <resource>: it never cancels the body of the with (even if the process crashes), it doesn't care whether the body of the with block completed with an exception or not, and it doesn't do any particular checks on the subprocess's return code. This is also reasonable, and what you'd expect from something like with file_obj. OTOH, the conventions above are specific to how Trio handles code, and really only show up with nurseries and nursery-like objects. Which a Process... kind of is, and kind of isn't.

I don't think it makes sense to try to wedge all those nursery semantics into async with process_obj. In particular, it would be very weird for async with process_obj to suddenly enable the equivalent of run_process's check=True. And then we'd need some way to allow it to be overridden with check=False on a case-by-case basis, and there's no obvious way to spell that. And if we use check=True and the subprocess fails and the body fails, do we raise a MultiError? In multierror v2, does that mean we need to wrap all outgoing exceptions in MultiError? It would involve a lot of weirdness.

Another idea that's been proposed is to extend run_process so that it can be used like:

process_obj = await nursery.start(run_process, ...)

This has several attractive aspects: run_process's job is to "domesticate" a subprocess so it acts like a subroutine – for example, the check= argument translates error-reporting from subprocess-style to subroutine-style. So here we're reusing that, plus a nursery's general ability to take any subroutine and push it off into a parallel task. You get nursery semantics because you used a nursery. Elegant!
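A sketch of what that might look like in user code (hypothetical at this point in the discussion; "my-server" is a made-up command):

async with trio.open_nursery() as nursery:
    # start the subprocess as a background task and get the Process back
    process = await nursery.start(trio.run_process, ["my-server"])
    # ... interact with `process` here; nursery semantics apply, and if the
    # child exits non-zero, check= turns that into an ordinary task exception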

It's also kinda weird. It's effectively a totally different 'mode' of using run_process: normally run_process treats the process_obj as hidden, and the return value as important; this flips that around completely, so you can mess with the process_obj, and never see the return value. Normally, the capture_* arguments are a major part of why people use run_process; here, they're completely useless, and would probably need to be forbidden. The use of start is sorta gratuitous: the actual process startup is synchronous, so you could just as well have a synchronous version. The start is just a trick to start a task and get a return value at the same time. (never mind, see #1109)

Are there other options to consider? Brainstorming:

async with trio.open_process(...) as process_obj:  # creates a nursery
    ...

process_obj = nursery.process_start_soon(...)
# or start_process_soon or start_soon_process?

async with trio.open_nursery() as nursery:
    await nursery.start(trio.process_as_task, ...)

async with trio.open_nursery() as nursery:
    process_obj = trio.start_process(nursery, ...)
@oremanj
Member

oremanj commented Jun 12, 2019

I know I've proposed some other options in the past, but currently I like trio.start_process(nursery, ...) best out of the ones listed here. open_process() could be a convenient shorthand in some cases, but I think in general the version that takes a nursery argument is more useful, because it lets you start a runtime-determined number of processes in the same scope. trio.process_as_task as its own function would, I think, be a little hard to disambiguate from trio.run_process from a teachability perspective, and nursery.process_start_soon makes it harder to write duck-nurseries.
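A sketch of that runtime-determined-count point, assuming the proposed (not yet existing) trio.start_process(nursery, ...) spelling:

import trio

async def run_all(commands):
    async with trio.open_nursery() as nursery:
        procs = [trio.start_process(nursery, cmd) for cmd in commands]
        # however many commands we were handed, every process gets nursery
        # semantics, and the nursery block doesn't exit until they all finish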

@njsmith
Member Author

njsmith commented Jun 13, 2019

#1109 affects this some – first because if we end up needing an open_process as the standard way to create a Process then we can't use it for this :-). And second because it means we might need to be using nursery.start instead of nursery.start_soon after all.

@belm0
Member

belm0 commented Jul 4, 2019

I propose filling out docs for "Interacting with a process as it runs" with some examples to give users a sane starting point. This would have saved me many hours.

From the experience I had, the most important pieces are 1) don't defer parent exceptions while the subprocess is running, and 2) have the parent do something (e.g. exit) when the subprocess exits unexpectedly.

Roughly:

# assumes `import sys`, `import trio`, and a configured `logger`; runs inside an async function
async with trio.Process(['my_executable']) as proc, \
        trio.open_nursery() as nursery:
    try:
        @nursery.start_soon
        async def _subprocess_monitor():
            # do something (abort shown here) if subprocess exits unexpectedly
            if await proc.wait() != 0:
                logger.error("subprocess failed")
                sys.exit(1)
        # code for subprocess interaction goes here...
    except BaseException:
        # on local error, terminate subprocess to avoid it blocking task exit,
        # then let the error propagate
        proc.terminate()
        raise

@njsmith
Member Author

njsmith commented Aug 4, 2019

Another example of a real user being confused by async with Process: https://gitter.im/python-trio/general?at=5d468fac7c877461650bf003

@Badg

Badg commented Aug 7, 2019

Sorry for being suuuuuper late to respond to the mention (I got a new puppy! She's adorable! It's a lot of work! etc etc). But I just wanted to hop in and say two quick things. First and foremost:

In some ways, when a task spawns a subprocess it's very similar to spawning a new task: the subprocess is a chunk of code that runs simultaneously with the Trio task, and they can potentially interact.

Trio has some strong and convenient conventions for spawning tasks... It would be nice to be able to get this combination of features for subprocesses too. But we don't currently have any simple way to do that.

is a solid encapsulation of the root of the discussion! 🎉

Second: I'm not sure that

this flips that around completely, so you can mess with the process_obj, and never see the return value

is entirely correct. I mean, you certainly can do this, but sometimes you want nursery semantics and return values, and then you end up wrapping things with something like this:

import functools
import trio


class BoundTrioPartial:

    def __init__(self, coro, *args, **kwargs):
        self.return_flag = trio.Event()

        @functools.wraps(coro)
        async def wrapper():
            self._return_value = await coro(*args, **kwargs)
            self.return_flag.set()

        self.wrapper = wrapper

    def __call__(self):
        return self.wrapper()

    async def get_return_value(self):
        await self.return_flag.wait()
        return self._return_value

to be used something like this:

async def outer():
    async with trio.open_nursery() as nursery:
        foo = BoundTrioPartial(inner, foo='foo')
        bar = BoundTrioPartial(inner, bar='bar')
        nursery.start_soon(trio.sleep, 1)
        nursery.start_soon(foo)
        nursery.start_soon(bar)
        foo_result = await foo.get_return_value()
        print(f'foo result: {foo_result}')
        bar_result = await bar.get_return_value()
        print(f'bar result: {bar_result}')


async def inner(*args, **kwargs):
    return (args, kwargs)

which, as you'd expect, works like this:

In [25]: trio.run(outer)
foo result: ((), {'foo': 'foo'})
bar result: ((), {'bar': 'bar'})

If the answer to the question posed in the issue title is "yes" (i.e., if this gets implemented), then for the sake of consistency I think it would be nice for the solution to also support those kinds of patterns.

@njsmith
Member Author

njsmith commented Aug 10, 2019

sometimes you want nursery semantics and return values

I see how that's something you want to be able to do in general, but is that something you want for run_process specifically?

The intermediate result from run_process is a Process, and the final result is a CompletedProcess. The difference between them is that a Process lets you read the child's stdout/stderr as it runs, while a CompletedProcess gives you the complete stdout/stderr at the end. It doesn't seem like you would want to do both at the same time. I guess in principle someone might want to read stdout incrementally but wait for the end to get stderr, but this seems like a pretty specialized problem to have.
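To make the distinction concrete, a sketch (assumes the child was started with its stdout piped; the function names are made up):

import trio

async def read_incrementally(proc: trio.Process):
    # Process: the child is still running; read its stdout as it's produced
    while True:
        chunk = await proc.stdout.receive_some(4096)
        if not chunk:
            break
        print("chunk:", chunk)

async def read_at_the_end():
    # CompletedProcess: only available after the child exits, with all the output
    result = await trio.run_process(["ls"], capture_stdout=True)
    print("everything at once:", result.stdout)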

@Badg

Badg commented Aug 11, 2019

I see how that's something you want to be able to do in general, but is that something you want for run_process specifically?

That's a good point! Just having the Process seems like it would indeed be sufficient.

@smurfix
Contributor

smurfix commented Aug 11, 2019

I guess in principle someone might want to read stdout incrementally but wait for the end to get stderr, but this seems like a pretty specialized problem to have.

Well … I can easily envision some interactive chitchat between two processes where I couldn't care less about stderr – except when that process dies with a bad exit code. In that case I want its stderr, for reporting to the user.

Yes I know, I can always start a task to capture stderr separately, but having that built-in would still be helpful IMHO.
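A sketch of that workaround as things stand today (uses trio.open_process; the function names are made up):

import subprocess
import trio

async def talk_to(command):
    stderr_chunks = []

    async def drain_stderr(proc):
        while True:
            chunk = await proc.stderr.receive_some(4096)
            if not chunk:
                break
            stderr_chunks.append(chunk)

    async with trio.open_nursery() as nursery:
        proc = await trio.open_process(
            command,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )
        nursery.start_soon(drain_stderr, proc)
        # ... interactive chitchat over proc.stdin / proc.stdout ...
        if await proc.wait() != 0:
            # the child died with a bad exit code: report its stderr to the user
            print(b"".join(stderr_chunks).decode(errors="replace"))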

@MauiJerry

MauiJerry commented Aug 11, 2019

This ties in nicely with Discourse #189, Trio with pexpect (gatttool). If I understand correctly, expect and pexpect (the Python version of expect) are a fancy way to have an interactive conversation with a spawned process. You can send_cmd() and expect(keyStr) answers; the inter-process communication happens behind that expect(keyStr) call, which fills buffers with what arrived before and after the requested keyStr. (You can also run() a process, which lets it run to completion and then returns the result.)
pexpect is pure Python, so a tpexpect might be a trio-fied version.
I'm currently using pexpect to communicate with a BLE distance sensor via gatttool (an abstraction over the BlueZ etc. details of BLE conversations). That process might time out, get EOF, or otherwise fail (closed pipes?). The send/expect API makes interrogating the device to get measurements easier (read: it works, and I don't need to become aware of BlueZ API details).
I haven't gone spelunking into the rationale or other uses/implementations of expect, but the pexpect link references it. Might it be useful for structured concurrency providing inter-process conversations?

https://kite.com/python/docs/pexpect
https://en.wikipedia.org/wiki/Expect

@njsmith
Member Author

njsmith commented Sep 24, 2019

Thinking about this more: the current Process context manager stuff seems to be a real magnet for bugs, and it seems like in 99% of the cases where people want more than run_process, this new process-in-nursery helper is what they actually want.

So maybe a good plan would be:

  • Add trio.start_process or whatever we settle on here
  • Deprecate/remove Process.aclose and the context manager equivalent
  • Move open_process into trio.hazmat
  • Maybe make Process.returncode a property that automatically calls Popen.poll and deprecate Process.poll – not really related to the rest of this stuff but it's a minor wart that we can fix

@njsmith
Member Author

njsmith commented May 12, 2020

As part of this, one of the questions we have to answer is how to propagate cancellation into a subprocess for run_process and start_process. Right now, we have Process.aclose, whose semantics are somewhat arbitrary and awkward, and then run_process/start_process inherit their semantics from that. But if we're getting rid of Process.aclose, then that gives us more flexibility, and maybe we should take advantage of that to choose something more suited to run_process/start_process.

One idea: let the user provide an arbitrary async function that's invoked if the subprocess gets cancelled. The semantics would be:

  • The function takes one argument, which is the trio.Process to cancel
  • It's an async function, so it runs in its own task
  • This task is shielded from the external cancellation
  • But if/when the process exits, then this task is automatically cancelled

Then our default implementation could be something like:

import signal
import warnings

import trio

async def default_process_cancel(process):
    try:
        process.send_signal(signal.SIGTERM)
        await trio.sleep(5)
        warnings.warn(RuntimeWarning("process ignored SIGTERM for 5 seconds, escalating to SIGKILL"))
        process.send_signal(signal.SIGKILL)
    except OSError:
        warnings.warn(RuntimeWarning("can't kill process!"))

Usually, the SIGTERM will be sufficient and the child will exit, so default_process_cancel will get cancelled during the sleep.

And if users wanted to tweak this somehow (close stdin? change the sleep value? ...), then that's easy to do – I think this formalism makes it quite easy to implement complex cancellation logic.
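For illustration, a sketch of a user-supplied hook under this proposal (the close-stdin step and the timings are made up; trio's eventual implementation exposes such a hook as run_process's deliver_cancel= argument):

import signal
import trio

async def gentle_process_cancel(process):
    # ask nicely first: close stdin and give the child a moment to wind down
    if process.stdin is not None:
        await process.stdin.aclose()
    await trio.sleep(1)
    # then escalate, roughly like the default above
    process.send_signal(signal.SIGTERM)
    await trio.sleep(5)
    process.kill()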

#1209 also has an example where a user wanted a custom way to cancel subprocesses.

@njsmith
Member Author

njsmith commented May 18, 2020

The customizable cancellation behavior was added in #1525.

Over in #1521, we're talking about adding a concept of a "service task". The reason this is relevant here, is that if we decide to go with start_process(nursery, ...), then how do you define a service process? Maybe we should reconsider nursery.start(run_process, ...)? Or if we make "service-ness" a flag on the nursery rather than the task, then this isn't an issue.... but then there is still the question of how to pass other arguments to nursery.start, like name=. Or maybe name= is an argument for start_process, since if start_process is the one calling start then it can automatically pass in a nice default for name=? (Well, I guess technically it would be possible for run_process to mutate current_task().name... or maybe we should teach nursery.start to check for a __trio_custom_task_name_maker__ method on the passed-in callable, as a general mechanism for functions to export nice names? Like maybe serve_tcp would say which host/port it's serving, so when you look at the task tree you can tell which is which?)

Cross-ref: #1521 (comment)

@njsmith
Member Author

njsmith commented May 18, 2020

Quoting from upthread, the arguments against giving run_process a magical switch between modes:

It's also kinda weird. It's effectively a totally different 'mode' of using run_process: normally run_process treats the process_obj as hidden, and the return value as important; this flips that around completely, so you can mess with the process_obj, and never see the return value.

Yep, it is kinda weird.

Normally, the capture_* arguments are a major part of why people use run_process; here, they're completely useless, and would probably need to be forbidden.

On further thought, I'm not convinced this is true. If a process exits with a non-zero result, and check=True (the default), then you get a CalledProcessError that includes any captured output. So even if you're running the process in a background task, you might well want to use capture_* to capture the output so you can do something with it in case it fails.
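A minimal sketch of that point: with check=True (the default), a failing child raises subprocess.CalledProcessError, and whatever you asked run_process to capture rides along on the exception – and the same holds when the call runs as a background task:

import subprocess
import trio

async def main():
    try:
        await trio.run_process(
            ["sh", "-c", "echo boom >&2; exit 1"],
            capture_stderr=True,
        )
    except subprocess.CalledProcessError as exc:
        print("child failed; its stderr was:", exc.stderr)

trio.run(main)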

The use of start is sorta gratuitous: the actual process startup is synchronous, so you could just as well have a synchronous version. The start is just a trick to start a task and get a return value at the same time.

This was already defunct, because of #1109.

So really the only issue here is how "natural" it feels, and I dunno... making start(run_process, ...) a little magical is, well, magical, but maybe it's the good kind of magic, where it just does what you want.

@njsmith
Member Author

njsmith commented May 18, 2020

...Though in an interesting example of how every design question is related to every other design question: if we do go with nursery.start(run_process, ...), then I feel like that will be a pretty compelling argument in favor of making start and friends pass-through kwargs (#470), even if it requires some awkwardness.

@smurfix
Contributor

smurfix commented May 18, 2020

Having to functools.partial-ize every random call to nursery.start that I might want to instrument or forward or whatever is some level of awkwardness too … so frankly I'm somewhat strongly in favor of forwarding random kwargs wherever we accept and forward random args.
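A sketch of the contrast (the second spelling is hypothetical, contingent on #470):

import functools
import trio

async def main():
    async with trio.open_nursery() as nursery:
        # today: kwargs have to be smuggled through functools.partial
        nursery.start_soon(
            functools.partial(trio.run_process, ["echo", "hi"], capture_stdout=True))
        # with kwarg pass-through (#470), this could simply be:
        # nursery.start_soon(trio.run_process, ["echo", "hi"], capture_stdout=True)

trio.run(main)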
