Subprocess support: run_process() #872

Merged · 18 commits · Jun 11, 2019

Conversation

@oremanj (Member) commented Jan 23, 2019:

Since we're now deviating from stdlib subprocess in a few different ways, also update the subprocess documentation and docstrings to move away from the "just like stdlib subprocess, only async" framing.

Closes #822.

@oremanj oremanj requested a review from njsmith January 23, 2019 01:59
@codecov (bot) commented Jan 23, 2019:

Codecov Report

Merging #872 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #872      +/-   ##
==========================================
- Coverage   99.54%   99.52%   -0.02%     
==========================================
  Files         102      102              
  Lines       12392    12300      -92     
  Branches      935      919      -16     
==========================================
- Hits        12335    12242      -93     
  Misses         36       36              
- Partials       21       22       +1
Impacted Files Coverage Δ
trio/_subprocess.py 100% <100%> (ø) ⬆️
trio/tests/test_subprocess.py 100% <100%> (ø) ⬆️
trio/__init__.py 100% <100%> (ø) ⬆️
trio/_core/_multierror.py 99.37% <0%> (-0.63%) ⬇️
trio/_core/_ki.py 98.36% <0%> (-0.06%) ⬇️
trio/_core/_run.py 99.7% <0%> (-0.03%) ⬇️
trio/_unix_pipes.py 100% <0%> (ø) ⬆️
trio/_highlevel_socket.py 100% <0%> (ø) ⬆️
trio/tests/test_highlevel_open_tcp_stream.py 100% <0%> (ø) ⬆️
trio/_threads.py 100% <0%> (ø) ⬆️
... and 38 more

@njsmith (Member) left a comment:

Thanks for moving this forward! Here are some comments focused on the high-level design and docs. (I haven't done a close-reading of the code yet, though nothing jumped out at me either.)


.. automethod:: terminate

.. automethod:: send_signal
Member:

Does .. autoclass:: with :members: not pick these up for some reason...?

Member Author:

I wanted to be able to control the order, and I couldn't get them to go in non-alphabetical order using :members:. (This despite our docs/source/conf.py having autodoc_member_order = "bysource", and also trying an explicit :member-order: bysource.)

Member:

Ah, yeah, that's python-trio/sphinxcontrib-trio#13

This was already fixed back in August, we just never released it... whoops. I just released that, so hopefully bysource works now.

Member:

I think you can drop these now; "bysource" should work now. (And if it doesn't I want to know ;-))

more specific error check if you're planning on sending any.
**options: :func:`run_process` also accepts any :ref:`general subprocess
options <subprocess-options>` and passes them on to the
:class:`~trio.Process` constructor.
Member:

I'm having trouble thinking of cases where getting access to the Process object is the natural way to accomplish something... if someone's using run_process to pause a task to run a process, then it feels like the natural way to cancel that is to, y'know, cancel it :-). And TOOWTDI?

There's a long tradition in other libraries of having lots of kinds of "thing you run", each with its own ad hoc cancellation API, that I've tried to avoid succumbing to...

Member Author:

I could imagine wanting to send SIGSTOP/SIGCONT, or to send some graceful shutdown signal (since we dropped the explicit support for that), or to have access to the PID for various reasons. I guess it's true that many of these wouldn't overlap with output capturing, and with no capturing and check=False run_process is basically just async with Process(...) as proc: pass, so maybe it's more of an attractive nuisance than it is a useful tool? Happy to remove the start() support if you think it's not worth it after reading this explanation.
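
(A rough sketch of that equivalence, not code from the thread, using the Process API from this PR:)

import trio

async def run_plain(command):
    # With no capturing and check=False, run_process reduces to roughly
    # this: start the process, wait for it to exit, report the result.
    async with trio.Process(command) as proc:
        pass
    return proc.returncode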

Member:

I think let's drop the task_status for now, and we can always add it back later when we have more experience.

so a Ctrl+C will be delivered to both the child process and the parent Trio
process simultaneously. If that's not what you want, you'll need to delve
into platform-specific APIs like (on UNIX) :func:`os.setpgrp` and
:func:`os.tcsetpgrp`.
Member:

  1. Why is this up here, instead of in the run_process docstring? Output capturing semantics seems like a good thing to describe in the docstring :-).

  2. I'm surprised at the switch to capturing by default. It doesn't seem obviously wrong to me, but it doesn't seem obviously right, either, and it's a pretty major change from subprocess.run so it needs some kind of rationale.

  3. The mention of setpgrp and tcsetpgrp seems to come out of nowhere here? I'm not sure if we should mention them at all, and if we do then probably it shouldn't be tacked onto the end of an unrelated paragraph :-)

Member Author:

My rationale for output capturing by default is:

  • If you're writing a concurrent app, you might want to run multiple processes at once; that's arguably more likely than you carefully running only one at a time. And if you do run multiple processes at once, it's not really safe to leave their I/O unisolated from each other.
  • If we have a flag for capturing output, people might use it but forget to pass input=b"" to a process that they don't expect will read any input. If that process then does try to read input, it looks like a hang, because whatever prompt it printed before reading from stdin was captured and thus invisible to the user. So, capturing by default seems less error-prone than providing different flags for capturing output and for providing input.

Originally I had run_process and delegate_to_process as two separate functions, where run_process used isolated I/O and delegate_to_process did no redirections. I removed that because it seemed like too much API surface area given your other concerns. I could bring it back, or could add a passthrough option to run_process, where passthrough=True means don't do redirections and passthrough=False means send input and capture output. Or whatever name (and default) you think would be clearest? I guess a separate delegate_to_process function has the advantage that it gives us a place to hang additional cleverness that might be developed in the future for dealing with interactive subprocesses "properly" (there's a lot of complication in the TTY job control handling). But maybe that's premature and would be better handled in an outside library.

Member Author:

I implemented the passthrough option; let me know how you think it reads?

Member:

The passthrough option is exactly the same as the stdlib's capture_output, but backwards, right? If we do do this, then making the default capture_output=True seems strictly better than making it passthrough=False :-).

I think I might still prefer capture_stdout=True, capture_stderr=False, but before we dive into that can you clarify the handling of input? In the stdlib run, AFAICT input=None (the default) means we let the child inherit the parent's stdin, and you have to pass input=b"" if you want to give it an empty pipe. Are we matching that, or are you thinking that passthrough would affect this, or...?

Member Author:

passthrough=True gives the stdlib defaults. passthrough=False is equivalent to input=b"" if no input is specified, plus capture_output=True. I explained above in this thread why I think it's error-prone to have different isolation defaults for input vs output.

My preference would be to have one flag that controls "are the standard streams for this process basically isolated or are they basically inherited from the parent Trio process", and then people can modify the treatment of individual streams with the stdin/out/err options if they don't want all three streams treated the same way. If the standard streams are isolated and you didn't provide any input, you probably want the subprocess's input to come from /dev/null. It seems confusing to me to distinguish input=None from input=b"" for that purpose.
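
(The passthrough flag was eventually dropped. For reference, a sketch of the two stream treatments being debated, expressed against the merged run_process API with every stream spelled out explicitly rather than relying on defaults:)

import subprocess
import trio

async def main():
    # "inherited" treatment: the child shares the parent Trio process's
    # standard streams
    await trio.run_process(["ls"], stdin=None, stdout=None, stderr=None)

    # "isolated" treatment: empty stdin, stdout and stderr captured
    result = await trio.run_process(
        ["ls"], stdin=b"", capture_stdout=True, capture_stderr=True
    )
    print(result.stdout.decode())

trio.run(main)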

@Badg commented Mar 4, 2019:

Thinking about this some more... I think actually that the mental distinction I'm having here is between pre-allocating consumers and post-allocating consumers, given the run_process high-level API.

I think at an intuitive level, I expect async def run_process to be atomic -- it's responsible for starting, managing, and closing the process; when it returns, the process is definitely finished, and before you call it, the process definitely hasn't started (plus all of the other usual trio semantics).

So if you want to combine that high-level API with eager/incremental processing of output/input, you're (by definition) forced to preallocate your stdin producers and stdout/err consumers using a nursery, which you then need to coordinate with the run_process API somehow so that they get hooked up correctly. That, ultimately, is the API problem my wrapper is trying to solve.

Also partly relevant to the discussion in #959, I think my choice for a channel over a stream, even though the stream is quite literally the thing returned by the trio Process, is the result of two things (listed in order of importance):

  1. at least according to the docs, there's nothing suggesting I can pass an existing stream object to the trio.Process constructor, which means I'd have no way of preallocating the stream
  2. it seems much, much more natural (and easier!) to use channels for this, because of the send_channel, receive_channel = trio.open_memory_channel() function, which immediately allocates everything you need to do this coordination (and close it down easily)

Hopefully that explanation of my train of thought makes sense!
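
(A minimal sketch, not from the thread, of the pre-allocation pattern described above: a consumer is started first in a nursery, and the process's stdout is pumped into a memory channel. The command and chunk size are illustrative, and this uses the Process API from this PR:)

import subprocess
import trio

async def main():
    send_channel, receive_channel = trio.open_memory_channel(0)

    async def consumer():
        # pre-allocated consumer: handles output incrementally as it arrives
        async with receive_channel:
            async for chunk in receive_channel:
                print("got:", chunk)

    async with trio.open_nursery() as nursery:
        nursery.start_soon(consumer)
        async with trio.Process(["ls", "-l"], stdout=subprocess.PIPE) as proc:
            async with send_channel:
                while True:
                    chunk = await proc.stdout.receive_some(4096)
                    if not chunk:
                        break
                    await send_channel.send(chunk)

trio.run(main)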

Member:

@Badg

at least according to the docs, there's nothing suggesting I can pass an existing stream object to the trio.Process constructor, which means I'd have no way of preallocating the stream

That's correct. It's possible in some exotic cases, but not in general.

Thinking about this some more... I think actually that the mental distinction I'm having here is between pre-allocating consumers and post-allocating consumers, given the run_process high-level API.

Why do you prefer pre-allocating consumers, though? It seems like this is just as easy?

async with trio.Process(...) as p:
    async with trio.open_nursery() as nursery:
        ...

Or if not, why not? (I guess two possibilities that come to mind: that doesn't do check and it doesn't kill the process on error. I sort of wonder if Process.__aexit__ should kill the process if there's an exception, by analogy with nurseries cancelling siblings on error.)

@Badg:

@njsmith I don't necessarily prefer preallocating consumers. The goal with my wrapper was to provide a high-level API that does check, handles await process.wait(), kills the process on error/cancellation/etc, all within a single await run_process call. As far as I'm aware, the only possible way to combine those semantics with incremental processing (of either input or output) is to preallocate, so that's what the API ends up exposing. Does that make sense?

@njsmith (Member) commented Mar 5, 2019:

That's true, if the requirement is that it be a single await my_run_process(...), then you have to preallocate. But what I'm still not sure of is: why is it important that it be a single await my_run_process(...), as compared to a single async with my_run_process(...)?

@Badg:

I mean, at the end of the day it's just my own personal API preference, so it deserves a heaping mound of salt. But I think my preference for an atomic my_run_process(...) over an async context manager is that the context manager introduces a second concurrency mechanism to trio -- it's behaving both like a subprocess spawner and a nursery, and my personal preference would be to keep those two separate. It was also my experience that the context manager resulted in more boilerplate, but I think that's the weaker point so I'm not sure it's worth exploring very far.

Like I said though, I'm just one person, and I always reserve the right to be wrong haha. Just thought I'd add some thoughts into the mix!

@oremanj (Member Author) commented Feb 3, 2019:

@njsmith - I think I've addressed all the outstanding comments here; could you take another look?

@njsmith (Member) left a comment:

Some line comments here; the big question is about passthrough, and github's UI doesn't seem to let me respond to that thread in this review interface, so I'll post separately.


@oremanj (Member Author) commented Feb 6, 2019:

For some reason GitHub isn't letting me reply to or resolve your last two line comments inline:

I think you can drop these now; "bysource" should work now. (And if it doesn't I want to know ;-))

I just upgraded my sphinx and sphinxcontrib-trio (the latter to 1.0.2, which supposedly includes the fix?) and it still seems to enforce alphabetical ordering.

I think let's drop the task_status for now, and we can always add it back later when we have more experience.

Ok!

@oremanj (Member Author) commented Feb 14, 2019:

@njsmith: heads up that this PR is currently awaiting a response to the proposed redirection semantics in #872 (comment). No rush if you're busy, just wanted to poke since I find it easy to accidentally lose comments in github threads and I figured there was a chance that that had happened here.

@oremanj (Member Author) commented Feb 23, 2019:

Sorry for the delay in responding - how does this iteration look?

@njsmith (Member) left a comment:

I'm still on the fence about capture. Do we really need it?

And if we do need it, I think capture_both is a much better name... I worry that people will write or read capture and accidentally swallow stderr again.

@smurfix (Contributor) commented Mar 5, 2019:

-1 on capture. +0 on capture_both; there's exactly one reason I can think of why you would do that, and that's to grab the output of "/usr/bin/whatever --help", because you never know whether it's going to be printed to stdout or stderr – but you can handle that case just as easily by capturing stdout and stderr separately.

@njsmith (Member) commented Mar 6, 2019:

grab the output of "/usr/bin/whatever --help", because you never know whether it's going to be printed to stdout or stderr – but you can handle that case just as easily by capturing stdout and stderr separately.

Arguably the best way to do that is capture_stdout=True, stderr=STDOUT. (Actually this is what you want in a lot of cases – it loses the distinction between stdout and stderr, but in exchange you get them interleaved properly in time.)

@oremanj (Member Author) commented Mar 8, 2019:

All right, I'll concede that my bias towards output capturing seems to be in the minority around here. :-) Have removed capture entirely and addressed the other comments.

@dhirschfeld (Member) commented:

in exchange you get them interleaved properly in time

For me this is critical and ~always what I do, because it's the easiest way to debug when things go wrong. I've always found stderr on its own, with no surrounding context, to be pretty useless for debugging purposes.

Is there a way to do this with the current design? From a quick look I can't see how?

@oremanj (Member Author) commented Mar 9, 2019:

Is there a way to do this with the current design? From a quick look I can't see how?

run_process() accepts all the Popen-style redirection options for stdin/stdout/stderr as alternatives to the higher-level stdin=b"stuff" / capture_stdout / capture_stderr. So you would say await run_process(..., capture_stdout=True, stderr=subprocess.STDOUT), and look at the .stdout attribute of the returned object.
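
(A sketch of that call; /usr/bin/whatever is a placeholder command, and check=False is added here because --help exit statuses vary:)

import subprocess
import trio

async def main():
    # interleave stderr into the stdout capture so the two streams stay
    # ordered in time
    result = await trio.run_process(
        ["/usr/bin/whatever", "--help"],
        capture_stdout=True,
        stderr=subprocess.STDOUT,
        check=False,  # some tools exit nonzero from --help
    )
    print(result.stdout.decode())

trio.run(main)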

@smurfix (Contributor) commented Mar 9, 2019:

in exchange you get them interleaved properly in time

Well, I would – if both stdout and stderr were either line buffered or unbuffered. Unfortunately, that's rarely the case; typically stderr is either line buffered or unbuffered, no matter where it's going, while stdout's buffer is some-power-of-two when it's not going to a terminal.

@njsmith (Member) commented Mar 11, 2019:

Well, I would – if both stdout and stderr were either line buffered or unbuffered. Unfortunately, that's rarely the case; typically stderr is either line buffered or unbuffered, no matter where it's going, while stdout's buffer is some-power-of-two when it's not going to a terminal.

Off-topic but sometimes useful hack: If your subprocess is python, then you can pass -u or set the envvar PYTHONUNBUFFERED=1 to force stdout and stderr into line-buffered mode.
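
(A sketch of that hack, not from the thread, combined with the interleaved-capture pattern from earlier; sys.executable is used so the example doesn't assume a python on PATH:)

import os
import subprocess
import sys
import trio

async def main():
    # PYTHONUNBUFFERED=1 (like passing -u) stops the child Python from
    # block-buffering stdout, so the interleaved capture reflects the
    # real ordering of writes
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    result = await trio.run_process(
        [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"],
        capture_stdout=True,
        stderr=subprocess.STDOUT,
        env=env,
    )
    print(result.stdout.decode())

trio.run(main)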

The more general solution would be to allocate a tty instead of a pipe. That has some extra complications and I'm not sure if convenience functions for it are in-scope for trio or not, but if you want to open an issue to discuss then feel free.
