Subprocess support: run_process() #872

Merged · 18 commits · Jun 11, 2019

Conversation

@oremanj (Member) commented Jan 23, 2019:

Since we're now deviating from stdlib subprocess in a few different ways, also update the subprocess documentation and docstrings to move away from the "just like stdlib subprocess, only async" framing.

Closes #822.

@oremanj oremanj requested a review from njsmith January 23, 2019 01:59
@codecov (bot) commented Jan 23, 2019:

Codecov Report

Merging #872 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #872      +/-   ##
==========================================
- Coverage   99.54%   99.52%   -0.02%     
==========================================
  Files         102      102              
  Lines       12392    12300      -92     
  Branches      935      919      -16     
==========================================
- Hits        12335    12242      -93     
  Misses         36       36              
- Partials       21       22       +1
Impacted Files Coverage Δ
trio/_subprocess.py 100% <100%> (ø) ⬆️
trio/tests/test_subprocess.py 100% <100%> (ø) ⬆️
trio/__init__.py 100% <100%> (ø) ⬆️
trio/_core/_multierror.py 99.37% <0%> (-0.63%) ⬇️
trio/_core/_ki.py 98.36% <0%> (-0.06%) ⬇️
trio/_core/_run.py 99.7% <0%> (-0.03%) ⬇️
trio/_unix_pipes.py 100% <0%> (ø) ⬆️
trio/_highlevel_socket.py 100% <0%> (ø) ⬆️
trio/tests/test_highlevel_open_tcp_stream.py 100% <0%> (ø) ⬆️
trio/_threads.py 100% <0%> (ø) ⬆️
... and 38 more

@njsmith (Member) left a comment:

Thanks for moving this forward! Here are some comments focused on the high-level design and docs. (I haven't done a close-reading of the code yet, though nothing jumped out at me either.)


.. automethod:: terminate

.. automethod:: send_signal
Member:

Does .. autoclass:: with :members: not pick these up for some reason...?

Member Author:

I wanted to be able to control the order, and I couldn't get them to go in non-alphabetical order using :members:. (This despite our docs/source/conf.py having autodoc_member_order = "bysource", and also trying an explicit :member-order: bysource.)

Member:

Ah, yeah, that's python-trio/sphinxcontrib-trio#13

This was already fixed back in August, we just never released it... whoops. I just released that, so hopefully bysource works now.

Member:

I think you can drop these now; "bysource" should work now. (And if it doesn't I want to know ;-))

more specific error check if you're planning on sending any.
**options: :func:`run_process` also accepts any :ref:`general subprocess
options <subprocess-options>` and passes them on to the
:class:`~trio.Process` constructor.
Member:

I'm having trouble thinking of cases where getting access to the Process object is the natural way to accomplish something... if someone's using run_process to pause a task to run a process, then it feels like the natural way to cancel that is to, y'know, cancel it :-). And TOOWTDI?

There's a long tradition in other libraries of having lots of kinds of "thing you run", each with its own ad hoc cancellation API, that I've tried to avoid succumbing to...

Member Author:

I could imagine wanting to send SIGSTOP/SIGCONT, or to send some graceful shutdown signal (since we dropped the explicit support for that), or to have access to the PID for various reasons. I guess it's true that many of these wouldn't overlap with output capturing, and with no capturing and check=False run_process is basically just async with Process(...) as proc: pass, so maybe it's more of an attractive nuisance than it is a useful tool? Happy to remove the start() support if you think it's not worth it after reading this explanation.
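
(A rough sketch of that equivalence, not code from the thread, using the Process API from this PR:)

import trio

async def run_plain(command):
    # With no capturing and check=False, run_process reduces to roughly
    # this: start the process, wait for it to exit, report the result.
    async with trio.Process(command) as proc:
        pass
    return proc.returncode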

Member:

I think let's drop the task_status for now, and we can always add it back later when we have more experience.

so a Ctrl+C will be delivered to both the child process and the parent Trio
process simultaneously. If that's not what you want, you'll need to delve
into platform-specific APIs like (on UNIX) :func:`os.setpgrp` and
:func:`os.tcsetpgrp`.
Member:

  1. Why is this up here, instead of in the run_process docstring? Output capturing semantics seems like a good thing to describe in the docstring :-).

  2. I'm surprised at the switch to capturing by default. It doesn't seem obviously wrong to me, but it doesn't seem obviously right, either, and it's a pretty major change from subprocess.run so it needs some kind of rationale.

  3. The mention of setpgrp and tcsetpgrp seems to come out of nowhere here? I'm not sure if we should mention them at all, and if we do then probably it shouldn't be tacked onto the end of an unrelated paragraph :-)

Member Author:

My rationale for output capturing by default is:

  • If you're writing a concurrent app, you might want to run multiple processes at once; that's arguably more likely than you carefully running only one at a time. And if you do run multiple processes at once, it's not really safe to leave their I/O unisolated from each other.
  • If we have a flag for capturing output, people might use it but forget to pass input=b"" to a process that they don't expect will read any input. If that process then does try to read input, it looks like a hang, because whatever prompt it printed before reading from stdin was captured and thus invisible to the user. So, capturing by default seems less error-prone than providing different flags for capturing output and for providing input.

Originally I had run_process and delegate_to_process as two separate functions, where run_process used isolated I/O and delegate_to_process did no redirections. I removed that because it seemed like too much API surface area given your other concerns. I could bring it back, or could add a passthrough option to run_process, where passthrough=True means don't do redirections and passthrough=False means send input and capture output. Or whatever name (and default) you think would be clearest? I guess a separate delegate_to_process function has the advantage that it gives us a place to hang additional cleverness that might be developed in the future for dealing with interactive subprocesses "properly" (there's a lot of complication in the TTY job control handling). But maybe that's premature and would be better handled in an outside library.

Member Author:

I implemented the passthrough option; let me know how you think it reads?

Member:

The passthrough option is exactly the same as the stdlib's capture_output, but backwards, right? If we do do this, then making the default capture_output=True seems strictly better than making it passthrough=False :-).

I think I might still prefer capture_stdout=True, capture_stderr=False, but before we dive into that can you clarify the handling of input? In the stdlib run, AFAICT input=None (the default) means we let the child inherit the parent's stdin, and you have to pass input=b"" if you want to give it an empty pipe. Are we matching that, or are you thinking that passthrough would affect this, or...?

Member Author:

passthrough=True gives the stdlib defaults. passthrough=False is equivalent to input=b"" if no input is specified, plus capture_output=True. I explained above in this thread why I think it's error-prone to have different isolation defaults for input vs output.

My preference would be to have one flag that controls "are the standard streams for this process basically isolated or are they basically inherited from the parent Trio process", and then people can modify the treatment of individual streams with the stdin/out/err options if they don't want all three streams treated the same way. If the standard streams are isolated and you didn't provide any input, you probably want the subprocess's input to come from /dev/null. It seems confusing to me to distinguish input=None from input=b"" for that purpose.
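
(The passthrough flag was eventually dropped. For reference, a sketch of the two stream treatments being debated, expressed against the merged run_process API with every stream spelled out explicitly rather than relying on defaults:)

import subprocess
import trio

async def main():
    # "inherited" treatment: the child shares the parent Trio process's
    # standard streams
    await trio.run_process(["ls"], stdin=None, stdout=None, stderr=None)

    # "isolated" treatment: empty stdin, stdout and stderr captured
    result = await trio.run_process(
        ["ls"], stdin=b"", capture_stdout=True, capture_stderr=True
    )
    print(result.stdout.decode())

trio.run(main)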

@Badg commented Mar 4, 2019:

Thinking about this some more... I think actually that the mental distinction I'm having here is between pre-allocating consumers and post-allocating consumers, given the run_process high-level API.

I think at an intuitive level, I expect async def run_process to be atomic -- it's responsible for starting, managing, and closing the process; when it returns, the process is definitely finished, and before you call it, the process definitely hasn't started (plus all of the other usual trio semantics).

So if you want to combine that high-level API with eager/incremental processing of output/input, you're (by definition) forced to preallocate your stdin producers and stdout/err consumers using a nursery, which you then need to coordinate with the run_process API somehow so that they get hooked up correctly. That, ultimately, is the API problem my wrapper is trying to solve.

Also partly relevant to the discussion in #959, I think my choice for a channel over a stream, even though the stream is quite literally the thing returned by the trio Process, is the result of two things (listed in order of importance):

  1. at least according to the docs, there's nothing suggesting I can pass an existing stream object to the trio.Process constructor, which means I'd have no way of preallocating the stream
  2. it seems much, much more natural (and easier!) to use channels for this, because of the send_channel, receive_channel = trio.open_memory_channel() function, which immediately allocates everything you need to do this coordination (and close it down easily)

Hopefully that explanation of my train of thought makes sense!
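
(A minimal sketch, not from the thread, of the pre-allocation pattern described above: a consumer is started first in a nursery, and the process's stdout is pumped into a memory channel. The command and chunk size are illustrative, and this uses the Process API from this PR:)

import subprocess
import trio

async def main():
    send_channel, receive_channel = trio.open_memory_channel(0)

    async def consumer():
        # pre-allocated consumer: handles output incrementally as it arrives
        async with receive_channel:
            async for chunk in receive_channel:
                print("got:", chunk)

    async with trio.open_nursery() as nursery:
        nursery.start_soon(consumer)
        async with trio.Process(["ls", "-l"], stdout=subprocess.PIPE) as proc:
            async with send_channel:
                while True:
                    chunk = await proc.stdout.receive_some(4096)
                    if not chunk:
                        break
                    await send_channel.send(chunk)

trio.run(main)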

Member:

@Badg

at least according to the docs, there's nothing suggesting I can pass an existing stream object to the trio.Process constructor, which means I'd have no way of preallocating the stream

That's correct. It's possible in some exotic cases, but not in general.

Thinking about this some more... I think actually that the mental distinction I'm having here is between pre-allocating consumers and post-allocating consumers, given the run_process high-level API.

Why do you prefer pre-allocating consumers, though? It seems like this is just as easy?

async with trio.Process(...) as p:
    async with trio.open_nursery() as nursery:
        ...

Or if not, why not? (I guess two possibilities that come to mind: that doesn't do check and it doesn't kill the process on error. I sort of wonder if Process.__aexit__ should kill the process if there's an exception, by analogy with nurseries cancelling siblings on error.)

@Badg:

@njsmith I don't necessarily prefer preallocating consumers. The goal with my wrapper was to provide a high-level API that does check, handles await process.wait(), kills the process on error/cancellation/etc, all within a single await run_process call. As far as I'm aware, the only possible way to combine those semantics with incremental processing (of either input or output) is to preallocate, so that's what the API ends up exposing. Does that make sense?

@njsmith (Member) commented Mar 5, 2019:

That's true, if the requirement is that it be a single await my_run_process(...), then you have to preallocate. But what I'm still not sure of is: why is it important that it be a single await my_run_process(...), as compared to a single async with my_run_process(...)?

@Badg:

I mean, at the end of the day it's just my own personal API preference, so it deserves a heaping mound of salt. But I think my preference for an atomic my_run_process(...) over an async context manager is that the context manager introduces a second concurrency mechanism to trio -- it's behaving both like a subprocess spawner and a nursery, and my personal preference would be to keep those two separate. It was also my experience that the context manager resulted in more boilerplate, but I think that's the weaker point so I'm not sure it's worth exploring very far.

Like I said though, I'm just one person, and I always reserve the right to be wrong haha. Just thought I'd add some thoughts into the mix!

@oremanj (Member Author) commented Feb 3, 2019:

@njsmith - I think I've addressed all the outstanding comments here; could you take another look?

@njsmith (Member) left a comment:

Some line comments here; the big question is about passthrough, and github's UI doesn't seem to let me respond to that thread in this review interface, so I'll post separately.


@oremanj (Member Author) commented Feb 6, 2019:

For some reason GitHub isn't letting me reply to or resolve your last two line comments inline:

I think you can drop these now; "bysource" should work now. (And if it doesn't I want to know ;-))

I just upgraded my sphinx and sphinxcontrib-trio (the latter to 1.0.2, which supposedly includes the fix?) and it still seems to enforce alphabetical ordering.

I think let's drop the task_status for now, and we can always add it back later when we have more experience.

Ok!

@oremanj (Member Author) commented Feb 14, 2019:

@njsmith: heads up that this PR is currently awaiting a response to the proposed redirection semantics in #872 (comment). No rush if you're busy, just wanted to poke since I find it easy to accidentally lose comments in github threads and I figured there was a chance that that had happened here.

@oremanj (Member Author) commented Feb 23, 2019:

Sorry for the delay in responding - how does this iteration look?

@njsmith (Member) left a comment:

I'm still on the fence about capture. Do we really need it?

And if we do need it, I think capture_both is a much better name... I worry that people will write or read capture and accidentally swallow stderr again.

@smurfix (Contributor) commented Mar 5, 2019:

-1 on capture. +0 on capture_both; there's exactly one reason I can think of why you would do that, and that's to grab the output of "/usr/bin/whatever --help", because you never know whether it's going to be printed to stdout or stderr – but you can handle that case just as easily by capturing stdout and stderr separately.

@njsmith (Member) commented Mar 6, 2019:

grab the output of "/usr/bin/whatever --help", because you never know whether it's going to be printed to stdout or stderr – but you can handle that case just as easily by capturing stdout and stderr separately.

Arguably the best way to do that is capture_stdout=True, stderr=STDOUT. (Actually this is what you want in a lot of cases – it loses the distinction between stdout and stderr, but in exchange you get them interleaved properly in time.)

@oremanj (Member Author) commented Mar 8, 2019:

All right, I'll concede that my bias towards output capturing seems to be in the minority around here. :-) Have removed capture entirely and addressed the other comments.

@dhirschfeld (Member) commented:

in exchange you get them interleaved properly in time

For me this is critical and ~always what I do, because it's the easiest way to debug when things go wrong. I've always found stderr on its own, with no surrounding context, to be pretty useless for debugging purposes.

Is there a way to do this with the current design? From a quick look I can't see how?

@oremanj (Member Author) commented Mar 9, 2019:

Is there a way to do this with the current design? From a quick look I can't see how?

run_process() accepts all the Popen-style redirection options for stdin/stdout/stderr as alternatives to the higher-level stdin=b"stuff" / capture_stdout / capture_stderr. So you would say await run_process(..., capture_stdout=True, stderr=subprocess.STDOUT), and look at the .stdout attribute of the returned object.
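
(A sketch of that call; /usr/bin/whatever is a placeholder command, and check=False is added here because --help exit statuses vary:)

import subprocess
import trio

async def main():
    # interleave stderr into the stdout capture so the two streams stay
    # ordered in time
    result = await trio.run_process(
        ["/usr/bin/whatever", "--help"],
        capture_stdout=True,
        stderr=subprocess.STDOUT,
        check=False,  # some tools exit nonzero from --help
    )
    print(result.stdout.decode())

trio.run(main)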

@smurfix (Contributor) commented Mar 9, 2019:

in exchange you get them interleaved properly in time

Well, I would – if both stdout and stderr were either line buffered or unbuffered. Unfortunately, that's rarely the case; typically stderr is either line buffered or unbuffered, no matter where it's going, while stdout's buffer is some-power-of-two when it's not going to a terminal.

@njsmith (Member) commented Mar 11, 2019:

Well, I would – if both stdout and stderr were either line buffered or unbuffered. Unfortunately, that's rarely the case; typically stderr is either line buffered or unbuffered, no matter where it's going, while stdout's buffer is some-power-of-two when it's not going to a terminal.

Off-topic but sometimes useful hack: If your subprocess is python, then you can pass -u or set the envvar PYTHONUNBUFFERED=1 to force stdout and stderr into line-buffered mode.
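
(A sketch of that hack, not from the thread, combined with the interleaved-capture pattern from earlier; sys.executable is used so the example doesn't assume a python on PATH:)

import os
import subprocess
import sys
import trio

async def main():
    # PYTHONUNBUFFERED=1 (like passing -u) stops the child Python from
    # block-buffering stdout, so the interleaved capture reflects the
    # real ordering of writes
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    result = await trio.run_process(
        [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"],
        capture_stdout=True,
        stderr=subprocess.STDOUT,
        env=env,
    )
    print(result.stdout.decode())

trio.run(main)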

The more general solution would be to allocate a tty instead of a pipe. That has some extra complications and I'm not sure if convenience functions for it are in-scope for trio or not, but if you want to open an issue to discuss then feel free.
