Add curio-inspired POC monitor #429

touilleMan · 2018-02-07T17:11:54Z

Following #413, this is basically a port of the Curio monitor system to trio. I've been pleased by the instrument system which made the task really simple ;-)

I don't believe we should currently merge this PR, it's just to keep a track and start playing with something ;-)

As @njsmith said, this is probably better to move this into it own pypi library once it will mature enough

njsmith · 2018-02-07T17:33:35Z

🍰 🎉

Not a code review, but some random thoughts:

Check out notes-to-self/print-task-tree.py, you might want to steal it and integrate it

The hazmat docs have an example of getting a traceback from a non-running task, so you can make that Just Work. (Maybe we should provide a helper on the Task object that handles the annoying bit of figuring out whether the task is running and then using the right technique, so you can just do task.traceback() or something?)

Even if you want to keep the monitor in a separate thread, it might be convenient to use trio in that thread instead of having to implement your own cancellation support. (You can have multiple calls to run happening in different threads at the same time; they don't interfere with each other or anything.) Just a thought.

Though it might make just as much sense to drop the threads altogether and run the monitor as a regular task. You have to get into the main thread to do anything interesting anyway, and it would be nice if the monitor could run on an arbitrary Stream instead of using a hard coded socket connection, because it opens the door to using the rest of trio's stream machinery (e.g. running over SSL).

touilleMan · 2018-02-07T17:56:25Z

so you can just do task.traceback() or something?

Yup, that would be really convenient for debugging

Even if you want to keep the monitor in a separate thread, it might be convenient to use trio in that thread instead of having to implement your own cancellation support.

First it's not my own cancellation support, but Curio's one 😆
More seriously, I see a reason not to use trio here: when debugging trio itself it can be better to know the monitor is a totally separated entity (the more nasty the bug, the more appealing this guarantee is).
However as you said using trio in the monitor is also really cool (multiple clients and SSL support means we can start considering this monitor system to be more that just a small debug hack you run on localhost while running your unittests)

Though it might make just as much sense to drop the threads altogether and run the monitor as a regular task.

But this means the monitor won't respond if the trio loop is blocked, which is a typical usecase where we would like to use the monitor in the first place

You have to get into the main thread to do anything interesting anyway,

We can do all the read-only stuff from another thread, this seems a good part to me

it would be nice if the monitor could run on an arbitrary Stream instead of using a hard coded socket connection

\^__^/

touilleMan · 2018-02-07T19:01:58Z

Another thing I'm not sure is the id(task) to design a task.
On one hand it's really the ID of the task object, but on the other it's a pretty cumbersome one (15 characters long, often close one to another, making them hard to read and hard to type...)
Maybe a better solution would be for the monitor to have it own task counter and design task by their order of creation.

Fuyukai · 2018-02-07T20:55:26Z

You have to get into the main thread to do anything interesting anyway, and it would be nice if the monitor could run on an arbitrary Stream instead of using a hard coded socket connection, because it opens the door to using the rest of trio's stream machinery (e.g. running over SSL).

IMO whilst this sounds (and is) cool, you open yourself up to the Werkzeug debugger attack if you listen on a remotely accessible socket, so you'd either have to tell users not to do that, add built-in authentication of some form, or allow users to add their own at which point you'd be making an even more complex debugger framework which seems out-of-scope for a basic monitor.

njsmith · 2018-02-08T02:05:02Z

Oh, I see, I missed that you were using an Instrument to keep a shadow table to track the list of tasks. The other approach is to pull it out of trio on the fly (the print-task-tree.py file shows how). The benefit of keeping the shadow table is that you can print a list of tasks, and potentially which one is running currently, even when the run loop is locked up and non-reponsive. The downside is that you need to know ahead of time that you're going to have a problem and set up the Instrument, and it adds some overhead during normal execution.

The other approach is to wait until someone actually connects to the monitor before doing anything. This is nice because it doesn't add any overhead, so you can leave your production app running in this mode and then connect after things have gone wrong. The downside is that if things have really gone wrong, you might not be able to connect...

open yourself up to the Werkzeug debugger attack if you listen on a remotely accessible socket

Oh for sure you don't want to allow untrusted connections to a debug interface like this. But "trusted" and "local" aren't necessarily the same. You could have a policy that even local users need to provide some authentication before connecting to the debug interface. Or you could be in a situation where your program is running in a container environment and you have to export a regular tcp socket to access it at all (presumably locking it down somehow in the process). Or maybe you really want to access it remotely and are taking appropriate precautions, like requiring an SSL client certificate or ssh authentication. (The twisted "manhole" interface supports a mode where you connect using ssh.) If the core interface just takes a Stream, then it's straightforward to build all these things on top. And of course we'd still want to provide some helpers with secure defaults.

Thinking about it, we might be able to make the task tree dumping work from other threads, without actually entering the trio thread. Most operations would be to be run on the thread though.

So maybe a best-of-all-worlds approach would be to make the monitor use trio for power/flexibility, and design it so that it can run either in the main thread or in a secondary thread. This shouldn't be terribly difficult; basically it means routing operations through a portal object, which could either send them over to the main thread or else run them directly. And for a few operations that we particularly want to work even if the main thread is stuck, we can do some special case handling.

This way it'd even be possible to detect when our attempt to enter the main thread had gotten stuck, and print a stack dump.

touilleMan · 2018-02-08T15:21:31Z

The other approach is to wait until someone actually connects to the monitor before doing anything. This is nice because it doesn't add any overhead, so you can leave your production app running in this mode and then connect after things have gone wrong.

One reason I choose to use a shadow table for the tasks was to be able to keep track of their state. I didn't see a way to retrieve this information from the task object itself.
If this is not possible, the trouble with actually starting the monitor once a connection occurs is it really often happened after the trouble occurs so when we're potentially not longer able to get the state of the tasks...

njsmith · 2018-02-09T08:23:01Z

Which state are you unable to get from the task object?

…mands

codecov · 2018-02-14T19:42:05Z

Codecov Report

Merging #429 into master will decrease coverage by 1.52%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #429      +/-   ##
==========================================
- Coverage    99.2%   97.67%   -1.53%     
==========================================
  Files          89       90       +1     
  Lines       10332    10729     +397     
  Branches      719      773      +54     
==========================================
+ Hits        10250    10480     +230     
- Misses         63      225     +162     
- Partials       19       24       +5

Impacted Files	Coverage Δ
trio/monitor.py	`0% <0%> (ø)`
trio/tests/test_highlevel_open_unix_stream.py	`96% <0%> (-4%)`	⬇️
trio/_util.py	`92.98% <0%> (-1.81%)`	⬇️
trio/tests/test_file_io.py	`97.6% <0%> (-0.6%)`	⬇️
trio/tests/test_path.py	`100% <0%> (ø)`	⬆️
trio/_core/tests/test_run.py	`100% <0%> (ø)`	⬆️
trio/_path.py	`100% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3170205...49d11ec. Read the comment docs.

touilleMan · 2018-02-14T19:45:21Z

I've corrected the stacktrace print system following @njsmith advices
I've also added display to trio.hazmat.current_statistics() because it was easy to do ^^

Finally I've copied the print-task-tree.py system which is super cool. I'm only sad the unicode pipes cannot be used (given the console goes trough telnet which only supports ascii). Beside it would be cool to print the task id along with it name in this view.

Fuyukai · 2018-02-14T22:11:22Z

I wanted to test this out, but it seems to error?

ERROR:trio.abc.Instrument:Exception raised when calling 'task_exited' on instrument <trio.monitor.Monitor object at 0x7f28afc460f0>. Instrument has been disabled.
Traceback (most recent call last):
  File "/home/laura/.local/share/virtualenvs/curious-yiBjxk2Q/lib/python3.6/site-packages/trio/_core/_run.py", line 1063, in instrument
    method(*args)
  File "/home/laura/.local/share/virtualenvs/curious-yiBjxk2Q/lib/python3.6/site-packages/trio/monitor.py", line 102, in task_exited
    del self._tasks[id(task)]
KeyError: 139812724302288

EDIT: I guess passing this as instruments=[Monitor()] works, but it still shouldn't error.

It also doesn't work at all if you add it after calling trio.run.

Fuyukai · 2018-02-14T22:19:12Z

Also, I got some errors when doing where commands:

trio > where 140120050418688
Bad command. 'ANextIter' object has no attribute 'gi_frame'

touilleMan · 2018-02-14T23:09:30Z

EDIT: I guess passing this as instruments=[Monitor()] works, but it still shouldn't error.
It also doesn't work at all if you add it after calling trio.run.

I've only tried it by passing to the trio.run() function, but it should make any change inserting it during the runtime from the monitor's point of view (only difference would be coroutines started before wouldn't be displayed). However I'm not sure how work the disconnection of an instrument so they maybe a trouble there...

Also, I got some errors when doing where commands:

I saw this error when trying to get the stacktrace of the <init> coroutine, is it what you tried to do?

njsmith · 2018-03-13T23:56:37Z

@touilleMan @Fuyukai I'd like to close this and encourage you both to work on trio-monitor, but there's no code there yet :-). How do you want to move forward?

njsmith · 2018-07-31T05:54:57Z

This is super cool work, but I don't think this specific PR is going anywhere, both because it should probably move to trio-monitor or similar, and because empirically nothing has happened in ~6 months :-). So I'm going to close this to get it off the list of open PRs. If you have time to pick it up again at some point though then that would be awesome.

touilleMan · 2018-07-31T09:18:23Z

@njsmith We talk about this PR last weekend at the europython with @Fuyukai
We agreed @Fuyukai's codebase should be the one kept (given it is more "trionic"), but some modifications should be done on it (mostly the monitor should run in it own thread on a dedicated trio loop to prevent it from beeing blocked when using pdb or when a hard deadlock occurs)

However I don't know how far @Fuyukai with this during the sprint (I was working on triopg on my side)

njsmith · 2018-07-31T16:51:56Z

@touilleMan awesome, thanks for the update!

Add curio-inspired POC monitor

7a344f4

Improve monitor stacktrace print, add task tree and stats display com…

49d11ec

…mands

Fuyukai mentioned this pull request Feb 21, 2018

Add different implementation of a task monitor. [WIP] #451

Closed

3 tasks

njsmith closed this Jul 31, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add curio-inspired POC monitor #429

Add curio-inspired POC monitor #429

touilleMan commented Feb 7, 2018

njsmith commented Feb 7, 2018

touilleMan commented Feb 7, 2018 •

edited

Loading

touilleMan commented Feb 7, 2018

Fuyukai commented Feb 7, 2018

njsmith commented Feb 8, 2018

touilleMan commented Feb 8, 2018

njsmith commented Feb 9, 2018

codecov bot commented Feb 14, 2018

touilleMan commented Feb 14, 2018

Fuyukai commented Feb 14, 2018 •

edited

Loading

Fuyukai commented Feb 14, 2018

touilleMan commented Feb 14, 2018 •

edited

Loading

njsmith commented Mar 13, 2018

njsmith commented Jul 31, 2018

touilleMan commented Jul 31, 2018

njsmith commented Jul 31, 2018

Add curio-inspired POC monitor #429

Add curio-inspired POC monitor #429

Conversation

touilleMan commented Feb 7, 2018

njsmith commented Feb 7, 2018

touilleMan commented Feb 7, 2018 • edited Loading

touilleMan commented Feb 7, 2018

Fuyukai commented Feb 7, 2018

njsmith commented Feb 8, 2018

touilleMan commented Feb 8, 2018

njsmith commented Feb 9, 2018

codecov bot commented Feb 14, 2018

Codecov Report

touilleMan commented Feb 14, 2018

Fuyukai commented Feb 14, 2018 • edited Loading

Fuyukai commented Feb 14, 2018

touilleMan commented Feb 14, 2018 • edited Loading

njsmith commented Mar 13, 2018

njsmith commented Jul 31, 2018

touilleMan commented Jul 31, 2018

njsmith commented Jul 31, 2018

touilleMan commented Feb 7, 2018 •

edited

Loading

Fuyukai commented Feb 14, 2018 •

edited

Loading

touilleMan commented Feb 14, 2018 •

edited

Loading