Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add curio-inspired POC monitor #429

Closed
wants to merge 2 commits into from

Conversation

touilleMan
Copy link
Member

Following #413, this is basically a port of the Curio monitor system to trio. I've been pleased by the instrument system which made the task really simple ;-)

I don't believe we should currently merge this PR, it's just to keep a track and start playing with something ;-)

As @njsmith said, this is probably better to move this into it own pypi library once it will mature enough

@njsmith
Copy link
Member

njsmith commented Feb 7, 2018

🍰 🎉

Not a code review, but some random thoughts:

Check out notes-to-self/print-task-tree.py, you might want to steal it and integrate it

The hazmat docs have an example of getting a traceback from a non-running task, so you can make that Just Work. (Maybe we should provide a helper on the Task object that handles the annoying bit of figuring out whether the task is running and then using the right technique, so you can just do task.traceback() or something?)

Even if you want to keep the monitor in a separate thread, it might be convenient to use trio in that thread instead of having to implement your own cancellation support. (You can have multiple calls to run happening in different threads at the same time; they don't interfere with each other or anything.) Just a thought.

Though it might make just as much sense to drop the threads altogether and run the monitor as a regular task. You have to get into the main thread to do anything interesting anyway, and it would be nice if the monitor could run on an arbitrary Stream instead of using a hard coded socket connection, because it opens the door to using the rest of trio's stream machinery (e.g. running over SSL).

@touilleMan
Copy link
Member Author

touilleMan commented Feb 7, 2018

so you can just do task.traceback() or something?

Yup, that would be really convenient for debugging

Even if you want to keep the monitor in a separate thread, it might be convenient to use trio in that thread instead of having to implement your own cancellation support.

First it's not my own cancellation support, but Curio's one 😆
More seriously, I see a reason not to use trio here: when debugging trio itself it can be better to know the monitor is a totally separated entity (the more nasty the bug, the more appealing this guarantee is).
However as you said using trio in the monitor is also really cool (multiple clients and SSL support means we can start considering this monitor system to be more that just a small debug hack you run on localhost while running your unittests)

Though it might make just as much sense to drop the threads altogether and run the monitor as a regular task.

But this means the monitor won't respond if the trio loop is blocked, which is a typical usecase where we would like to use the monitor in the first place

You have to get into the main thread to do anything interesting anyway,

We can do all the read-only stuff from another thread, this seems a good part to me

it would be nice if the monitor could run on an arbitrary Stream instead of using a hard coded socket connection

\^__^/

@touilleMan
Copy link
Member Author

Another thing I'm not sure is the id(task) to design a task.
On one hand it's really the ID of the task object, but on the other it's a pretty cumbersome one (15 characters long, often close one to another, making them hard to read and hard to type...)
Maybe a better solution would be for the monitor to have it own task counter and design task by their order of creation.

@Fuyukai
Copy link
Member

Fuyukai commented Feb 7, 2018

You have to get into the main thread to do anything interesting anyway, and it would be nice if the monitor could run on an arbitrary Stream instead of using a hard coded socket connection, because it opens the door to using the rest of trio's stream machinery (e.g. running over SSL).

IMO whilst this sounds (and is) cool, you open yourself up to the Werkzeug debugger attack if you listen on a remotely accessible socket, so you'd either have to tell users not to do that, add built-in authentication of some form, or allow users to add their own at which point you'd be making an even more complex debugger framework which seems out-of-scope for a basic monitor.

@njsmith
Copy link
Member

njsmith commented Feb 8, 2018

Oh, I see, I missed that you were using an Instrument to keep a shadow table to track the list of tasks. The other approach is to pull it out of trio on the fly (the print-task-tree.py file shows how). The benefit of keeping the shadow table is that you can print a list of tasks, and potentially which one is running currently, even when the run loop is locked up and non-reponsive. The downside is that you need to know ahead of time that you're going to have a problem and set up the Instrument, and it adds some overhead during normal execution.

The other approach is to wait until someone actually connects to the monitor before doing anything. This is nice because it doesn't add any overhead, so you can leave your production app running in this mode and then connect after things have gone wrong. The downside is that if things have really gone wrong, you might not be able to connect...

open yourself up to the Werkzeug debugger attack if you listen on a remotely accessible socket

Oh for sure you don't want to allow untrusted connections to a debug interface like this. But "trusted" and "local" aren't necessarily the same. You could have a policy that even local users need to provide some authentication before connecting to the debug interface. Or you could be in a situation where your program is running in a container environment and you have to export a regular tcp socket to access it at all (presumably locking it down somehow in the process). Or maybe you really want to access it remotely and are taking appropriate precautions, like requiring an SSL client certificate or ssh authentication. (The twisted "manhole" interface supports a mode where you connect using ssh.) If the core interface just takes a Stream, then it's straightforward to build all these things on top. And of course we'd still want to provide some helpers with secure defaults.

Thinking about it, we might be able to make the task tree dumping work from other threads, without actually entering the trio thread. Most operations would be to be run on the thread though.

So maybe a best-of-all-worlds approach would be to make the monitor use trio for power/flexibility, and design it so that it can run either in the main thread or in a secondary thread. This shouldn't be terribly difficult; basically it means routing operations through a portal object, which could either send them over to the main thread or else run them directly. And for a few operations that we particularly want to work even if the main thread is stuck, we can do some special case handling.

This way it'd even be possible to detect when our attempt to enter the main thread had gotten stuck, and print a stack dump.

@touilleMan
Copy link
Member Author

The other approach is to wait until someone actually connects to the monitor before doing anything. This is nice because it doesn't add any overhead, so you can leave your production app running in this mode and then connect after things have gone wrong.

One reason I choose to use a shadow table for the tasks was to be able to keep track of their state. I didn't see a way to retrieve this information from the task object itself.
If this is not possible, the trouble with actually starting the monitor once a connection occurs is it really often happened after the trouble occurs so when we're potentially not longer able to get the state of the tasks...

@njsmith
Copy link
Member

njsmith commented Feb 9, 2018

Which state are you unable to get from the task object?

@codecov
Copy link

codecov bot commented Feb 14, 2018

Codecov Report

Merging #429 into master will decrease coverage by 1.52%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #429      +/-   ##
==========================================
- Coverage    99.2%   97.67%   -1.53%     
==========================================
  Files          89       90       +1     
  Lines       10332    10729     +397     
  Branches      719      773      +54     
==========================================
+ Hits        10250    10480     +230     
- Misses         63      225     +162     
- Partials       19       24       +5
Impacted Files Coverage Δ
trio/monitor.py 0% <0%> (ø)
trio/tests/test_highlevel_open_unix_stream.py 96% <0%> (-4%) ⬇️
trio/_util.py 92.98% <0%> (-1.81%) ⬇️
trio/tests/test_file_io.py 97.6% <0%> (-0.6%) ⬇️
trio/tests/test_path.py 100% <0%> (ø) ⬆️
trio/_core/tests/test_run.py 100% <0%> (ø) ⬆️
trio/_path.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3170205...49d11ec. Read the comment docs.

@touilleMan
Copy link
Member Author

I've corrected the stacktrace print system following @njsmith advices
I've also added display to trio.hazmat.current_statistics() because it was easy to do ^^

Finally I've copied the print-task-tree.py system which is super cool. I'm only sad the unicode pipes cannot be used (given the console goes trough telnet which only supports ascii). Beside it would be cool to print the task id along with it name in this view.

@Fuyukai
Copy link
Member

Fuyukai commented Feb 14, 2018

I wanted to test this out, but it seems to error?

ERROR:trio.abc.Instrument:Exception raised when calling 'task_exited' on instrument <trio.monitor.Monitor object at 0x7f28afc460f0>. Instrument has been disabled.
Traceback (most recent call last):
  File "/home/laura/.local/share/virtualenvs/curious-yiBjxk2Q/lib/python3.6/site-packages/trio/_core/_run.py", line 1063, in instrument
    method(*args)
  File "/home/laura/.local/share/virtualenvs/curious-yiBjxk2Q/lib/python3.6/site-packages/trio/monitor.py", line 102, in task_exited
    del self._tasks[id(task)]
KeyError: 139812724302288

EDIT: I guess passing this as instruments=[Monitor()] works, but it still shouldn't error.

It also doesn't work at all if you add it after calling trio.run.

@Fuyukai
Copy link
Member

Fuyukai commented Feb 14, 2018

Also, I got some errors when doing where commands:

trio > where 140120050418688
Bad command. 'ANextIter' object has no attribute 'gi_frame'

@touilleMan
Copy link
Member Author

touilleMan commented Feb 14, 2018

EDIT: I guess passing this as instruments=[Monitor()] works, but it still shouldn't error.
It also doesn't work at all if you add it after calling trio.run.

I've only tried it by passing to the trio.run() function, but it should make any change inserting it during the runtime from the monitor's point of view (only difference would be coroutines started before wouldn't be displayed). However I'm not sure how work the disconnection of an instrument so they maybe a trouble there...

Also, I got some errors when doing where commands:

I saw this error when trying to get the stacktrace of the <init> coroutine, is it what you tried to do?

@njsmith
Copy link
Member

njsmith commented Mar 13, 2018

@touilleMan @Fuyukai I'd like to close this and encourage you both to work on trio-monitor, but there's no code there yet :-). How do you want to move forward?

@njsmith
Copy link
Member

njsmith commented Jul 31, 2018

This is super cool work, but I don't think this specific PR is going anywhere, both because it should probably move to trio-monitor or similar, and because empirically nothing has happened in ~6 months :-). So I'm going to close this to get it off the list of open PRs. If you have time to pick it up again at some point though then that would be awesome.

@njsmith njsmith closed this Jul 31, 2018
@touilleMan
Copy link
Member Author

@njsmith We talk about this PR last weekend at the europython with @Fuyukai
We agreed @Fuyukai's codebase should be the one kept (given it is more "trionic"), but some modifications should be done on it (mostly the monitor should run in it own thread on a dedicated trio loop to prevent it from beeing blocked when using pdb or when a hard deadlock occurs)

However I don't know how far @Fuyukai with this during the sprint (I was working on triopg on my side)

@njsmith
Copy link
Member

njsmith commented Jul 31, 2018

@touilleMan awesome, thanks for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants