It'd be nice if we had a test helper for systematically injecting cancellations #77

njsmith · 2017-03-10T20:13:44Z

It's well known that error handling and cleanup code is notoriously prone to bugs, because it's hard to test and often untested. We have full visibility into the cancellation states of the code we run; it should be possible to provide a pretty sweet fault injection framework to test cancellation handling.

For example, a possible API would be: run this test function repeatedly, while injecting cancellations at different points, until all possible cancellations have been tried, and let any exceptions escape

Technically, this might be implemented as:

keep a record of places where we have issued a cancellation (keyed by stack snapshots or similar)
when we do an "are we cancelled?" check, first check against this database, and if we're at a never-seen-before location then immediately issue a cancellation

There's some subtlety to the choice of key:

we might want to distinguish between the different cancel scopes in the stack (e.g. the first time we hit cancel point X, cancel the topmost scope; the second time we hit cancel point X, cancel the next-to-topmost scope, etc., and only move on to the next cancel point after we've exercised all the scopes at the first cancel point)
for the very common case of I/O loops, we might want to distinguish cancellation on the first-iteration versus later-iterations?

It would also be neat if there were a way to teach coverage to report on which cancellation branches had been exercised.

See also: #239

The text was updated successfully, but these errors were encountered:

njsmith · 2017-04-29T03:51:14Z

I guess this is something that the coverage folks have discussed a bit in the past, e.g. thread from 2010: https://groups.google.com/forum/m/#!topic/coveragepy-dev/pKgslISuFjM

We should probably implement the ability to test cancellations before talking to the coverage folks about measuring it, since there's not much point in asking them to implement something that people can't actually use :-).

Probably the request would be (a) a configurable (opt-in) way to treat await statements as "branches" that have a "raised Cancelled" branch, and (b) a way to mark particular functions as not counting as "cancellable" (so sites that only call those functions wouldn't be considered a cancellation-branch-point even if they use await). The main reason this is needed is for async context managers, like trio.Lock: acquiring a lock can block, so locks need to be async context managers instead of regular context managers, and then you have to make releasing the lock (Lock.__aexit__) marked as async as well, even though it doesn't actually do anything async.

njsmith · 2017-05-11T03:02:52Z

The generic stream tests are probably a place that could make use of this.

njsmith · 2017-07-02T04:21:45Z

we might want to distinguish between the different cancel scopes in the stack (e.g. the first time we hit cancel point X, cancel the topmost scope; the second time we hit cancel point X, cancel the next-to-topmost scope, etc., and only move on to the next cancel point after we've exercised all the scopes at the first cancel point)

Not sure what I was thinking here exactly, but it doesn't make sense to pick random internal cancel scopes and cancel them, since cancel scopes don't spontaneously cancel themselves like that normally. The obvious case is testing external cancellation of some public API, like:

cancel_tester = CancelTester()
while cancel_tester.work_to_do():
    with open_cancel_scope() as scope:
        cancel_tester.set_scope(scope)
        await do_http_request()
    # make some assertions about how it handled it, check for ResourceWarnings, whatever

(This is pretty awkward way of doing the API, ergonomics could be improved, but you get the idea.)

The other case that might make sense is for testing internal timeouts, since timeouts can go off semi-randomly. For this we'd want to integrate with MockClock I suppose, like -- check when the next timeout could happen and make time jump forward to that.

njsmith · 2017-07-02T22:27:16Z

Whatever API we come up with needs to have some provision for doing per-run setup/teardown kind of stuff that happens outside of the cancel-scope-under-test. For example, you might want to fire up an HTTP server, start a client against it, and then cancel the server but not the client, and then make some assertions about what the client sees. And you need a new client each time.

njsmith · 2017-07-04T03:02:23Z

I think the core change to implement this would be to add an instrument hook for "task is at a cancellation point". Possibly the cleanest way to do this would be to call it from yield_if_cancelled and yield_indefinitely, so that the instrument gets called inside task context and can do things like inspect the stack.

The other way to do it would be to put the instrument hook in yield_if_cancelled and in the run loop where we receive the yield_indefinitely yield. But this leads to pretty different calling contexts (can potentially be handled by saying that you always have to fetch the task object and interrogate its coroutine object – though, does that work properly when a coroutine is running?), and worse, it means that if an instrument issues a cancellation in yield_if_cancelled then it will get called again a moment later when we actually yield.

Though... that's... actually what happens in both designs. Ugh. I guess we could give yield_if_cancelled its own unique trap, that yields the same object as yield_indefinitely but doesn't have the instrument hook.

This all might require a bit of code rearranging, since yield_indefinitely currently lives in a different file from the instrument logic.

Possibly this is also an argument for having some kind of first-class support for yield_indefinitely_no_cancellation, since run_in_worker_thread in particular uses yield_indefinitely without it being a cancellation point, and it could lead to a bit of silliness if we inject a cancel there. In particular: if a test wants to check that all cancel points actually cause Cancelled to propagate out, then this would mess that up.

njsmith · 2017-07-17T01:19:58Z

(Side note: if we add an instrumentation hook for cancel points, then we might also want to think about moving assert_yields / assert_no_yields to using it.)

njsmith added design discussion user happiness labels Mar 10, 2017

njsmith added the pytest-trio relevant label Jul 2, 2017

njsmith mentioned this issue Jul 4, 2017

Tools for finding scheduler-dependent "heisenbugs" #239

Open

oremanj added debugging cancellation labels May 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

It'd be nice if we had a test helper for systematically injecting cancellations #77

It'd be nice if we had a test helper for systematically injecting cancellations #77

njsmith commented Mar 10, 2017 •

edited

Loading

njsmith commented Apr 29, 2017

njsmith commented May 11, 2017

njsmith commented Jul 2, 2017

njsmith commented Jul 2, 2017

njsmith commented Jul 4, 2017

njsmith commented Jul 17, 2017

It'd be nice if we had a test helper for systematically injecting cancellations #77

It'd be nice if we had a test helper for systematically injecting cancellations #77

Comments

njsmith commented Mar 10, 2017 • edited Loading

njsmith commented Apr 29, 2017

njsmith commented May 11, 2017

njsmith commented Jul 2, 2017

njsmith commented Jul 2, 2017

njsmith commented Jul 4, 2017

njsmith commented Jul 17, 2017

njsmith commented Mar 10, 2017 •

edited

Loading