Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It'd be nice if we had a test helper for systematically injecting cancellations #77

Open
njsmith opened this issue Mar 10, 2017 · 6 comments

Comments

@njsmith
Copy link
Member

njsmith commented Mar 10, 2017

It's well known that error handling and cleanup code is notoriously prone to bugs, because it's hard to test and often untested. We have full visibility into the cancellation states of the code we run; it should be possible to provide a pretty sweet fault injection framework to test cancellation handling.

For example, a possible API would be: run this test function repeatedly, while injecting cancellations at different points, until all possible cancellations have been tried, and let any exceptions escape

Technically, this might be implemented as:

  • keep a record of places where we have issued a cancellation (keyed by stack snapshots or similar)
  • when we do an "are we cancelled?" check, first check against this database, and if we're at a never-seen-before location then immediately issue a cancellation

There's some subtlety to the choice of key:

  • we might want to distinguish between the different cancel scopes in the stack (e.g. the first time we hit cancel point X, cancel the topmost scope; the second time we hit cancel point X, cancel the next-to-topmost scope, etc., and only move on to the next cancel point after we've exercised all the scopes at the first cancel point)
  • for the very common case of I/O loops, we might want to distinguish cancellation on the first-iteration versus later-iterations?

It would also be neat if there were a way to teach coverage to report on which cancellation branches had been exercised.

See also: #239

@njsmith
Copy link
Member Author

njsmith commented Apr 29, 2017

I guess this is something that the coverage folks have discussed a bit in the past, e.g. thread from 2010: https://groups.google.com/forum/m/#!topic/coveragepy-dev/pKgslISuFjM

We should probably implement the ability to test cancellations before talking to the coverage folks about measuring it, since there's not much point in asking them to implement something that people can't actually use :-).

Probably the request would be (a) a configurable (opt-in) way to treat await statements as "branches" that have a "raised Cancelled" branch, and (b) a way to mark particular functions as not counting as "cancellable" (so sites that only call those functions wouldn't be considered a cancellation-branch-point even if they use await). The main reason this is needed is for async context managers, like trio.Lock: acquiring a lock can block, so locks need to be async context managers instead of regular context managers, and then you have to make releasing the lock (Lock.__aexit__) marked as async as well, even though it doesn't actually do anything async.

@njsmith
Copy link
Member Author

njsmith commented May 11, 2017

The generic stream tests are probably a place that could make use of this.

@njsmith
Copy link
Member Author

njsmith commented Jul 2, 2017

we might want to distinguish between the different cancel scopes in the stack (e.g. the first time we hit cancel point X, cancel the topmost scope; the second time we hit cancel point X, cancel the next-to-topmost scope, etc., and only move on to the next cancel point after we've exercised all the scopes at the first cancel point)

Not sure what I was thinking here exactly, but it doesn't make sense to pick random internal cancel scopes and cancel them, since cancel scopes don't spontaneously cancel themselves like that normally. The obvious case is testing external cancellation of some public API, like:

cancel_tester = CancelTester()
while cancel_tester.work_to_do():
    with open_cancel_scope() as scope:
        cancel_tester.set_scope(scope)
        await do_http_request()
    # make some assertions about how it handled it, check for ResourceWarnings, whatever

(This is pretty awkward way of doing the API, ergonomics could be improved, but you get the idea.)

The other case that might make sense is for testing internal timeouts, since timeouts can go off semi-randomly. For this we'd want to integrate with MockClock I suppose, like -- check when the next timeout could happen and make time jump forward to that.

@njsmith
Copy link
Member Author

njsmith commented Jul 2, 2017

Whatever API we come up with needs to have some provision for doing per-run setup/teardown kind of stuff that happens outside of the cancel-scope-under-test. For example, you might want to fire up an HTTP server, start a client against it, and then cancel the server but not the client, and then make some assertions about what the client sees. And you need a new client each time.

@njsmith
Copy link
Member Author

njsmith commented Jul 4, 2017

I think the core change to implement this would be to add an instrument hook for "task is at a cancellation point". Possibly the cleanest way to do this would be to call it from yield_if_cancelled and yield_indefinitely, so that the instrument gets called inside task context and can do things like inspect the stack.

The other way to do it would be to put the instrument hook in yield_if_cancelled and in the run loop where we receive the yield_indefinitely yield. But this leads to pretty different calling contexts (can potentially be handled by saying that you always have to fetch the task object and interrogate its coroutine object – though, does that work properly when a coroutine is running?), and worse, it means that if an instrument issues a cancellation in yield_if_cancelled then it will get called again a moment later when we actually yield.

Though... that's... actually what happens in both designs. Ugh. I guess we could give yield_if_cancelled its own unique trap, that yields the same object as yield_indefinitely but doesn't have the instrument hook.

This all might require a bit of code rearranging, since yield_indefinitely currently lives in a different file from the instrument logic.

Possibly this is also an argument for having some kind of first-class support for yield_indefinitely_no_cancellation, since run_in_worker_thread in particular uses yield_indefinitely without it being a cancellation point, and it could lead to a bit of silliness if we inject a cancel there. In particular: if a test wants to check that all cancel points actually cause Cancelled to propagate out, then this would mess that up.

@njsmith
Copy link
Member Author

njsmith commented Jul 17, 2017

(Side note: if we add an instrumentation hook for cancel points, then we might also want to think about moving assert_yields / assert_no_yields to using it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants