
Disable cancel scope reuse #911

Merged (4 commits) Feb 7, 2019
Conversation

@oremanj (Member) commented Feb 7, 2019

To keep our options open for adding a smarter `was_cancelled` property, per discussion in #886.
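
For readers skimming the change, here is a minimal sketch of the user-visible effect, assuming trio at or after this PR (the exact error wording has varied between versions): entering the same cancel scope a second time now raises RuntimeError instead of being silently accepted.

```python
# Minimal sketch of the behavior this PR enforces (error wording is
# illustrative; it has changed across trio versions).
import trio

async def main():
    scope = trio.move_on_after(1)  # move_on_after() hands back a cancel scope
    with scope:
        await trio.sleep(0.01)
    try:
        with scope:                # reusing the same scope is now an error
            await trio.sleep(0.01)
    except RuntimeError as exc:
        print("reuse rejected:", exc)

trio.run(main)
```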
@oremanj oremanj requested a review from njsmith February 7, 2019 00:17
codecov bot commented Feb 7, 2019

Codecov Report

Merging #911 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #911      +/-   ##
=========================================
- Coverage    99.4%   99.4%   -0.01%     
=========================================
  Files         102     102              
  Lines       12681   12678       -3     
  Branches      927     927              
=========================================
- Hits        12605   12602       -3     
  Misses         56      56              
  Partials       20      20
Impacted Files Coverage Δ
trio/_core/tests/test_run.py 100% <100%> (ø) ⬆️
trio/_core/_run.py 99.71% <100%> (-0.01%) ⬇️

" in another task ({!r})".format(self._scope_task.name)
)
"cancel scopes may not be reused or reentered "
"(first used in task {!r})".format(self._scope_task.name)
)
self._scope_task = task

Member (reviewer):

Do we use self._scope_task for anything else now? If it's effectively a bool, it would probably be better to make it an actual bool, so we don't inadvertently hold onto any stack frames longer than we need to.

Member Author (oremanj):
We use it in _exc_filter (testing whether self._scope_task._pending_cancel_scope() is self). I also don't think holding onto a Task can actually leak much memory -- if the task is running, it's referenced by the runner so our reference doesn't change the memory usage, and if it's done, it doesn't have any stack frames so the memory usage is small. Is there something I'm missing?

Member (reviewer):
Maybe CancelScope._exc_filter should use current_task()? ...and maybe it should be refactored into CancelScope._close. And actually, now that all Cancelled exceptions are the same, we can hoist the check up out of _exc_filter entirely; no reason _close should call MultiError.filter at all if it's not going to catch anything. That's another matter though...

I guess you're right that coroutine objects drop their cr_frame reference when the coroutine exits. Huh, I didn't know that. That does reduce the likelihood of this reference causing problems. So it still makes me nervous to hold onto an unnecessary reference to a complex object, but only on general principles, not because of any specific issue I can point to :-).
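
The cr_frame point is easy to verify outside trio; a quick pure-Python check (not from the PR, just an illustration of the behavior described above):

```python
# A finished coroutine drops its frame (and with it, its locals), so keeping
# a reference to the coroutine object afterwards holds very little memory.
async def coro():
    big_local = list(range(1000))
    return len(big_local)

c = coro()
print(c.cr_frame is not None)  # True: the frame exists until the coroutine finishes
try:
    c.send(None)               # drive it to completion; the result rides on StopIteration
except StopIteration as exc:
    print(exc.value)           # 1000
print(c.cr_frame)              # None: the frame (and big_local) have been released
```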

@njsmith (Member) commented Feb 7, 2019

Huh, I would have added a self._has_been_entered = False field. It wouldn't have occurred to me to overload self._tasks to track this as well. Is there a reason for it that I'm missing? Sorry to nitpick; I just find it easier to read code where the state is as decoupled as possible...

@oremanj (Member Author) commented Feb 7, 2019

I figured it would be one fewer slot. I'll add the separate bool.
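
For reference, a self-contained toy (not trio's CancelScope, and not the diff that landed) showing the pattern being settled on here: guard reuse with a dedicated boolean rather than overloading another attribute, so the object does not keep extra references around.

```python
# Toy illustration of the "separate bool" approach; names and error wording
# are illustrative only.
class OneShotScope:
    def __init__(self):
        self._has_been_entered = False

    def __enter__(self):
        if self._has_been_entered:
            raise RuntimeError("scope may not be reused or reentered")
        self._has_been_entered = True
        return self

    def __exit__(self, *exc_info):
        return False  # don't swallow exceptions

scope = OneShotScope()
with scope:
    pass
try:
    with scope:  # second use is rejected
        pass
except RuntimeError as exc:
    print(exc)
```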

@oremanj (Member Author) commented Feb 7, 2019

Azure Ubuntu timeout again, same reason as last night.

@oremanj oremanj closed this Feb 7, 2019
@oremanj oremanj reopened this Feb 7, 2019
@oremanj (Member Author) commented Feb 7, 2019

Even slower on a rerun.

... are we getting any value out of the lone Linux test environment on Azure? Should we just disable it?

@njsmith (Member) commented Feb 7, 2019

> are we getting any value out of the lone Linux test environment on Azure?

Unfortunately yes, it's our only environment that has openssl 1.1.1, and so far our experience has been that 1.1.1 is cranky and easy to break.

We could move it to Travis, or bump up the timeout as a temporary workaround...

It's super weird that the two times this has happened were both at 10:00-11:00 PM west coast time. I wonder if that's accidental.

@njsmith njsmith closed this Feb 7, 2019
@njsmith njsmith reopened this Feb 7, 2019
@pquentin (Member) commented Feb 7, 2019

> It's super weird that the two times this has happened were both at 10:00-11:00 PM west coast time. I wonder if that's accidental.

I don't think so: I think Azure Ubuntu is getting popular, and this is going to happen every day.

@njsmith (Member) commented Feb 7, 2019

@pquentin my experience with other CI systems, though, is that they get overloaded in the middle of the USA working day. OTOH we're hosted in the Azure "Central US" region in Iowa, so the slowdown seems to start right at midnight local time. Suspicious...

@pquentin (Member) commented Feb 7, 2019

Ah, that explanation is better! 👍 Maybe they have cron jobs running at midnight.

@njsmith njsmith closed this Feb 7, 2019
@njsmith njsmith reopened this Feb 7, 2019
@njsmith (Member) commented Feb 7, 2019

Hmm, actually, it's not quite at midnight.

The first time we had this problem was 2019-02-05 (UTC):

  • The last quick build was 20190205.4 (ran in ~2 min), which started at 2019-02-05T05:57:15Z.
  • The first slow build was 20190205.5, which started at 2019-02-05T06:39:20Z.
  • Then we had a series of slow builds.
  • Build 20190205.10 was borderline – it didn't quite hit the timeout, but squeaked in at ~9 min 45 s. It started at 2019-02-05T07:07:27Z.
  • The next build, 20190205.11, was definitely fast again – ~2 min runtime, started at 2019-02-05T07:35:19Z.

So this episode started somewhere between 05:57 and 06:39 UTC, and ended somewhere between 07:07 and 07:35 UTC.

The next day, 2019-02-06 UTC, we didn't run any builds between 2019-02-05T23:51:12Z and 2019-02-06T08:09:57Z, so nowhere near the period where the slowdown occurred.

Then the day after that puts us at today, 2019-02-07 UTC:

  • Build 20190207.11 was quick (~2 min), and started at 2019-02-07T06:15:08Z. In Iowa, that's 00:15, so today it was still fast a few minutes after local midnight.
  • The next build, 20190207.12, was slow, and started at 2019-02-07T06:35:21Z.
  • Then we had a series of slow builds, the last one being 20190207.14, which started at 2019-02-07T07:04:15Z.
  • And then 20190207.15 was only slightly slow (~4 min), and started at 2019-02-07T07:15:17Z.

So this time the slowdown started between 06:15 and 06:35 UTC, and ended around 07:15 UTC. That is still remarkably consistent with the previous window.

I also peeked back one day into the past, and it turns out that we did do a build at 2019-02-04T06:42:28Z, which went fine. So either the temporal consistency is a total coincidence, or else this is a new problem that just started on 2019-02-05.

[Edit: it just happened again, this time at 2019-02-14 06:36 UTC.]
