Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that Nursery.start_soon starts tasks in random order #970

Closed
rotu opened this issue Mar 7, 2019 · 14 comments · Fixed by #2596
Closed

Document that Nursery.start_soon starts tasks in random order #970

rotu opened this issue Mar 7, 2019 · 14 comments · Fixed by #2596
Labels

Comments

@rotu
Copy link
Contributor

rotu commented Mar 7, 2019

When calling Nursery.start_soon, submitted tasks are called in an apparently non-deterministic order. This is not intuitive (at least to me); since nursery.start_soon is synchronous I expected the functions to be started in FIFO order. If the order is non-deterministic on purpose, this should be documented.

Toy example:

>>> import trio
>>>
>>> print_lock = trio.StrictFIFOLock()
>>>
>>>
>>> async def print_soon(x):
...     async with print_lock:
...         print(x)
...
>>>
>>> async def amain():
...     for i in range(3):
...         print('-----')
...         async with trio.open_nursery() as nursery:
...             for j in range(3):
...                 nursery.start_soon(print_soon, j)
...
>>>
>>> trio.run(amain)
-----
0
2
1
-----
1
0
2
-----
2
1
0
@njsmith
Copy link
Member

njsmith commented Mar 7, 2019

Yes, this is intentional. They are started in FIFO order, in the sense that first task 1 is created and marked as runnable, then task 2, then task 3. And then the next time the parent task goes to sleep, the scheduler picks which task to run next. At this point those 3 tasks are all runnable – and in a more complex program, there might be arbitrary other tasks that were runnable as well – and the scheduler picks between them. And in general, Trio's scheduler doesn't make any guarantees about when it will run different tasks; all it guarantees is that all tasks will get a fair chance to run.

In practice, this is about the strongest guarantee you can usefully get from a scheduler. If you run tasks concurrently, then they run at different rates :-). In some cases you might be able to manually line up their execution in some consistent way for a while, but it's going to be quite fragile. You might also find #32 and #239 interesting.

If you want to make sure that task 1 has had a chance to finish starting up before starting task 2, then see nursery.start. (There's also an example here.)

Where would you have expected to see this in the docs?

@rotu
Copy link
Contributor Author

rotu commented Mar 7, 2019

That makes sense. My naive expectation was that, because the docs emphasized that control gets nondeterministic at checkpoints, tasks would run in order up until the first checkpoint, at which point things would get wibbly-wobbly.

I would expect to see this documented at Nursery.start_soon. In particular, the difference between "creating a new task" and "initializing a task" should be explained somewhere before the sentence "If you want to wait for the task to initialize itself before continuing, see start()."

@njsmith
Copy link
Member

njsmith commented Mar 7, 2019

That makes sense. My naive expectation was that, because the docs emphasized that control gets nondeterministic at checkpoints, tasks would run in order up until the first checkpoint, at which point things would get wibbly-wobbly.

I guess technically that's true – the child tasks don't get a chance to run at all until the parent task (the one that called start_soon) executes a checkpoint. And when it does, then things get wibbly-wobbly :-). But I see how this is confusing...

I would expect to see this documented at Nursery.start_soon. In particular, the difference between "creating a new task" and "initializing a task" should be explained somewhere before the sentence "If you want to wait for the task to initialize itself before continuing, see start()."

That makes sense. Would you be interested in submitting a patch to update the text? It's in docs/source/reference-core.rst.

@rotu
Copy link
Contributor Author

rotu commented Mar 7, 2019

Thanks. I think I can give it a whack.
Funnily enough, the following code gives deterministic output, which, according to the randomized scheduling, is a bug, not a feature:

>>> import trio
... 
... async def aprint2(i,*,task_status):
...     task_status.started()
...     await trio.sleep(0)
...     print(i)
... 
... async def amain2():
...     async with trio.open_nursery() as n:
...         for i in range(3):
...             await n.start(aprint2,i)
...             
>>> trio.run(amain2)
0
1
2

@njsmith
Copy link
Member

njsmith commented Mar 7, 2019

Thanks. I think I can give it a whack.

Awesome!

Funnily enough, the following code gives deterministic output, which, according to the randomized scheduling, is a bug, not a feature:

Yeah, currently Trio does make sure that all runnable tasks get the same number of checkpoints, and your observation is a side-effect of that. This is a somewhat crude attempt to control fairness, and make sure that all tasks make some progress. It might change in the future though... it's workable, but it tends to penalize tasks that checkpoint a lot versus tasks that only checkpoint occasionally. If we can come up with something better then we reserve the right to switch to it :-)

@rotu
Copy link
Contributor Author

rotu commented Mar 7, 2019

One more question. Is it dependable to flush unstarted nursery tasks with trio.sleep(0) or might tasks requested with start_soon procrastinate starting until the nursery context is closing?

@njsmith
Copy link
Member

njsmith commented Mar 7, 2019

@rotu Depends on what you mean by 'flush unstarted tasks'. Currently if you call trio.sleep(0) then all recently-started tasks will execute to their first checkpoint either before sleep(0) returns, or after the next checkpoint, depending on where the child tasks get scheduled relative to the parent task in the scheduler's next batch. But even this isn't something you should depend on.

The word "soon" in start_soon is meant to imply that the task will start at some point in the near future, but the exact point is intentionally left vague :-)

But, the nursery closing definitely doesn't affect anything, except that it's a where the parent task goes to sleep (= checkpoints) and the scheduler is free to run other stuff.

@smurfix
Copy link
Contributor

smurfix commented Mar 7, 2019

Hi,

following code gives deterministic output, which, according to the randomized scheduling, is a bug, not a feature:

No it isn't. A checkpoint is presumed to let at least one other runnable task proceed before it continues, and there are too many checkpoints in nursery.start – your aprint2 finishes before the next task "really" starts.

This code demonstrates randomness:

import trio
async def aprint2(i,*,task_status):
    task_status.started()
    for j in range(3):
        await trio.sleep(0)
        print(i,j)

async def amain2():
    async with trio.open_nursery() as n:
        for i in range(10):
            await n.start(aprint2,i)
trio.run(amain2)

@rotu
Copy link
Contributor Author

rotu commented Mar 7, 2019

No it isn't. A checkpoint is presumed to let other at least one other runnable task proceed before it continues, and there are too many checkpoints in nursery.start – your aprint2 finishes before the next task "really" starts.

That makes sense. Why does moving the print statement out of the loop (just outdent print(i,j) one level) seem to make it deterministic again? It doesn't change the number of checkpoints.

@smurfix
Copy link
Contributor

smurfix commented Mar 7, 2019

Because Trio batches the runnable tasks, so that none gets starved indefinitely. The overall effect is that while you get some reordering, no task can skip ahead / lag behind too far, and they still end up terminating in the same order they started in.

@oremanj oremanj changed the title Nursery.start_soon starts tasks in random order Document that Nursery.start_soon starts tasks in random order May 2, 2019
@oremanj oremanj added the docs label May 4, 2019
@bergus
Copy link

bergus commented Mar 1, 2020

I came to trio with the same expectation as the OP, and was confused in exactly the same way.

If multiple tasks begin with entering a fifo lock, entering a capacity limiter context, or sending to a channel, but then continue independently, it's quite often useful to have a number of them begin in a defined order. Immediately calling task_status.started(), then running until the first checkpoint is encountered, seems like a very useful strategy, common enough that it would warrant a helper function so that I don't have to repeat the task_status=trio.TASK_STATUS_IGNORED+task_status.started() pattern in each of the task functions. What do you think of introducing a new start_now(…) method that does exactly this?

@smurfix
Copy link
Contributor

smurfix commented Mar 2, 2020

Isn't it quite trivial to write that helper function when you need it?

async def start_now(nursery, fn,*args):
    async def _starter(fn, args,  task_status=trio.TASK_STATUS_IGNORED):
        task_status.started()
        return await fn(*args)
    await nursery.start(_starter, fn,*args)

@bergus
Copy link

bergus commented Mar 3, 2020

It might be trivial, but appears to be useful enough to become part of trio itself. Also I would imagine trio to be able to optimise this better, and not to require writing it as an async function. I don't want to await nursery.start_now(fn), as it doesn't wait for something asynchronous, and I wouldn't expect it to checkpoint either.

@njsmith
Copy link
Member

njsmith commented Mar 19, 2020

@bergus Would you expect nursery.start_now(fn) to execute the new task until the first checkpoint before start_now returned, or it would it only do that later (but before any other calls to start_now on the same nursery)? The former seems more intuitive but it would require an await.

In any case, I think giving "the first checkpoint" special barrier semantics would be fragile and error-prone: what if you need to refactor and now your code does two checkpoints while grabbing the fifo lock? With start_now you'd have to go rewrite all the callers, because the location of the first checkpoint becomes part of your function's public API; with task_status.started() you just put it in the appropriate place. (And what if your lock.acquire() method internally does two checkpoints? Right now that's invisible to callers, but if we added start_now existed then it would become part of acquire's public semantics.)

Writing task_status.started() does need an extra line, but the extra line isn't just clutter, it serves a purpose: it acts as an explicit, visible marker in your source code of where the function is considered "started". So I don't think we want to get rid of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants