-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JoinAll blocks on tokio #2140
Comments
Hmm, looking at the implementation of As for why it hangs with the new |
Yes, there seem indeed to be 2 issues here. I didn't really know whether to report here or at tokio. In tests I did, it didn't happen with threadpool. I will add some logging to localset and look at what's looping, since as the cpu keeps working, it's doing something. However, it only happens from when budget was introduced, I bisected it. |
@najamelan thanks for looking into this, it would be really helpful to know if the issue can be reproduced with just |
FWIW, I haven't been able to get the issue to occur without |
Hmm, if I replace use futures::stream::{futures_unordered::FuturesUnordered, StreamExt};
use tokio::{runtime::Builder, task::LocalSet};
fn main() {
let mut exec = Builder::new().basic_scheduler().build().unwrap();
let set = LocalSet::new();
let mut handles = FuturesUnordered::new();
for i in 1..=128 {
handles.push(set.spawn_local(async move {
tokio::task::spawn_local(async move {
println!("Running inner {}", i);
})
.await
.unwrap();
println!("Running {}", i);
}));
}
exec.block_on(set.run_until(handles.for_each(|_| async { () })));
} This code runs to completion. |
Here is a flamegraph of it looping. From what I can tell, it's just polling the JoinAll over and over, and that always returns Pending, because there is always futures that are pending. It seems they are stuck on the inner task, but that has already printed to the console and really has nothing more to do. Why this happens is not yet clear to me and how the budget interferes with that: |
Oh yes, of course, it is because the joinall holds the JoinHandles, which don't really poll the actual inner tasks, so it is just LocalSet never scheduling them again. In principle they should have never been pending, because they don't await anything, but somehow the budget must cause it to not consider them complete and never schedule them again. |
Ok, I'm starting to see a pattern. Without a budget this happens:
Here there is no budget interfering, because it's 128, and we never went so long without Pending. Now with 127 tasks, things change quite a bit:
Step one is the same. It runs through all outers, all inner joinhandles return pending. But then, it starts interleaving them with the inners always running before the outers, but the JoinHandles still return Pending. Except a few. Overall, it takes 6 runs through the LocalSet for it to complete, with more and more JoinHandles gradually returning Ready, even though theoretically it could be done in 2. Looks like the interleaving is the effect of the budget. Now with 128 and over:
The JoinHandles never declare Ready. Eternally Pending and it just keeps looping over the outers. It looks like the underlying issue is an inefficiency in the synchronization of the JoinHandles, with the budget applying the finishing blow. Im not familiar with the internals of the tokio schedulers, so I'm not sure what exactly happens. |
The code for that produced the logs: use
{
tokio :: { task::*, runtime::Builder } ,
futures :: { future::join_all } ,
std :: { future::*, task::*, pin::* } ,
};
fn main()
{
let mut exec = Builder::new().basic_scheduler().build().unwrap();
let set = LocalSet::new();
let mut handles = Vec::new();
for i in 1..=5
{
handles.push( set.spawn_local( Outer{i, inner: None} ));
}
exec.block_on( set.run_until( join_all( handles ) ) );
}
struct Outer
{
i: usize,
inner: Option<JoinHandle<()>>,
}
struct Inner
{
i: usize,
}
impl Future for Inner
{
type Output = ();
fn poll( self: Pin< &mut Self>, _cx: &mut Context<'_> ) -> Poll<()>
{
println!( "Polling inner {}", self.i );
Poll::Ready(())
}
}
impl Future for Outer
{
type Output = ();
fn poll( mut self: Pin< &mut Self>, cx: &mut Context<'_> ) -> Poll<()>
{
println!( "Outer::poll {}", self.i );
let mut handle = self.inner.take().unwrap_or_else( ||
{
tokio::task::spawn_local( Inner{i: self.i} )
});
match Pin::new( &mut handle ).poll(cx)
{
Poll::Pending =>
{
println!( "Outer::poll - inner returned pending {}", self.i );
self.inner = handle.into();
return Poll::Pending
}
Poll::Ready(_) => {}
}
println!( "Running {}", self.i );
Poll::Ready(())
}
} |
Closing in favor of the tokio issue. I will open a new issue for questioning the implementation of |
Related to #2047, #2130.
JoinAll
hangs (consuming 100% CPU) in some situations with 128 tasks or more on tokioLocalSet
. Since tokio 0.2.14 (tokio-rs/tokio#2160). I don't know the tokio executor internals very well, so I have a hard time telling whether this is the fault ofJoinAll
or something in the budget system tokio introduced, but in any caseJoinAll
will not yield if it can make progress... so it seems here the scheduler doesn't manage to wake it up anymore. Only the prints from inner will show. At 127, everything runs to completion.@jonhoo do you see what's going on here?
The text was updated successfully, but these errors were encountered: