Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handeling of events received in kad::Handler could be delayed or blocked #4960

Closed
stormshield-frb opened this issue Nov 29, 2023 · 4 comments

Comments

@stormshield-frb
Copy link
Contributor

Summary

When trying to understand the cause of #4948, we detected a potential bug in the implementation of Handler::poll : events received in on_behaviour_event, could be handled very late or maybe never.

The outbound_substreams field is a FuturesMap. When calling poll_unpin on a FuturesMap, if there are no more futures to be run (i.e., if the FuturesMap is empty), it returns Poll::Pending and registers a waker (self.empty_waker) which will be triggered on the next try_push call (i.e., when a new future has to be run).

However, when reading how poll_unpin is called for outbound_substreams (in Handler::poll), we can identify a "snake biting its tail" behaviour :

In the Handler::poll implementation, there is only two calls that can register a waker when returning Poll::Pending:

As we have seen, self.outbound_substreams.poll_unpin(cx) registers a waker on the next call to try_push. Unfortunately, the call to try_push depends on the call to self.queue_new_stream which is itself uniquely present in the Handler::poll method. That means that, if there are no more futures in outbound_substream and that pending_messages is also empty, the only thing waking the poll method back on is self.inbound_substreams.poll_next_unpin(cx) (i.e., when pushing to pending_messages there could be an important delay before messages are handled).

Expected behavior

Pushing to self.pending_messages should allow messages to be handled as soon as possible.

Actual behavior

Pushing to self.pending_messages do not really wake up outbound_substreams.

Relevant log output

No response

Possible Solution

I'm going to do a PR and tag this issue.

Version

Current master branch.

Would you like to work on fixing this bug ?

Yes

@mxinden
Copy link
Member

mxinden commented Dec 1, 2023

Thank you for your debugging work and the elaborate explanation.

I am still confused by one thing. Sorry in case I am missing something. It is rather late in my day.

Once the connection task called on_behaviour_event on the ConnectionHandler, it will go ahead and poll the ConnectionHandler, given that both is running in a loop.

loop {
match futures::future::select(
command_receiver.next(),
poll_fn(|cx| Pin::new(&mut connection).poll(cx)),
)
.await
{
Either::Left((Some(command), _)) => match command {
Command::NotifyHandler(event) => connection.on_behaviour_event(event),

Shouldn't this make the fix in #4961 obsolete?

@thomaseizinger
Copy link
Contributor

Good catch @mxinden! I think you are right and #4961 should indeed not be necessary.

@stormshield-frb
Copy link
Contributor Author

Yes indeed. I had not succeeded in finding where the Handler::poll was called and in what context, thanks for that @mxinden. If it is the only place, I also think that the bug can not happen with the current implementation.

As for #4961 not being necessary, indeed, since there does not seem to be a case where the bug could happen, it might not be required to do it. However, we should be careful in the future because, since the contract of the poll method is broken (a function returning Poll::Pending must also ensure that the current task is scheduled to be awoken when progress can be made), if a future change is made to this part of the codebase (like the change that caused #4860, for example), the bug could then be revealed.

I'll let you decide what you prefer, and I will understand either way 😊

@thomaseizinger
Copy link
Contributor

I'll close this as not necessary. If you want, we can merge a documentation update that explains this @stormshield-frb ! Thank you for the work! :)

@thomaseizinger thomaseizinger closed this as not planned Won't fix, can't repro, duplicate, stale Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants