-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: deadlock when sending ping #193
base: master
Are you sure you want to change the base?
Conversation
@@ -219,7 +219,7 @@ fn write_deadlock() { | |||
// fact that the sum of receive windows of all open streams can easily | |||
// be larger than the send capacity of the connection at any point in time. | |||
// Using such a low capacity here therefore yields a more reproducible test. | |||
let capacity = 1024; | |||
let capacity = 10024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those changes seem to make the test flaky and the deadlock happens sometimes. The test output when it happens is:
running 1 test
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: 6405a268 (Server)
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: d58957af (Client)
[2024-06-11T08:40:15Z DEBUG yamux::connection] d58957af: new outbound (Stream d58957af/1) of (Connection d58957af Client (streams 0))
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3047298023
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3182731544
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3047298023, estimated round-trip-time 322.416µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3182731544, estimated round-trip-time 564.167µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 3157641594
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 1136718128
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 3157641594, estimated round-trip-time 459.708µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 1136718128, estimated round-trip-time 145.75µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 1970012150
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 2852812461
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 1970012150, estimated round-trip-time 180.5µs
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 2852812461, estimated round-trip-time 87.458µs
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 2531287629
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 1628284672
test write_deadlock has been running for over 60 seconds
Would it also be possible to achieve the same using smaller numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
let msg = vec![1u8; 30 * 1024 * 1024];
let capacity = 102;
pub const DEFAULT_CREDIT: u32 = 25600;
const PING_INTERVAL: Duration = Duration::from_millis(500);
I can usually reproduce it at least once when running 5 times. But not when capacity is 100. Maybe it can give some idea about the cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I override DEFAULT_CREDIT
and PING_INTERVAL
for this test? I changed the values where they are defined as I didn't know how to override them in the test.
@@ -409,6 +401,14 @@ impl<T: AsyncRead + AsyncWrite + Unpin> Active<T> { | |||
self.socket.start_send_unpin(frame)?; | |||
continue; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this order seem to fix the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it deadlock on yamux test too? If not, maybe we could add a test for this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate? I understand this test was designed specifically to address this type of deadlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually missed the test that was there so my comment can be ignored :)
CC @mxinden |
Can you expand how that would lead to a dead-lock? Once the local node has sent the ping, it would then proceed to reading pending frames. Also who's receive window? |
I don't know much about the internals, but my theory is that when the remote's receive window has been exhausted, we try to send a ping before reading the frame which increases our credit again. If both peers do this at the same time, the deadlock happens as shown here #193 (comment). I couldn't reproduce the deadlock anymore after this change that reads the frame before sending the ping. I apologize, but this PR is mostly empirical, I don't have enough knowledge about the system yet to make a better analysis. I hope it can be helpful somehow. |
any update on this? |
See libp2p/rust-libp2p#5410 and #112
The problem might be that we try to send a ping before reading pending frames. If the receive window is exhausted, no progress can be made.