protocols/kad: debug_assert panic - expecting disconnected peer not to be connected #2092

Closed
mxinden opened this issue Jun 4, 2021 · 4 comments · Fixed by #2120

@mxinden
Member

mxinden commented Jun 4, 2021

I triggered the panic below with https://github.com/mxinden/rust-libp2p-server/. Unable to reproduce thus far.

thread 'main' panicked at 'assertion failed: !self.connected_peers.contains(disconnected.preimage())', /home/mxinden/.cargo/git/checkouts/rust-libp2p-800b9a7842b0accc/6b2873b/protocols/kad/src/behaviour.rs:1033:33

kbucket::InsertResult::Pending { disconnected } => {
debug_assert!(!self.connected_peers.contains(disconnected.preimage()));

Is the debug_assert too strict?

@mxinden mxinden added the bug label Jun 4, 2021
@FelipeRosa

I'm currently working on a personal project and this one pops up quite often for me. I'm a bit new to libp2p in general. Why is that debug_assert needed?

@mxinden
Member Author

mxinden commented Jun 30, 2021

> I'm currently working on a personal project

🎉

Thanks for following up here @FelipeRosa.

> why is that debug_assert needed?

It is supposed to uphold the invariant that the routing table and the set of connected peers tracked in the Kademlia NetworkBehaviour never disagree: a peer that the routing table considers disconnected must not appear in connected_peers.
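To make that invariant concrete, here is a minimal sketch using a toy model. `ToyBehaviour`, `Status`, `on_connected`, `on_disconnected` and `is_consistent` are hypothetical names invented for illustration and are not part of libp2p_kad; only the idea of checking a `connected_peers` set against routing table entries comes from the assertion quoted above. Note that `debug_assert!` is only checked in builds with debug assertions enabled, so a release build would not panic on the same state.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for illustration; these are NOT libp2p_kad types.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Status {
    Connected,
    Disconnected,
}

#[derive(Default)]
struct ToyBehaviour {
    // Peer id -> status as recorded in the toy routing table.
    routing_table: HashMap<u32, Status>,
    // Peers that currently have an established connection.
    connected_peers: HashSet<u32>,
}

impl ToyBehaviour {
    // Record a new connection in both places.
    fn on_connected(&mut self, peer: u32) {
        self.connected_peers.insert(peer);
        self.routing_table.insert(peer, Status::Connected);
    }

    // Record a closed connection in both places.
    fn on_disconnected(&mut self, peer: u32) {
        self.connected_peers.remove(&peer);
        if let Some(status) = self.routing_table.get_mut(&peer) {
            *status = Status::Disconnected;
        }
    }

    // True when no entry marked Disconnected is still listed in connected_peers.
    fn is_consistent(&self) -> bool {
        self.routing_table.iter().all(|(peer, status)| {
            *status != Status::Disconnected || !self.connected_peers.contains(peer)
        })
    }
}

fn main() {
    let mut behaviour = ToyBehaviour::default();
    behaviour.on_connected(1);
    behaviour.on_disconnected(1);
    // Both views agree, so this check passes.
    debug_assert!(behaviour.is_consistent());

    // Simulate the kind of mismatch the assertion guards against:
    // the table says "disconnected" while the set still lists the peer.
    behaviour.connected_peers.insert(1);
    assert!(!behaviour.is_consistent());
}
```

In this sketch the mismatch is injected by hand. It says nothing about which event ordering produces the mismatch in the real behaviour.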

I haven't gotten around to further debugging this. In case you are able to reproduce this reliably, could you share a minimal reproducible example here @FelipeRosa?

@FelipeRosa

Hey @mxinden,

Thanks for the quick reply.

> I haven't gotten around to further debugging this. In case you are able to reproduce this reliably, could you share a minimal reproducible example here @FelipeRosa?

I'm not able to reproduce this issue consistently. I read a bit of the libp2p_kad code to try to understand where this might be coming from, but I haven't quite wrapped my head around it yet.

What I did try is a little program that keeps generating hashes that fall into buckets with available space and then makes get_closest_peers calls sequentially.

Something like this but with some code to wait for queries to finish:

let rand_bytes: [u8; 32] = rng.gen();
let ma = libp2p::multihash::Sha2_256::digest(&rand_bytes).as_ref().to_vec();

if let Some(bucket) = swarm.behaviour_mut().kbucket(ma.clone()) {
    if !bucket.has_pending() {
        swarm.behaviour_mut().get_closest_peers(ma);
    }
}

My idea is that this keeps peers connecting and disconnecting, generating as many routing table states as possible, until it reaches the one that causes this bug.

Do you have any suggestions for things I could try?

@izolyomi
Contributor

izolyomi commented Jul 2, 2021

I also experienced this bug during my recent experiments with libp2p, but could not reliably reproduce it.
