Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polkadot and Kusama have a lot of failures during warp sync #2975

Closed
tomaka opened this issue Nov 7, 2022 · 4 comments · Fixed by #3018
Closed

Polkadot and Kusama have a lot of failures during warp sync #2975

tomaka opened this issue Nov 7, 2022 · 4 comments · Fixed by #3018

Comments

@tomaka
Copy link
Contributor

tomaka commented Nov 7, 2022

[15:40:16.566] [runtime-polkadot] Failed to download :code and :heappages of blocks 0x6152…8f39: Storage query errors:
- Response decoding error: ProtobufDecode
- Response decoding error: ProtobufDecode
- Response decoding error: ProtobufDecode
@tomaka
Copy link
Contributor Author

tomaka commented Nov 7, 2022

Interestingly, the warp syncing ends up working after some time. Maybe because we end up querying a node on an older Substrate version or something.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 8, 2022

The responses are simply empty. The remote seems to send back a 0 (as the length) then close the substream.

I think this might have to do with overly-busy nodes refusing to process our requests. It's definitely not intended that they return nothing, instead they're supposed to close the substream without sending back anything.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 17, 2022

Something I forgot to mention is that this happens both before and after #2984
And before #2984, there hadn't been any meaningful change for a long time.
I have also debugged the fact that Substrate sends back a 0 and closes the substream by looking at the data received by smoldot.

So overall I'm reasonably confident that this isn't a smoldot bug.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 18, 2022

The problem is simply that we didn't backport paritytech/substrate#12727 (see also paritytech/substrate#12732)

Diff that solves the problem:

diff --git a/src/network/protocol/storage_call_proof.rs b/src/network/protocol/storage_call_proof.rs
index ab044629..e65b807d 100644
--- a/src/network/protocol/storage_call_proof.rs
+++ b/src/network/protocol/storage_call_proof.rs
@@ -94,30 +94,36 @@ pub fn build_call_proof_request<'a>(
 
 /// Decodes a response to a storage proof request or a call proof request.
 ///
-/// On success, contains a list of Merkle proof entries.
+/// On success, contains a list of Merkle proof entries, or `None` if the remote couldn't answer
+/// the request.
 pub fn decode_storage_or_call_proof_response(
     ty: StorageOrCallProof,
     response_bytes: &[u8],
-) -> Result<Vec<&[u8]>, DecodeStorageCallProofResponseError> {
+) -> Result<Option<Vec<&[u8]>>, DecodeStorageCallProofResponseError> {
     let field_num = match ty {
         StorageOrCallProof::CallProof => 1,
         StorageOrCallProof::StorageProof => 2,
     };
 
+    // TODO: while the `proof` field is correctly optional, the `response` field isn't supposed to be optional; make it `#[required]` again once https://github.com/paritytech/substrate/pull/12732 has been merged and released
+
     let mut parser = nom::combinator::all_consuming::<_, _, nom::error::Error<&[u8]>, _>(
         nom::combinator::complete(protobuf::message_decode! {
-            #[required] response = field_num => protobuf::message_tag_decode(protobuf::message_decode!{
-                #[required] proof = 2 => protobuf::bytes_tag_decode
+            #[optional] response = field_num => protobuf::message_tag_decode(protobuf::message_decode!{
+                #[optional] proof = 2 => protobuf::bytes_tag_decode
             }),
         }),
     );
 
-    let proof: &[u8] = match nom::Finish::finish(parser(response_bytes)) {
-        Ok((_, out)) => out.response.proof,
+    let proof: Option<&[u8]> = match nom::Finish::finish(parser(response_bytes)) {
+        Ok((_, out)) => out.response.and_then(|r| r.proof),
         Err(_) => return Err(DecodeStorageCallProofResponseError::ProtobufDecode),
     };
 
+    let Some(proof) = proof else { return Ok(None) };
+
     // The proof itself is a SCALE-encoded `Vec<Vec<u8>>`.
+
     let (_, decoded) = nom::combinator::all_consuming(nom::combinator::flat_map(
         crate::util::nom_scale_compact_usize,
         |num_elems| nom::multi::many_m_n(num_elems, num_elems, crate::util::nom_bytes_decode),
@@ -126,7 +132,7 @@ pub fn decode_storage_or_call_proof_response(
         DecodeStorageCallProofResponseError::ProofDecodeError
     })?;
 
-    Ok(decoded)
+    Ok(Some(decoded))
 }
 
 /// Error potentially returned by [`decode_storage_or_call_proof_response`].

I'm not opening a PR right now because it conflicts with #3013.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant