This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Collation fetching times out too often #3748

Closed
eskimor opened this issue Aug 31, 2021 · 4 comments
Labels
I10-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@eskimor
Member

eskimor commented Aug 31, 2021

Note: I am talking here about the soft timeout in collation fetching, which triggers another parallel download, although we have also received reports of the network timeout hitting some parachains.

See also #3230 and #3741
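To make the term concrete, here is a minimal sketch of what a soft timeout that triggers a parallel download could look like. It is not the actual collator protocol code: `spawn_fetch` and `fetch_with_soft_timeout` are hypothetical names, and a sleeping thread stands in for a network request.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a network fetch: sleeps for `delay`, then reports `name`.
fn spawn_fetch(name: &'static str, delay: Duration, tx: mpsc::Sender<&'static str>) {
    thread::spawn(move || {
        thread::sleep(delay);
        // The receiver may already be gone if another fetch won; ignore that error.
        let _ = tx.send(name);
    });
}

/// Start with the first source. If it has not answered within `soft_timeout`,
/// start a second fetch in parallel and return whichever finishes first.
fn fetch_with_soft_timeout(
    first: (&'static str, Duration),
    second: (&'static str, Duration),
    soft_timeout: Duration,
) -> &'static str {
    let (tx, rx) = mpsc::channel();
    spawn_fetch(first.0, first.1, tx.clone());
    match rx.recv_timeout(soft_timeout) {
        // The first fetch answered in time, no second download needed.
        Ok(winner) => winner,
        Err(_) => {
            // Soft timeout: the first fetch keeps running, a second one is added.
            spawn_fetch(second.0, second.1, tx);
            rx.recv().expect("at least one fetch sends a result")
        }
    }
}
```

Unlike a hard network timeout, the first request is not cancelled here; the soft timeout only adds a second request that races against it.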

We should investigate why collation fetching is taking much longer than expected. One simple reason that comes to mind is that multiple validators all request from the same collator, so their requests are queued. Together with multiple heads, this could easily add up to several hundred milliseconds.

Assuming that only a single collator has the collation, the current behavior could actually be fine, as there is no way to improve throughput. If there is another collator, it would be much better for the collator to immediately cancel incoming requests once it already has one queued (queue size one), so the validator can immediately move on to the next collator without wasting any time.

For this to work properly, we not only need to set the queue size to 1, but also need to change the behavior of validators so that they do not change the collator's reputation in the event of a single cancel, as this is now expected behavior (see #3230).
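A rough sketch of the proposal, assuming a second collator can actually serve the same collation. All names here (`RequestQueue`, `FetchReply`, `fetch_from_first_free`) are hypothetical and not taken from the Polkadot code base; the point is only to show a queue of size one that cancels extra requests immediately, and a validator that treats such a cancel as normal.

```rust
use std::collections::VecDeque;

/// Hypothetical reply a collator gives to a fetch request.
#[derive(Debug, PartialEq)]
enum FetchReply {
    /// The request was queued and will be served.
    Accepted,
    /// The queue is full; the request is cancelled right away.
    Busy,
}

/// Hypothetical collator-side request queue with a fixed capacity.
struct RequestQueue {
    capacity: usize,
    /// Ids of validators waiting to be served.
    pending: VecDeque<u32>,
}

impl RequestQueue {
    fn new(capacity: usize) -> Self {
        Self { capacity, pending: VecDeque::new() }
    }

    /// Queue the request if there is room, otherwise cancel it immediately.
    fn offer(&mut self, validator: u32) -> FetchReply {
        if self.pending.len() >= self.capacity {
            FetchReply::Busy
        } else {
            self.pending.push_back(validator);
            FetchReply::Accepted
        }
    }

    /// Mark the oldest queued request as served, freeing a slot.
    fn finish(&mut self) -> Option<u32> {
        self.pending.pop_front()
    }
}

/// Validator side: try collators in order and return the index of the first
/// one that accepts. A single `Busy` is expected behavior in this scheme,
/// so no reputation change is recorded for it.
fn fetch_from_first_free(collators: &mut [RequestQueue], validator: u32) -> Option<usize> {
    for (idx, collator) in collators.iter_mut().enumerate() {
        match collator.offer(validator) {
            FetchReply::Accepted => return Some(idx),
            FetchReply::Busy => continue,
        }
    }
    None
}
```

With two collators of queue size one, the first validator lands on the first collator, the second validator is cancelled there and moves straight on to the second collator, and a third validator finds both busy until one of them calls `finish`.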

eskimor added the I10-optimisation label Aug 31, 2021
@rphmeier
Contributor

Note: another way to reduce load on a single collator is to reduce the backing group size.

@bkchr
Member

bkchr commented Sep 9, 2021

> Assuming that only a single collator has the collation, the current behavior could actually be fine, as there is no way to improve throughput. If there is another collator, it would be much better for the collator to immediately cancel incoming requests once it already has one queued (queue size one), so the validator can immediately move on to the next collator without wasting any time.

No other collator has the same collation.

@rphmeier
Contributor

related: paritytech/polkadot-sdk#968

@eskimor
Member Author

eskimor commented Nov 16, 2021

Parallel fetch timeout was a wrong log message - closing.

@eskimor eskimor closed this as completed Nov 16, 2021