Elastic scaling: runtime v2 descriptor support #5423

Open
wants to merge 109 commits into base: master

Conversation

@sandreim (Contributor) commented Aug 20, 2024

Closes #5045 and #5046

On top of #5362

TODO:

  • storage migration for allowed relay parents tracker
  • check session index
  • PRdoc
  • tests
  • ensure UMP queue cannot be abused with this change
  • Zombienet runtime upgrade test

@paritytech-cicd-pr

The CI pipeline was cancelled because one of the required jobs failed.
Job name: test-linux-stable 1/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7276883

@sandreim added the T8-polkadot label (This PR/Issue is related to/affects the Polkadot network) on Sep 6, 2024
@alindima (Contributor) left a comment


Looking good! Have you tested the runtime migration with an upgrade in zombienet?
I've always discovered bugs by doing that in the past 😆

@sandreim (Contributor, Author)

> Looking good! Have you tested the runtime migration with an upgrade in zombienet? I've always discovered bugs by doing that in the past 😆

Not yet, but I don't really expect any issues; it's a very simple migration.

Comment on lines +756 to +757
claim_queue: BTreeMap<CoreIndex, VecDeque<Id>>,
) -> BTreeMap<ParaId, BTreeMap<u8, BTreeSet<CoreIndex>>> {
@sandreim (Contributor, Author)


A type alias would help here.

Member


Agree :)
Also, I find it a bit strange to have a BTreeMap<u8, BTreeSet<_>> instead of a Vec<BTreeSet<CoreIndex>>, where the index in the Vec represents the depth, given that the depth is supposed to be a number in 0..x without gaps, IIUC.

@sandreim (Contributor, Author)


I was using a VecDeque initially but switched to a BTreeMap, for two reasons:

  • we don't need to pop entries like we do with the claim queue
  • future-proofing in case we want to support gaps for on-demand, where we currently need to double the assignment in case the core is not used at all, since the para might not have enough time to utilize it if we add it at claim queue offset 0 (see the sketch below)
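For illustration, here is a minimal sketch of the transposition discussed in this thread, with a type alias keyed by claim queue offset. The ParaId/CoreIndex definitions are simplified placeholders, not the actual runtime types, and the function is only an assumption of what the snippet above computes.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

// Simplified placeholders for the runtime types in the snippet above.
type ParaId = u32;
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct CoreIndex(u32);

// Suggested type alias: cores assigned to a para, grouped by claim queue
// offset (depth). Keying by the offset tolerates gaps, unlike a Vec where
// the position encodes the depth.
type CoresByOffset = BTreeMap<u8, BTreeSet<CoreIndex>>;

// Sketch: transpose the per-core claim queue into a per-para view.
fn transpose_claim_queue(
    claim_queue: BTreeMap<CoreIndex, VecDeque<ParaId>>,
) -> BTreeMap<ParaId, CoresByOffset> {
    let mut out: BTreeMap<ParaId, CoresByOffset> = BTreeMap::new();
    for (core, queue) in claim_queue {
        for (offset, para) in queue.into_iter().enumerate() {
            out.entry(para)
                .or_default()
                .entry(offset as u8)
                .or_default()
                .insert(core);
        }
    }
    out
}
```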

@sandreim (Contributor, Author)

Runtime upgrade successful in Zombienet using a Westend local chain.

@alindima mentioned this pull request Sep 18, 2024
@alindima (Contributor) left a comment


Nice work!

@ordian (Member) left a comment


Looking good overall!
Left some questions for my understanding and potential corner cases.


self.selected_core().and_then(|(core_selector, _cq_offset)| {
    let core_index =
        **assigned_cores.get(core_selector.0 as usize % assigned_cores.len())?;
Member


What I didn't get from the RFC: why do we need to do % assigned_cores.len() rather than assume that core_selector < assigned_cores.len()?

Contributor Author


The core selector is a sequence number that wraps around. We use the LSB of the block number to fill it in.
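As a rough illustration of that wrap-around (a hypothetical standalone function, not the PR's actual code):

```rust
// The core selector is a wrapping sequence number (seeded from the block
// number's least significant byte), so it is reduced modulo the number of
// assigned cores instead of being required to stay below their count.
fn select_core(core_selector: u8, assigned_cores: &[u32]) -> Option<u32> {
    if assigned_cores.is_empty() {
        return None;
    }
    assigned_cores
        .get(core_selector as usize % assigned_cores.len())
        .copied()
}
```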

{
    let (upward_messages, ump_signals) = upward_messages.split_at(separator_index);

    if ump_signals.len() > 2 {
Member


Is 0 UMP signals allowed in case there's a separator?
Are multiple separators allowed? We only look here for the last one.
My thinking is that it should be "no" in both cases - it might be good to validate those.

Contributor Author


0 is not allowed; this should be caught by the check_core_index function, and selected_core should return None. I will have to double-check and add another check here if this is not covered.

If we have multiple separators, that is fine - it just means the parachain wants to hurt itself. Those will count as XCM messages and are limited by a parameter in the relay chain configuration.

Contributor Author


Currently it seems we are not really strict; we just consider that the candidate does not commit to a core, since we are using an Option. I will add stricter checks (see the sketch below).
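A sketch of what such stricter checks could look like, assuming the separator is an empty upward message and exactly one signal is expected after the last separator (both are assumptions based on this thread, not the final implementation):

```rust
/// Hypothetical stricter validation: split the upward messages at the last
/// separator (assumed to be an empty message) and require exactly one UMP
/// signal after it.
fn split_ump_signals(
    upward_messages: &[Vec<u8>],
) -> Result<(&[Vec<u8>], &[Vec<u8>]), &'static str> {
    match upward_messages.iter().rposition(|m| m.is_empty()) {
        // No separator at all: plain XCM messages only, no UMP signals.
        None => Ok((upward_messages, &[])),
        Some(separator_index) => {
            let (messages, rest) = upward_messages.split_at(separator_index);
            // `rest[0]` is the separator itself; everything after it is a signal.
            let signals = &rest[1..];
            if signals.is_empty() {
                return Err("separator present but no UMP signal follows it");
            }
            if signals.len() > 1 {
                return Err("at most one UMP signal expected after the separator");
            }
            Ok((messages, signals))
        }
    }
}
```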

Labels
T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elastic scaling: relay chain support for new candidate receipts
4 participants