Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Upgradeable validation functions #918

Merged
merged 33 commits into from
Apr 6, 2020
Merged

Conversation

rphmeier
Copy link
Contributor

@rphmeier rphmeier commented Mar 19, 2020

cc @bkchr @coriolinus

Closes #825

This uses a time-limit approach to allowing parachains to upgrade their validation code, as opposed to a fees-based approach as outlined in the issue. The most important changes are in polkadot-runtime-common and summarized here:

  • Track times in history when code was replaced for parachains and use that to determine the correct code
  • Replaced code is kept in the relay-chain state for a configured time-period.
  • Clean up replaced code after a configured delay
  • Parachains, when cleaned up, have their validation code placed into the state and cleaned up after the same delay.
  • Parachains cannot apply code changes more than once per configured delay.
  • Pending code is not applied until after a configured delay

Other miscellaneous changes:

  • Extracted certain types to the parachain crate as mentioned in Add current relay block to ValidationParams #879 . These types are re-exported via primitives/parachain.
  • Update the parachain-wasm interface to allow for code changes
  • Implement a test parachain which does code changes and test it. This example logic can be used as a basis for Substrate implementation

@rphmeier rphmeier added the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 19, 2020
@rphmeier rphmeier added A0-please_review Pull request needs code review. B2-breaksapi and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Mar 20, 2020
@rphmeier rphmeier marked this pull request as ready for review March 20, 2020 22:55
@rphmeier
Copy link
Contributor Author

rphmeier commented Mar 22, 2020

cc @arkpar Any idea why the wasm-executor test seems to be failing? I haven't changed this logic but did shuffle some crates around

@ordian

This comment has been minimized.

@arkpar
Copy link
Member

arkpar commented Mar 23, 2020

Not sure how the test failure could have happened. Unless CI machine was paused for a few seconds in the middle of the the test execution, it should timeout external validation function execution. That timeout was triggered too late apparently. It worked after restart. Please let me know if this happens again.

Copy link
Contributor

@coriolinus coriolinus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel competent to review this PR as a whole; there's a lot of changes in here about which I just don't yet know enough to predict the resulting changes in behavior. However, from the narrow perspective of enabling me to handle parachain validation function changes within cumulus, this definitely appears to work.

Just ensuring that I understand the intention behind this design, here's the happy path as I understand it:

  • a superuser (or, todo, referendum) approves a new validation function
  • a support pallet (in progress) looks at validation_params.code_upgrade_allowed and keeps track of the relay chain block number at which an upgrade would be legal
  • The first time that validation_params.relay_chain_height >= upgrade_block, it returns the new function in ValidationResult, which notifies polkadot that the new function is ready to adopt. At all other times, ValidationResult.new_validation_code == None.

Is that correct?

Comment on lines +191 to +193
#[derive(PartialEq, Eq, Decode)]
#[cfg_attr(feature = "std", derive(Debug, Encode))]
pub struct ValidationParams {
Copy link
Contributor

@coriolinus coriolinus Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(PartialEq, Eq, Decode)]
#[cfg_attr(feature = "std", derive(Debug, Encode))]
pub struct ValidationParams {
#[derive(PartialEq, Eq, Encode, Decode)]
#[cfg_attr(feature = "std", derive(Debug))]
pub struct ValidationParams {

It would be helpful if coding on this struct were always bidirectional: that would let me encode it into storage (under a constant key) so that pallets could publish things like the max code size and whether or not code upgrades are currently allowed.

Alternate suggestion, in case BlockData or HeadData are large: factor out something like

#[derive(PartialEq, Eq, Encode, Decode, Clone, Copy)]
#[cfg_attr(feature = "std", derive(Debug))]
pub struct ValidationFunctionParams {
	/// The maximum code size permitted, in bytes.
	pub max_code_size: u32,
	/// The current relay-chain block number.
	pub relay_chain_height: RelayChainBlockNumber,
	/// Whether a code upgrade is allowed or not, and at which height the upgrade
	/// would be applied after, if so. The parachain logic should apply any upgrade
	/// issued in this block after the first block
	/// with `relay_chain_height` at least this value, if `Some`. if `None`, issue
	/// no upgrade.
	pub code_upgrade_allowed: Option<NonZero<RelayChainBlockNumber>>,
}

That's all the data I'm really interested in forwarding for pallet use; this struct could then become a public member of ValidationParams.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful if coding on this struct were always bidirectional: that would let me encode it into storage (under a constant key) so that pallets could publish things like the max code size and whether or not code upgrades are currently allowed.

Yes, but at the cost of larger runtime code size. You can build it with the std feature to get Encode functionality.

Copy link
Contributor

@coriolinus coriolinus Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be open to the alternate suggestion, breaking out ValidationFunctionParams? I'm already doing that in the Cumulus branch; it's not hard, but it would be easier to just copy out the struct directly.

https://github.com/paritytech/cumulus/blob/97b96a9802d290dadb5b3e7fe7c4a7daab0bb627/runtime/src/lib.rs#L95-L122

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this stuff should be kept locally to Cumulus.

Copy link
Contributor Author

@rphmeier rphmeier Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why a Cumulus chain would want to put this in storage? These parameters will change for the execution of every block and cannot be relied on beyond a single block's execution.

The only thing a Cumulus should need to store to process a pending upgrade is what the inner value of code_upgrade_allowed: Some was when the upgrade was initiated. Then the next time you get a ValidationFunctionParams with relay_chain_height >= scheduled_at, you apply the code change at the end of the block by writing to :code storage.

Is it a question of putting it in the ephemeral storage, which is wiped at the end of the block, so other modules can query this value? I think you only need to put code_upgrade_allowed in such ephemeral storage.

And I'll go a little off topic here, but I wrote this code with the understanding that Cumulus would have an inherent type with the ValidationFunctionParams which is required in each block. Then the validation function is checking that the passed function params are equal to the ones in the inherent. This means that you can get this data into the ephemeral store by implementing a normal ensure_none Call. Is this in line with your thinking?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why a Cumulus chain would want to put this in storage?

The main goal is to propagate the values beyond Cumulus, so that a pallet can use them. It enables the pallet to publish methods like can_set_code() -> bool or max_code_size() -> u32, but more importantly, it lets the pallet generate nice errors like this:

pub fn set_code(origin, validation_function: ValidationFunction) {
    let vfp = Self::validation_function_params();
    ensure!(validation_function.len() <= vfp.max_code_size as usize, Error::<T>::TooBig);
    let apply_block = vfp.code_upgrade_allowed.ok_or(Error::<T>::ProhibitedByPolkadot)?;
    ...
}

To a user, that's a much nicer way to know that this is the wrong time or the wrong size for a validation function upgrade than panicing during the Cumulus validate_block function.

I spoke with @bkchr earlier this afternoon, and he explained why literally putting it into storage is a bad idea (it messes with the storage root). We came up with workaround which still ends up going through the storage interface, even though it doesn't actually modify the storage; for that reason, it would be still good to derive Encode, Decode on the struct.

Is it a question of putting it in the ephemeral storage... I think you only need to put code_upgrade_allowed in such ephemeral storage.

Essentially, yes, that's why we're using it. We do use all three fields, though:

  • max_code_size: so that we can return an error instead of panicing if submitted code is too big
  • relay_chain_height: so we know when to set :code given a previously scheduled validation function upgrade
  • code_upgrade_allowed: so we know when to schedule a validation function upgrade

I wrote this code with the understanding that Cumulus would have an inherent type with the ValidationFunctionParams

I'm also looking into how to implement that. However, even if the upgrade pallet gets the validation function params from an inherent instead of by directly querying a special storage key, it's not yet clear to me how the inherent would get access to them except via the magic storage key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally: it would be helpful to add a field to ValidationParams / ValidationFunctionParams. It could be as simple as

expect_upgrade_this_block: bool,

Rationale: we can't use code_upgrade_allowed to validate all inserts to :code. Obviously the upgrade pallet will be well-behaved in this regard, but not all users will discover and use that pallet, so we should guard against errors in DIY user code.

Scenario:

  • Block 1000: can_set_code == Some(1500). User sets some code, which is returned in the ValidationResult.
  • Blocks 1001-1499: can_set_code == None
  • Block 1500: can_set_code == None || can_set_code == Some(2000). I don't know which condition actually applies, but neither value indicates that it is legal to set :code on this block, right now.

Given expect_upgrade_this_block, we can inspect all writes to :code and ensure that they are only performed when legal. Without it, we have to either store the scheduled upgrade block within validate_block (which breaks the storage root), or skip the check entirely.

Copy link
Contributor Author

@rphmeier rphmeier Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will try to address these points one-by-one.

Essentially, yes, that's why we're using it. We do use all three fields, though:

  • max_code_size: so that we can return an error instead of panicing if submitted code is too big
  • relay_chain_height: so we know when to set :code given a previously scheduled validation function upgrade
  • code_upgrade_allowed: so we know when to schedule a validation function upgrade

I would still recommend to create a separate type for these fields, as we will add more stuff to ValidationFunctionParams including incoming messages and other fields you would not want to necessarily include in the ephemeral storage.

Additionally: it would be helpful to add a field to ValidationParams / ValidationFunctionParams

expect_upgrade_this_block is redundant which could be a cause of confusion or bugs later on. I would prefer to avoid it. Specifically, this statement above is not correct:

Block 1500: can_set_code == None || can_set_code == Some(2000). I don't know which condition actually applies, but neither value indicates that it is legal to set :code on this block, right now.

When can_set_code is Some, you can schedule an upgrade to be applied at/after that block. When an upgrade is scheduled, you should put it into code as soon as relay_chain_height >= scheduled_at, regardless of value of can_set_code. Happy to bikeshed on names for can_set_code as I see where the confusion stems from. The implementation accounts for the case where an upgrade is applied and a new one is signaled at the same time here: https://github.com/paritytech/polkadot/pull/918/files#diff-e21f8096708b22fcdbd66ebc74f0821aR1038

I wrote this code with the understanding that Cumulus would have an inherent type with the ValidationFunctionParams

I'm also looking into how to implement that. However, even if the upgrade pallet gets the validation function params from an inherent instead of by directly querying a special storage key, it's not yet clear to me how the inherent would get access to them except via the magic storage key.

The inherent should provide the value for the ephemeral magic storage key, right? And then the inherent is constructed via a ProvideInherent implementation, whose InherentData contains a ValidationFunctionParams which is provided by the polkadot-collator pipeline to Cumulus' authorship process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can_set_code is Some, you can schedule an upgrade to be applied at/after that block. When an upgrade is scheduled, you should put it into code as soon as relay_chain_height >= scheduled_at, regardless of value of can_set_code.

Yes, I understand that; that's what the pallet does. However, there are three areas in the code in play:

  • Polkadot
  • cumulus runtime (validate_block)
  • pallet-parachain-upgrade

When an upgrade is scheduled, both polkadot and the pallet store the current value of can_set_code, and apply the code change at the appropriate time. However, the cumulus runtime is stateless; it doesn't get to add anything persistent to storage.

That's all fine, assuming that the only way that users ever attempt parachain upgrades is via our pallet. However, that's not an entirely reasonable assumption. Someone out there, eventually, is likely to want to do it themselves for some reason.

The purpose of expect_upgrade_this_block is to enable validate_block to know when an upgrade is legal, given that it can't store its own data in storage, and shouldn't peek into the pallet's storage (because we can't assume that our pallet will be used). If it exists, then validate_block can fail when a user attempts to set :code at the wrong time. If it doesn't exist, then it cannot perform that validation. If that validation is already performed by polkadot, then we can skip this. However, if Polkadot is not performing that validation check, then it is possible for an end user to write a pallet which just updates :code whenever it feels like.

And then the inherent is constructed via a ProvideInherent implementation, whose InherentData contains a ValidationFunctionParams which is provided by the polkadot-collator pipeline to Cumulus' authorship process.

Thanks! This gives me a lot of entrypoints to research to figure out how to do this.

Copy link
Contributor Author

@rphmeier rphmeier Apr 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of ways someone can write an incorrect parachain or runtime. Even if we added this expect_upgrade_this_block, there is still a possibility that someone writes a pallet that doesn't care about can_upgrade_at and writes to :code anyway. Or writes the wrong code to :code. Or doesn't write anything to:code when it's supposed to. There are a million ways an end user can mess things up, which we cannot prevent. We can't address these issues, so are we supposed to throw away this PR?

This PR sets forth a protocol that chains can follow. Cumulus is one way for them to follow the protocol. If someone decides to register a parachain diverging from the protocol, then their parachain will not work.

As long as Cumulus can expose a reasonable entry-point for users to perform parachain code upgrades, then responsible parachain authors and auditors will (need to) ensure that it is used correctly.

@rphmeier
Copy link
Contributor Author

rphmeier commented Mar 28, 2020

I don't feel competent to review this PR as a whole; there's a lot of changes in here about which I just don't yet know enough to predict the resulting changes in behavior. However, from the narrow perspective of enabling me to handle parachain validation function changes within cumulus, this definitely appears to work.

Just ensuring that I understand the intention behind this design, here's the happy path as I understand it:

  • a superuser (or, todo, referendum) approves a new validation function
  • a support pallet (in progress) looks at validation_params.code_upgrade_allowed and keeps track of the relay chain block number at which an upgrade would be legal
  • The first time that validation_params.relay_chain_height >= upgrade_block, it returns the new function in ValidationResult, which notifies polkadot that the new function is ready to adopt. At all other times, ValidationResult.new_validation_code == None.

Is that correct?

The process is more like this:

@rphmeier
Copy link
Contributor Author

cc @arkpar I've encountered the same spurious test failure of parallel_execution again.

@rphmeier
Copy link
Contributor Author

rphmeier commented Apr 3, 2020

If #973 passes CI then we need to merge #972 ASAP and then merge this on top of it.

@rphmeier rphmeier added A8-looksgood and removed A0-please_review Pull request needs code review. labels Apr 4, 2020

Self::do_old_code_pruning(now);

// TODO [now]: adjust.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODOs should be fixed before merging or have an issue opened?

@rphmeier rphmeier merged commit ed2c4ca into master Apr 6, 2020
@rphmeier rphmeier deleted the rh-upgradeable-validation-function branch April 6, 2020 14:43
coriolinus pushed a commit that referenced this pull request Apr 14, 2020
* upgrade primitives to allow changing validation function

* set up storage schema for old parachains code

* fix compilation errors

* fix test compilation

* add some tests for past code meta

* most of the runtime logic for code upgrades

* implement old-code pruning

* add a couple tests

* clean up remaining TODOs

* add a whole bunch of tests for runtime functionality

* remove unused function

* fix runtime compilation

* extract some primitives to parachain crate

* add validation-code upgrades to validation params and result

* extend validation params with code upgrade fields

* provide maximums to validation params

* port test-parachains

* add a code-upgrader test-parachain and tests

* fix collator tests

* move test-parachains to own folder to work around compilation errors

* fix test compilation

* update the Cargo.lock

* fix parachains tests

* remove dbg! invocation

* use new pool in code-upgrader

* bump lockfile

* link TODO to issue
HCastano added a commit to HCastano/polkadot that referenced this pull request Apr 27, 2021
f95cc7a5 Merge branch 'master' into hc-add-wococo-support
a13ee0bc Bump Substrate (paritytech#939)
f8680cbf jsonrpsee alpha6 (paritytech#938)
6163bcbf reonnect to failed client in on-demand relay background task (paritytech#936)
14e82bea Do not spawn additional task for on-demand relays (paritytech#933)
b1557b88 Relay at least one header for every source chain session (paritytech#923)
9420649c Remove deprecated Runtime Header APIs (paritytech#932)
9627011e Update README.md (paritytech#931)
7b736b9c Truncate output in logs. (paritytech#930)
faad06e3 Make sure that relayers have dates in logs. (paritytech#927)
07734535 Update dump-logs script. (paritytech#928)
efe215e4 RustFmt
02522249 Fix test
48b41d82 Add support for relaying headers between Rococo and Wococo
f16b6b41 Add CLI support for initializing the Wococo<>Rococo bridge
8c6e6443 Add more Wococo boilerplate code
c2d56b2e Add pruning to bechmarks & update weights. (paritytech#918)
a30c51dc Add properties to Chain Spec (paritytech#917)
28d3ed2f Add Wococo primitives crate
d691c73e Fix issue with on-demand headers relay not starting (paritytech#921)
8ee55c1e Fix image publishing. (paritytech#922)
f51fb59d Prefix in relay loops logs (paritytech#920)

git-subtree-dir: bridges
git-subtree-split: f95cc7a57b48948f17d33f5be3ea01c752deba94
HCastano added a commit that referenced this pull request Apr 29, 2021
801c99f3 Add Wococo<>Rococo Header Relayer (#925)
21f49051 Remove Westend<>Rococo header sync (#940)
06235f16 do not panic if pallet is not yet initialized (#937)
a13ee0bc Bump Substrate (#939)
f8680cbf jsonrpsee alpha6 (#938)
6163bcbf reonnect to failed client in on-demand relay background task (#936)
14e82bea Do not spawn additional task for on-demand relays (#933)
b1557b88 Relay at least one header for every source chain session (#923)
9420649c Remove deprecated Runtime Header APIs (#932)
9627011e Update README.md (#931)
7b736b9c Truncate output in logs. (#930)
faad06e3 Make sure that relayers have dates in logs. (#927)
07734535 Update dump-logs script. (#928)
c2d56b2e Add pruning to bechmarks & update weights. (#918)
a30c51dc Add properties to Chain Spec (#917)
d691c73e Fix issue with on-demand headers relay not starting (#921)
8ee55c1e Fix image publishing. (#922)
f51fb59d Prefix in relay loops logs (#920)

git-subtree-dir: bridges
git-subtree-split: 801c99f3de0fa4d0b61e4e065fa30817179368ea
tomusdrw added a commit that referenced this pull request May 3, 2021
f43c92430 Fix account derivation in CLI (#952)
9ac07e733 Add backbone configuration of cargo-spellcheck (#924)
2761c3fef Message dispatch support multiple instances (#942)
801c99f3d Add Wococo<>Rococo Header Relayer (#925)
21f490514 Remove Westend<>Rococo header sync (#940)
06235f162 do not panic if pallet is not yet initialized (#937)
a13ee0bc3 Bump Substrate (#939)
f8680cbfc jsonrpsee alpha6 (#938)
6163bcbf4 reonnect to failed client in on-demand relay background task (#936)
14e82bea3 Do not spawn additional task for on-demand relays (#933)
b1557b882 Relay at least one header for every source chain session (#923)
9420649c1 Remove deprecated Runtime Header APIs (#932)
9627011e1 Update README.md (#931)
7b736b9cc Truncate output in logs. (#930)
faad06e39 Make sure that relayers have dates in logs. (#927)
077345351 Update dump-logs script. (#928)
c2d56b2e9 Add pruning to bechmarks & update weights. (#918)
a30c51dc9 Add properties to Chain Spec (#917)
d691c73e9 Fix issue with on-demand headers relay not starting (#921)
8ee55c1e1 Fix image publishing. (#922)
f51fb59d0 Prefix in relay loops logs (#920)

git-subtree-dir: bridges
git-subtree-split: f43c924301c227d29ec161f6815d9bac458a211d
HCastano added a commit that referenced this pull request May 4, 2021
b2099c5c Bump Substrate to `b094edaf` (#958)
3f037094 Bump endowment amounts on Rialto and Millau (#957)
b21fd07c Bump Substrate WASM builder (#947)
30ccd07c Bump Substrate to `ec180313` (#955)
a7422ab1 Upgrade to GitHub-native Dependabot (#945)
ed20ef34 Move pallet-bridge-dispatch types to primitives (#948)
2070c4d6 Endow accounts and add `bridgeIds` to chainspec. (#951)
f43c9243 Fix account derivation in CLI (#952)
9ac07e73 Add backbone configuration of cargo-spellcheck (#924)
2761c3fe Message dispatch support multiple instances (#942)
801c99f3 Add Wococo<>Rococo Header Relayer (#925)
21f49051 Remove Westend<>Rococo header sync (#940)
06235f16 do not panic if pallet is not yet initialized (#937)
a13ee0bc Bump Substrate (#939)
f8680cbf jsonrpsee alpha6 (#938)
6163bcbf reonnect to failed client in on-demand relay background task (#936)
14e82bea Do not spawn additional task for on-demand relays (#933)
b1557b88 Relay at least one header for every source chain session (#923)
9420649c Remove deprecated Runtime Header APIs (#932)
9627011e Update README.md (#931)
7b736b9c Truncate output in logs. (#930)
faad06e3 Make sure that relayers have dates in logs. (#927)
07734535 Update dump-logs script. (#928)
c2d56b2e Add pruning to bechmarks & update weights. (#918)
a30c51dc Add properties to Chain Spec (#917)
d691c73e Fix issue with on-demand headers relay not starting (#921)
8ee55c1e Fix image publishing. (#922)
f51fb59d Prefix in relay loops logs (#920)

git-subtree-dir: bridges
git-subtree-split: b2099c5c0baf569e2ec7228507b6e4f3972143cc
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgradeable validation functions
6 participants