---
NEP: 509
Title: Stateless validation Stage 0
Authors: Robin Cheng, Anton Puhach, Alex Logunov, Yoon Hong
Status: Draft
DiscussionsTo: https://docs.google.com/document/d/1C-w4FNeXl8ZMd_Z_YxOf30XA1JM6eMDp5Nf3N-zzNWU/edit?usp=sharing, https://docs.google.com/document/d/1TzMENFGYjwc2g5A3Yf4zilvBwuYJufsUQJwRjXGb9Xc/edit?usp=sharing
Type: Protocol
Version: 1.0.0
Created: 2023-09-19
LastUpdated: 2023-09-19
---

## Summary

This NEP proposes a solution for achieving phase 2 of sharding (where no validator needs to track all shards) through stateless validation, instead of the previously proposed approach based on fraud proofs and state rollbacks.

The fundamental idea is that validators do not need to have state locally in order to validate chunks.

* Under stateless validation, the responsibility of a chunk producer extends to packaging transactions and receipts and annotating them with state witnesses. This extended role will be called "chunk proposer".
* The state witness of a chunk is defined to be a subset of the trie state, together with its proof of inclusion in the trie, that is needed to execute the chunk. A state witness allows anyone to execute the chunk without having the state of its shard locally.
* At each block height, validators are randomly assigned to a shard to validate the state witness for that shard. Once a validator receives both a chunk and its state witness, it verifies the state transition of the chunk, signs a chunk endorsement and sends it to the block producer. This is similar to, but separate from, block approvals and consensus.
* The block producer waits for sufficient chunk endorsements before including a chunk in the block it produces, or omits the chunk if not enough endorsements arrive in time.

## Motivation

Since phase 1 of sharding requires block producers to track all shards due to underlying security concerns, the team explored potential ways to achieve phase 2 of sharding, where no validator has to track all shards.

The early design of phase 2 relied on the security assumption that as long as there is one honest validator or fisherman tracking a shard, the shard is secure. It therefore depended on the protocol's ability to handle challenges (when an honest validator or fisherman detects malicious behavior and submits proof of it), state rollbacks (when validators agree that the submitted challenge is valid), and slashing (to punish the malicious validator). While this sounds straightforward and simple on paper, the complex interactions between these mechanisms and the rest of the protocol led to concrete designs that were extremely complicated, involving several specific problems we still don't know how to solve.

As a result, the team sought alternative approaches and concluded that stateless validation is the most realistic and promising one. The stateless validation approach does not assume the existence of fishermen, does not rely on challenges, and never rolls back state. Instead, it relies on validating every single chunk in a shard with a randomly sampled subset of all validators, so that only valid chunks are produced in the first place.

## Specification

### Assumptions

* No more than 1/3 of validators (by stake) are corrupted.
* In-memory trie is enabled - [REF](https://docs.google.com/document/d/1_X2z6CZbIsL68PiFvyrasjRdvKA_uucyIaDURziiH2U/edit?usp=sharing)
* State sync is enabled (so that nodes can track different shards across epochs)
* Merkle Patricia Trie continues to be the state trie implementation
* Congestion Control is enabled - [NEP-539](https://github.com/near/NEPs/pull/539)

### Design requirements

* No validator needs to track all shards.
* Security of the protocol must not degrade.
  * Validator assignment for both chunk validation and block validation should not create any security vulnerabilities.
* Block processing should not take significantly longer than it does today.
* Any additional load on network and compute should not negatively affect existing functionality of any node in the blockchain.
  * The cost of the additional network and compute load should be acceptable.
* Validator rewards should not be reduced.

### Design before NEP-509

The current high-level chunk production flow, excluding details and edge cases, is as follows:

* The block producer at height `H`, `BP(H)`, produces block `B(H)` with the chunks accessible to it and distributes it.
* The chunk producer for shard `S` at height `H+1`, `CP(S, H+1)`, produces chunk `C(S, H+1)` based on `B(H)` and distributes it.
* `BP(H+1)` collects all chunks at height `H+1` until a certain timeout is reached.
* `BP(H+1)` produces block `B(H+1)` with the chunks `C(*, H+1)` accessible to it and distributes it.

The flow continues this way for heights `H+1`, `H+2`, and so on. The "induction base" is at genesis height: the genesis block with default chunks is accessible to everyone, so chunk producers can start right away at genesis height + 1.

One can observe that there is no "chunk validation" step here. In fact, the validity of chunks is implicitly guaranteed by the **requirement that all block producers track all shards**.
To achieve phase 2 of sharding, we want to drop this requirement. For that, we propose the following changes to the flow:

### Design after NEP-509

* The chunk producer, in addition to producing a chunk, produces a new `ChunkStateWitness` message. The `ChunkStateWitness` contains enough data to prove the validity of the chunk header being produced.
  * `ChunkStateWitness` proves to anyone, including nodes that track only block data and no shards, that this chunk header is correct.
  * `ChunkStateWitness` is not part of the chunk itself; it is distributed separately and is considered transient data.
* The chunk producer distributes the `ChunkStateWitness` to a subset of **chunk validators** assigned to this shard. This is in addition to, and independent of, the existing chunk distribution logic (implemented by `ShardsManager`) today.
  * Chunk validator selection and assignment are described below.
* A chunk validator, upon receiving a `ChunkStateWitness`, validates the state witness and determines whether the chunk header is indeed correctly produced. If so, it sends a `ChunkEndorsement` to the current block producer.
* As with the existing logic today, the block producer for this block waits until either all chunks are ready or a timeout occurs, and then proposes a block containing whatever chunks are ready. The notion of readiness is now expanded to also include having more than 2/3 of chunk endorsements by stake.
  * This means that if a chunk does not receive enough chunk endorsements by the timeout, it will not be included in the block. In other words, the block only contains chunks for which there is already a consensus of validity. **This is the key reason why we will no longer need fraud proofs / tracking of all shards**.
  * The denominator of the 2/3 fraction is the total stake assigned to validate this shard, *not* the total stake of all validators.
* The block producer, when producing the block, additionally includes the chunk endorsements (at least 2/3 needed for each chunk) in the block's body. The validity condition of the block is expanded to also require valid 2/3 chunk endorsements by stake for each chunk included in the block.
  * If a block fails validation because it lacks the required chunk endorsements, it is considered a block validation failure for the purpose of Doomslug consensus, just like any other block validation failure. In other words, nodes will not apply the block on top of their blockchain, and (block) validators will not endorse the block.

So the high-level specification can be described as the list of changes in the validator roles and responsibilities:

* Block producers:
  * (Same as today) Produce blocks, (new) including waiting for chunk endorsements
  * (Same as today) Maintain chunk parts (i.e. participate in data availability based on Reed-Solomon erasure encoding)
  * (New) No longer required to track any shard
  * (Same as today) Should have a high barrier of entry (required stake) for security reasons, to make block double signing harder.
* Chunk producers:
  * (Same as today) Produce chunks
  * (New) Produce and distribute state witnesses to chunk validators
  * (Same as today) Must track the shard they produce chunks for
* Block validators:
  * (Same as today) Validate blocks, (new) including verifying chunk endorsements
  * (Same as today) Vote for blocks with endorsement or skip messages
  * (New) No longer required to track any shard
  * (Same as today) Must collectively hold a majority of all validator stake, for security reasons.
  * (Same as today) Should have a high barrier of entry to keep the `BlockHeader` size low, because it is proportional to the total byte size of block validator signatures.
* (New) Chunk validators:
  * Validate state witnesses and send chunk endorsements to block producers
  * Not required to track any shard
  * Must collectively hold a majority of all validator stake, to ensure the security of chunk validation.

See the Validator Structure Change section below for more details.

### Out of scope

* Resharding support.
* Data size optimizations such as compression, for both chunk data and state witnesses, except basic optimizations that are practically necessary.
* Separation of consensus and execution, where consensus runs independently from execution and validators asynchronously perform state transitions after the transactions are proposed on the consensus layer, for the purpose of amortizing the computation and network transfer time.
* ZK integration.
* Underlying data structure changes (e.g. Verkle tree).

## Reference Implementation

Here we describe the new structures and logic introduced, without going into too much technical detail.

### Validator Structure Change

#### Roles

Currently, there are two different types of validators. Their responsibilities are defined as in the following pseudocode:

```python
if index(validator) < 100:
    roles(validator).append("block producer")
roles(validator).append("chunk producer")
```

The validators are ordered by non-increasing stake in the considered epoch. Here and below, by "block production" we mean both production and validation.

With stateless validation, this structure must change for several reasons:

* Chunk production is the most resource-consuming activity.
* *Only* chunk production needs the state in memory, while the other responsibilities can be carried out using a state witness.
* Chunk production does not have to be performed by all validators.

Hence, to make the transition seamless, we change the role of nodes outside the top 100 to only validate chunks:

```python
if index(validator) < 100:
    roles(validator).append("chunk producer")
    roles(validator).append("block producer")
roles(validator).append("chunk validator")
```

The more stake a validator has, the **heavier** the work it gets assigned. We expect validators with higher stakes to have more powerful hardware.
With stateless validation, the relative heaviness of the work changes. Compared to the current order "block production" > "chunk production", the new order is "chunk production" > "block production" > "chunk validation".

Shards are split equally among chunk producers: since Mainnet has 6 shards as of 12 Jun 2024, each shard would have ~16 chunk producers assigned.

In the future, with an increase in the number of shards, we can generalise the assignment by saying that each shard should have `X` chunk producers assigned, provided we have at least `X * S` validators. In that case, the pseudocode for the role assignment would look as follows:

```python
# X = target number of chunk producers per shard, S = number of shards.
if index(validator) < X * S:
    roles(validator).append("chunk producer")
if index(validator) < 100:
    roles(validator).append("block producer")
roles(validator).append("chunk validator")
```

#### Rewards

The reward for each validator is defined as `total_epoch_reward * validator_relative_stake * work_quality_ratio`, where:

* `total_epoch_reward` is selected so that the total inflation of the token is 5% per annum;
* `validator_relative_stake = validator_stake / total_epoch_stake`;
* `work_quality_ratio` is the measure of work quality, from 0 to 1.

So the actual reward never exceeds the total reward, and when everyone does perfect work, they are equal.
For the purposes of this NEP, it is enough to assume that `work_quality_ratio = avg_{role}({role}_quality_ratio)`.
So, if a node is both a block and chunk producer, we compute the quality for each role separately and then take their average, as sketched below.

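To make the formula concrete, here is a minimal sketch of the reward computation, with hypothetical helper names (this is not the nearcore implementation):

```python
# Illustrative sketch of the reward formula above (hypothetical names).
def epoch_reward(total_epoch_reward, validator_stake, total_epoch_stake,
                 quality_ratios):
    """quality_ratios: one `{role}_quality_ratio` in [0, 1] per assigned role."""
    validator_relative_stake = validator_stake / total_epoch_stake
    work_quality_ratio = sum(quality_ratios) / len(quality_ratios)
    return total_epoch_reward * validator_relative_stake * work_quality_ratio

# Example: a node that is both block and chunk producer with quality ratios
# 1.0 and 0.9 gets the average 0.95 applied to its stake-proportional share.
```
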
When an epoch is finalized, all block headers in it uniquely determine who was expected to produce each block and chunk.
Thus, if we define the quality ratio for a block producer as `produced_blocks/expected_blocks`, everyone is able to compute it.
Similarly, `produced_chunks/expected_chunks` is the quality ratio for a chunk producer.
It is more accurate to say `included_chunks/expected_chunks`, because the inclusion of a chunk in a block is the final decision of a block producer, and that decision defines success here.

Ideally, we would compute the quality ratio for a chunk validator as `produced_endorsements/expected_endorsements`. Unfortunately, we won't do this in Stage 0 because:

* the mask of endorsements is not part of the block header, and adding it would be a significant change;
* the block producer doesn't have to wait for all endorsements to be collected, so it could be unfair to say that an endorsement was not produced when the block producer simply went ahead without it.

So for now we decided to compute the quality ratio for a chunk validator as `included_chunks/expected_chunks`, where we iterate over the chunks the node was expected to validate.
This has clear drawbacks though:

* chunk validators are not incentivized to validate chunks, given that they will be rewarded the same either way;
* if chunks are not produced at all, chunk validators are also impacted.

We plan to address these in future releases.

#### Kickouts

In addition, if a node's performance is too poor, we want a mechanism to kick it out of the validator list, to ensure healthy protocol performance and validator rotation.
Currently, we have a threshold for each role, and if for some role the corresponding `{role}_quality_ratio` is lower than the threshold, the node is kicked out.

If we write this in pseudocode:

```python
if is_block_producer(validator) and block_producer_quality_ratio < 0.8:
    kick_out(validator)
if is_chunk_producer(validator) and chunk_producer_quality_ratio < 0.8:
    kick_out(validator)
```

For chunk validators, we apply exactly the same formula. However, because:

* the formula doesn't count endorsements explicitly, and
* for chunk producers it effectively just makes the chunk production condition stronger without adding value,

we apply it only to nodes which **only validate chunks**. So we add this line:

```python
if is_only_chunk_validator(validator) and chunk_validator_quality_ratio < 0.8:
    kick_out(validator)
```

As pointed out above, the current `chunk_validator_quality_ratio` formula is problematic.
Here it causes an even bigger issue: if chunk producers don't produce chunks, chunk validators will be kicked out as well, which impacts network stability.
This is another reason to come up with a better formula.

#### Shard assignment

As chunk producer becomes the most important role, we need to ensure that every epoch has a significant number of healthy chunk producers.
This is a **significant difference** from the current logic, where chunk-only producers generally have low stake and their performance doesn't impact overall performance.

The most challenging part of becoming a chunk producer for a shard is downloading the most recent shard state during the previous epoch. This is called "state sync".
Unfortunately, as of now, state sync is centralised around published snapshots, which is a major point of failure until decentralised state sync is available.

Because of that, we make an additional change: if a node was a chunk producer for some shard in the previous epoch and it is a chunk producer in the current epoch, it will be assigned to the same shard.
This way, we minimise the number of required state syncs at each epoch boundary.

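Below is a simplified sketch of this idea, with hypothetical names; the real algorithm, linked right after, handles many more edge cases:

```python
# Simplified sketch (hypothetical names): returning chunk producers keep
# their previous shard so they don't need to state-sync.
def assign_chunk_producers(chunk_producers, shards, prev_assignment):
    assignment = {shard: [] for shard in shards}
    new_producers = []
    for producer in chunk_producers:
        prev_shard = prev_assignment.get(producer)
        if prev_shard in assignment:
            assignment[prev_shard].append(producer)  # no state sync needed
        else:
            new_producers.append(producer)
    # Only producers that are new (or whose shard disappeared) must state
    # sync; spread them across the least-populated shards.
    for producer in new_producers:
        least_loaded = min(assignment, key=lambda s: len(assignment[s]))
        assignment[least_loaded].append(producer)
    return assignment
```
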
The exact algorithm needs a thorough description to cover the different edge cases, so we will just leave a link to the full explanation: https://github.com/near/nearcore/issues/11213#issuecomment-2111234940.

### ChunkStateWitness

The full structure is described [here](https://github.com/near/nearcore/blob/b8f08d9ded5b7cbae9d73883785902b76e4626fc/core/primitives/src/stateless_validation.rs#L247).
Let's construct it sequentially, explaining why each field is needed, starting from the simple data:

```rust
pub struct ChunkStateWitness {
    pub chunk_producer: AccountId,
    pub epoch_id: EpochId,
    /// The chunk header which this witness is proving.
    pub chunk_header: ShardChunkHeader,
}
```

What is needed to prove `ShardChunkHeader`?

The key function we have in the codebase is [validate_chunk_with_chunk_extra_and_receipts_root](https://github.com/near/nearcore/blob/c2d80742187d9b8fc1bb672f16e3d5c144722742/chain/chain/src/validate.rs#L141).
Its main arguments are `prev_chunk_extra: &ChunkExtra`, which holds the execution result of the previous chunk, and `chunk_header`.
The most important field of `ShardChunkHeader` is `prev_state_root` - consider the latest implementation, `ShardChunkHeaderInnerV3`. It is the state root that results from updating the shard for the previous block, which means applying the previous chunk if there are no missing chunks.
So, a chunk validator needs some way to run the transactions and receipts from the previous chunk. Let's call this the "main state transition" and add two more fields to the state witness:

```rust
    /// The base state and post-state-root of the main transition where we
    /// apply transactions and receipts. Corresponds to the state transition
    /// that takes us from the pre-state-root of the last new chunk of this
    /// shard to the post-state-root of that same chunk.
    pub main_state_transition: ChunkStateTransition,
    /// The transactions to apply. These must be in the correct order in which
    /// they are to be applied.
    pub transactions: Vec<SignedTransaction>,
```

where

```rust
/// Represents the base state and the expected post-state-root of a chunk's state
/// transition. The actual state transition itself is not included here.
pub struct ChunkStateTransition {
    /// The block that contains the chunk; this identifies which part of the
    /// state transition we're talking about.
    pub block_hash: CryptoHash,
    /// The partial state before the state transition. This includes whatever
    /// initial state that is necessary to compute the state transition for this
    /// chunk. It is a list of Merkle tree nodes.
    pub base_state: PartialState,
    /// The expected final state root after applying the state transition.
    pub post_state_root: CryptoHash,
}
```

Fine, but where do the receipts come from?

Receipts are internal messages resulting from transaction execution, sent between shards, and **by default** they are not signed by anyone.

However, each receipt is an execution outcome of some transaction or of another parent receipt, executed in some previous chunk.
For every chunk, we conveniently store `prev_outgoing_receipts_root`, which is a Merkle hash of all receipts sent to other shards as a result of executing this chunk. So, for every receipt, there is a proof of its generation in some parent chunk. If there are no missing chunks, it's enough to consider the chunks from the previous block.

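To illustrate, here is a minimal sketch of checking such a receipt proof against a chunk's `prev_outgoing_receipts_root`; the hashing and proof layout are illustrative, not NEAR's exact encoding:

```python
import hashlib

# Minimal sketch of Merkle inclusion verification (illustrative encoding,
# not NEAR's exact one).
def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def verify_receipt_proof(receipt_hash: bytes, proof, receipts_root: bytes) -> bool:
    """proof: list of (sibling_hash, sibling_is_left) pairs, leaf to root."""
    node = receipt_hash
    for sibling, sibling_is_left in proof:
        node = hash_pair(sibling, node) if sibling_is_left else hash_pair(node, sibling)
    return node == receipts_root
```
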
So we add another field:

```rust
    /// Non-strict superset of the receipts that must be applied, along with
    /// information that allows these receipts to be verifiable against the
    /// blockchain history.
    pub source_receipt_proofs: HashMap<ChunkHash, ReceiptProof>,
```

What about missing chunks, though?

Unfortunately, the production and inclusion of any chunk **cannot be guaranteed**:

* the chunk producer may go offline;
* chunk validators may not generate 2/3 endorsements;
* the block producer may not receive enough information to include the chunk.

Let's handle this case as well.
First, each chunk producer needs to prove not just the main state transition, but also all state transitions for the latest missing chunks:

```rust
    /// For each missing chunk after the last new chunk of the shard, we need
    /// to carry out an implicit state transition. This is technically needed
    /// to handle validator rewards distribution. This list contains one for each
    /// such chunk, in forward chronological order.
    ///
    /// After these are applied as well, we should arrive at the pre-state-root
    /// of the chunk that this witness is for.
    pub implicit_transitions: Vec<ChunkStateTransition>,
```

Then, while our shard was missing chunks, other shards could still produce chunks, which could generate receipts targeting our shard. So we need to extend `source_receipt_proofs`.
The field structure doesn't change, but we need to carefully pick the range of source chunks, so that the receipt sets of consecutive witnesses cover all source receipts without intersection.

Let's say B2 is the block that contains the last new chunk of shard S before the chunk whose state transition we execute, and B1 is the block that contains the last new chunk of shard S before B2.
Then we define the set of blocks B as the contiguous subsequence of blocks from B1 (exclusive) to B2 (inclusive) in this chunk's chain (i.e. the linear chain that this chunk's parent block is on). The source chunks are all chunks included in the blocks of B.

The last caveat is the **new** transactions introduced by the chunk with `chunk_header`. As the chunk header contains a `tx_root` for them, we need to check the validity of this field as well.
If we don't, a malicious chunk producer can include an invalid transaction, and if it gets its chunk endorsed, nodes which track the shard must either accept the invalid transaction or refuse to process the chunk; the latter means the shard will get stuck.

To validate the new `tx_root`, we also need Merkle partial state to validate senders' balances, access keys, nonces, etc., which leads to the last two fields:

```rust
    pub new_transactions: Vec<SignedTransaction>,
    pub new_transactions_validation_state: PartialState,
```

The logic to produce `ChunkStateWitness` is [here](https://github.com/near/nearcore/blob/b8f08d9ded5b7cbae9d73883785902b76e4626fc/chain/client/src/stateless_validation/state_witness_producer.rs#L79).
It requires some minor changes to the logic of applying chunks, related to generating `ChunkStateTransition::base_state`.
This is controlled by [this line](https://github.com/near/nearcore/blob/dc03a34101f77a17210873c4b5be28ef23443864/chain/chain/src/runtime/mod.rs#L977), which causes all trie nodes read while applying the chunk to be recorded in a `TrieRecorder`.
After the chunk is applied, the recorded nodes are saved to `StateTransitionData`.

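Conceptually, the recording works like the following sketch (hypothetical classes, not the actual `TrieRecorder` API): every trie node touched while applying the chunk is captured, and the captured set becomes `ChunkStateTransition::base_state`.

```python
# Conceptual sketch only (hypothetical classes and methods).
class RecordingTrie:
    def __init__(self, trie):
        self.trie = trie
        self.recorded = {}  # node hash -> raw node bytes (the partial state)

    def get(self, key):
        # `lookup` is assumed to return the trie nodes visited on the path
        # from the root down to the leaf for `key`.
        path = self.trie.lookup(key)
        for node in path:
            self.recorded[node.hash] = node.raw
        return path[-1].value if path else None

    def partial_state(self):
        # The recorded nodes are exactly what a validator without the
        # shard's state needs in order to re-run the same reads.
        return list(self.recorded.values())
```
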
The validation logic is [here](https://github.com/near/nearcore/blob/b8f08d9ded5b7cbae9d73883785902b76e4626fc/chain/client/src/stateless_validation/chunk_validator/mod.rs#L85).
First, it performs all validation steps that require access to the `ChainStore`; `pre_validate_chunk_state_witness` is responsible for this. It is done separately because the `ChainStore` is owned by a single thread.
Then, it spawns a thread which runs the computation-heavy `validate_chunk_state_witness`, whose main purpose is to apply the chunk based on the received state transitions and verify that the execution results in the chunk header are correct.
If validation succeeds, a `ChunkEndorsement` is sent.

### ChunkEndorsement

It is basically a triple of `(ChunkHash, AccountId, Signature)`.
Receiving this message means that the given chunk validator account endorsed the chunk with the given chunk hash.
Ideally, a chunk validator would send its chunk endorsement to just the next block producer at the height for which the chunk was produced.
However, the block at that height can be skipped, and block producers at heights `h+1`, `h+2`, ... will have to pick up the chunk.
To address that, we send the `ChunkEndorsement` to all block producers at heights from `h` to `h+d-1`. We pick `d=5`, as more than 5 skipped blocks in a row are very unlikely to occur.

On the block producer side, chunk endorsements are collected and stored in the `ChunkEndorsementTracker`.
A small **caveat** is that *sometimes* a chunk endorsement may be received before the chunk header, which is required to determine that the sender is indeed a validator of the chunk.
Such endorsements are stored as *pending*.
When the chunk header is received, all pending endorsements are checked for validity and marked as *validated*.
All endorsements received after that are validated right away.

Finally, when the block producer attempts to produce a block, in addition to checking a chunk's existence, it also checks that it has 2/3 of endorsement stake for that chunk hash.
To make chunk inclusion verifiable, we introduce [another version](https://github.com/near/nearcore/blob/cf2caa3513f58da8be758d1c93b0900ffd5d51d2/core/primitives/src/block_body.rs#L30) of the block body, `BlockBodyV2`, which has a new field `chunk_endorsements`.
It is basically a `Vec<Vec<Option<Signature>>>`, where the element with indices `(s, i)` contains the signature of the i-th chunk validator for shard `s` if it was included, and `None` otherwise.
Lastly, we add a condition to block validation: if a chunk for shard `s` was included in the block, then the block body must contain 2/3 of endorsements for that shard.

This logic is triggered in `ChunkInclusionTracker` by the method [get_chunk_headers_ready_for_inclusion](https://github.com/near/nearcore/blob/6184e5dac45afb10a920cfa5532ce6b3c088deee/chain/client/src/chunk_inclusion_tracker.rs#L146) and a couple of similar ones. The number of ready chunks is returned by [num_chunk_headers_ready_for_inclusion](https://github.com/near/nearcore/blob/6184e5dac45afb10a920cfa5532ce6b3c088deee/chain/client/src/chunk_inclusion_tracker.rs#L178).

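As an illustration, the readiness check amounts to something like the following sketch (hypothetical names, signature verification elided):

```python
# Simplified sketch (not the actual nearcore API): decide whether a chunk
# can be included, based on the endorsements collected so far.
def chunk_ready_for_inclusion(endorsements, chunk_validators):
    """
    endorsements: dict mapping AccountId -> Signature for this chunk hash
    chunk_validators: dict mapping AccountId -> stake assigned to this shard
    """
    total_stake = sum(chunk_validators.values())
    endorsed_stake = sum(
        stake
        for account_id, stake in chunk_validators.items()
        if account_id in endorsements  # signature assumed already verified
    )
    # Note: the denominator is the stake assigned to this shard, not the
    # total stake of all validators.
    return 3 * endorsed_stake > 2 * total_stake
```
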
### Chunk validators selection

Chunk validators will be randomly assigned to validate shards for each block (or, as we may decide later, for multiple blocks in a row, if required for performance reasons). A chunk validator may be assigned multiple shards at once if it has sufficient stake.

Each chunk validator's stake is divided into "mandates". There are full and partial mandates. The number of mandates per shard is a fixed parameter, and the amount of stake per mandate is dynamically computed based on this parameter and the actual stake distribution; any remaining amount smaller than a full mandate is a partial mandate. A chunk validator therefore has zero or more full mandates plus up to one partial mandate. The list of full mandates and the list of partial mandates are then separately shuffled and partitioned equally (as in, no more than one mandate of difference between any two shards) across the shards. Any mandate assigned to a shard means that the chunk validator who owns the mandate is assigned to validate that shard. Because a chunk validator may own multiple mandates, it may be assigned multiple shards to validate.

For Stage 0, we set the **target number of mandates per shard** to 68, which was a [result of the latest research](https://near.zulipchat.com/#narrow/stream/407237-core.2Fstateless-validation/topic/validator.20seat.20assignment/near/435252304).
With this number of mandates per shard and 6 shards, we predict the protocol to be secure for 40 years at 90% confidence.
Based on the target number of mandates and the total chunk validator stake, we compute the price of a single full mandate for each new epoch using binary search ([code](https://github.com/near/nearcore/blob/696190b150dd2347f9f042fa99b844b67c8001d8/core/primitives/src/validator_mandates/mod.rs#L76)).
All mandates are stored in a new version of `EpochInfo`, `EpochInfoV4`, in the [validator_mandates](https://github.com/near/nearcore/blob/164b7a367623eb651914eeaf1cbf3579c107c22d/core/primitives/src/epoch_manager.rs#L775) field.

After that, for each height in the epoch, [EpochInfo::sample_chunk_validators](https://github.com/near/nearcore/blob/164b7a367623eb651914eeaf1cbf3579c107c22d/core/primitives/src/epoch_manager.rs#L1224) is called to return a `ChunkValidatorStakeAssignment`. It is a `Vec<HashMap<ValidatorId, Balance>>` whose s-th element corresponds to the s-th shard in the epoch and contains the ids of all chunk validators for that height and shard, along with the total mandate stake each has assigned to that shard.
`sample_chunk_validators` basically just shuffles `validator_mandates` among the shards using a height-specific seed. If no more than 1/3 of validators are malicious, then by the Chernoff bound the probability that at least one shard is corrupted is small enough. **This is the reason why we can split validators among shards and still rely on the basic consensus assumption**.

This way, everyone tracking block headers can compute the chunk validator assignment for each height and shard.

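A simplified sketch of the sampling step, with hypothetical names and Python's RNG standing in for the protocol's seeded shuffle (the real code also treats full and partial mandates separately):

```python
import random

# Simplified sketch of mandate sampling (hypothetical names, not the actual
# nearcore implementation): shuffle all mandates with a height-specific seed
# and partition them equally among shards.
def sample_chunk_validators(validator_mandates, num_shards, height):
    """
    validator_mandates: list of (validator_id, stake) pairs, one per mandate.
    Returns, per shard, a dict validator_id -> total mandate stake.
    """
    rng = random.Random(height)  # stand-in for the protocol's seeded RNG
    mandates = list(validator_mandates)
    rng.shuffle(mandates)
    assignment = [dict() for _ in range(num_shards)]
    for i, (validator_id, stake) in enumerate(mandates):
        shard = i % num_shards  # round-robin, hence an equal partition
        shard_map = assignment[shard]
        shard_map[validator_id] = shard_map.get(validator_id, 0) + stake
    return assignment
```
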
### Size limits

`ChunkStateWitness` is a relatively large message. Given the large number of receivers as well, its size must be strictly limited.
If the `ChunkStateWitness` for some state transition gets so uncontrollably large that it can never be handled by a majority of validators, its shard gets stuck.

We try to limit the size of the `ChunkStateWitness` to 16 MiB. All the limits are described [in this section](https://github.com/near/nearcore/blob/b34db1e2281fbfe1d99a36b4a90df3fc7f5d00cb/docs/misc/state_witness_size_limits.md).
Additionally, we have limits on the currently stored partial state witnesses and chunk endorsements, because malicious chunk validators can spam these as well.

## State witness size limits

A number of new limits will be introduced in order to keep the size of `ChunkStateWitness` reasonable.
`ChunkStateWitness` contains all the incoming transactions and receipts that will be processed during chunk application, and in theory a single receipt could be tens of megabytes in size. Distributing a `ChunkStateWitness` this large to all chunk validators would be troublesome, so we limit the size and number of transactions, receipts, etc.
The limits aim to keep the total uncompressed size of `ChunkStateWitness` under 16 MiB.

There are two types of size limits:

* Hard limit - the size must be below this limit; anything else is considered invalid. This is usually used for limits on a single item.
* Soft limit - things are added until the limit is exceeded, after which things stop being added. The last added thing is allowed to slightly exceed the limit. This is used for limits on a list of items.

The limits are:

* `max_transaction_size - 1.5 MiB`
  * All transactions must be below 1.5 MiB, otherwise they'll be considered invalid and rejected.
  * Previously the limit was 4 MiB; it is now reduced to 1.5 MiB.
* `max_receipt_size - 4 MiB`
  * All receipts must be below 4 MiB, otherwise they'll be considered invalid and rejected.
  * Previously there was no limit on receipt size. It is set to 4 MiB and might be reduced to 1.5 MiB in the future to match the transaction limit.
* `combined_transactions_size_limit - 4 MiB`
  * Hard limit on the total size of transactions from this and the previous chunk. `ChunkStateWitness` contains transactions from two chunks; this limit applies to the sum of their sizes.
* `new_transactions_validation_state_size_soft_limit - 500 KiB`
  * Validating new transactions generates storage proof (recorded trie nodes), which has to be limited. Once transaction validation generates more storage proof than this limit, the chunk producer stops adding new transactions to the chunk.
* `per_receipt_storage_proof_size_limit - 4 MB`
  * Executing a receipt generates storage proof. A single receipt is allowed to generate at most 4 MB of storage proof. This is a hard limit; receipts which generate more than that will fail.
* `main_storage_proof_size_soft_limit - 3 MB`
  * This is a limit on the total size of storage proof generated by receipts in one chunk. Once receipts generate more storage proof than this limit, the chunk producer stops processing receipts and moves the rest to the delayed queue.
  * It's a soft limit, which means that the total size of storage proof could reach 7 MB (2.99 MB + one receipt which generates 4 MB of storage proof).
  * Due to implementation details it's hard to find the exact amount of storage proof generated by a receipt, so an upper-bound estimate is used instead. This upper bound assumes that every removal generates an additional 2000 bytes of storage proof, so receipts which perform a lot of trie removals might be limited more strictly than theoretically necessary.
* `outgoing_receipts_usual_size_limit - 100 KiB`
  * Limit on the size of outgoing receipts to another shard. Needed to keep the size of `source_receipt_proofs` small.
  * At most block heights, a shard isn't allowed to send receipts larger than 100 KiB in total to another shard.
* `outgoing_receipts_big_size_limit - 4.5 MiB`
  * At every block height there's one special "allowed shard" which is allowed to send larger receipts, up to 4.5 MiB in total.
  * A receiving shard will receive receipts from `num_shards - 1` shards using the usual limit and from one shard using the big limit.
  * The "allowed shard" is the same shard as in cross-shard congestion control. It's chosen in a round-robin fashion: at height 1 the special shard is 0, at height 2 it's 1, and so on.

In total that gives 4 MiB + 500 KiB + 7 MB + 5 * 100 KiB + 4.5 MiB ~= 16 MiB of maximum witness size. Possibly a little more when there are missing chunks.

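As an illustration, the soft-limit semantics amount to something like this sketch (hypothetical helper, not nearcore code):

```python
# Illustrative soft-limit semantics: items are added until the limit is
# exceeded, so the last accepted item may slightly overshoot it.
def take_until_soft_limit(items, soft_limit_bytes):
    taken, total = [], 0
    for item in items:
        if total >= soft_limit_bytes:
            break  # limit already reached; the rest is postponed
        taken.append(item)  # added even if it pushes us past the limit
        total += len(item)
    return taken

# Example: with main_storage_proof_size_soft_limit = 3 MB and a single receipt
# allowed to record up to per_receipt_storage_proof_size_limit = 4 MB, the
# recorded proof can reach just under 3 MB + 4 MB = 7 MB, as stated above.
```
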
### New limits breaking contracts

The new limits will break some existing contracts (for example, all transactions larger than 1.5 MiB). This is sad, but it's necessary. Stateless validation uses much more network bandwidth than the previous approach, as it has to send over the touched state on each chunk application. Because network bandwidth is limited, stateless validation can't support some operations that were allowed in the previous design.

In the past year (31,536,000 blocks) there were only 679 transactions bigger than 1.5 MiB, sent between 164 unique (sender -> receiver) pairs.
Only 0.002% of blocks contain such transactions, so the hope is that the breakage will be minimal. Contracts generally shouldn't require more than 1.5 MiB of WASM.

The full list of transactions from the past year which would fail with the new limit is available here: https://gist.github.com/jancionear/4cf373aff5301a5905a5f685ff24ed6f
Contract developers can take a look at this list and see if their contracts will be affected.

### Validating the limits

Chunk validators have to verify that the chunk producer respected all of the limits while producing the chunk. This means that validators also have to keep track of the recorded storage proof, by recording all trie accesses, and they have to enforce the limits.
If it turns out that some limits weren't respected, the validators will arrive at a different result of chunk application and won't endorse the chunk.

### Missing chunks

When a shard is missing some chunks, the following chunk on that shard will receive receipts from multiple blocks. This could lead to a large `source_receipt_proofs`, so a mechanism is added to reduce the impact: if there are two or more missing chunks in a row, the shard is considered fully congested and no new receipts will be sent to it (unless it's the `allowed_shard`, to avoid deadlocks).

## ChunkStateWitness distribution

For chunk production, the chunk producer is required to distribute the chunk state witness to all the chunk validators. The chunk validators then validate the chunk and send their chunk endorsements to the block producer. Chunk state witness distribution is on the latency-critical path.

As we saw in the section above, the maximum size of the state witness can be ~16 MiB. If the chunk producer were to send the chunk state witness to all the chunk validators directly, it would add a massive bandwidth requirement for the chunk producer. To ease and spread the network requirements across all the chunk producers, we have a distribution mechanism similar to what we have for chunks in the shards manager. We divide the chunk state witness into a number of parts, let the chunk validators distribute the parts among themselves, and later reconstruct the chunk state witness.

### Distribution mechanism

A chunk producer divides the state witness into a set of `N` parts, where `N` is the number of chunk validators. The parts, or partial witnesses, are represented as [PartialEncodedStateWitness](https://github.com/near/nearcore/blob/66d3b134343d9f35f6e0b437ebbdbef3e4aa1de3/core/primitives/src/stateless_validation.rs#L40). Each chunk validator is the owner of one part. The chunk producer uses the [PartialEncodedStateWitnessMessage](https://github.com/near/nearcore/blob/66d3b134343d9f35f6e0b437ebbdbef3e4aa1de3/chain/network/src/state_witness.rs#L11) to send each part to its respective owner. The chunk validator part owners, on receiving the `PartialEncodedStateWitnessMessage`, forward this part to all other chunk validators via the [PartialEncodedStateWitnessForwardMessage](https://github.com/near/nearcore/blob/66d3b134343d9f35f6e0b437ebbdbef3e4aa1de3/chain/network/src/state_witness.rs#L15). Each validator then uses the partial witnesses received to reconstruct the full chunk state witness.

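The two hops can be sketched as follows (hypothetical names, with network sending stubbed out by a `send` callback):

```python
# Simplified sketch of the two-hop distribution flow (hypothetical names).
def distribute_witness(parts, chunk_validators, send):
    # Hop 1: the chunk producer sends part i to its owner, validator i.
    for validator, part in zip(chunk_validators, parts):
        send(validator, "PartialEncodedStateWitnessMessage", part)

def on_partial_witness_received(own_part, chunk_validators, me, send):
    # Hop 2: each owner forwards its part to all other chunk validators.
    for validator in chunk_validators:
        if validator != me:
            send(validator, "PartialEncodedStateWitnessForwardMessage", own_part)
```
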
We have a separate [PartialWitnessActor](https://github.com/near/nearcore/blob/66d3b134343d9f35f6e0b437ebbdbef3e4aa1de3/chain/client/src/stateless_validation/partial_witness/partial_witness_actor.rs#L32) actor/module that is responsible for dividing the state witness into parts, distributing the parts, handling both the partial encoded state witness message and the forward message, validating and storing the parts, and reconstructing the state witness from the parts and sending it to the chunk validation module.

### Building redundancy using Reed-Solomon erasure encoding

During distribution, it's possible that some of the chunk validators are malicious, offline, or have high network latency. Since chunk witness distribution is on the critical path for block production, we safeguard the distribution mechanism by building in redundancy using Reed-Solomon erasure encoding.

With Reed-Solomon erasure encoding, we can divide the chunk state witness into `N` total parts with `D` data parts. We can reconstruct the whole state witness as long as we have any `D` of the `N` parts. The ratio of data parts `r = D/N` is something we can play around with.

While reducing `r`, i.e. reducing the number of data parts required to reconstruct the state witness, does allow for a more robust distribution mechanism, it comes at the cost of bloating the overall size of the parts we need to distribute. If `S` is the size of the state witness, then after Reed-Solomon encoding the total size `S'` of all parts becomes `S' = S / r`, or `S' = S * N / D`.

For the first release of stateless validation, we've kept the ratio at `0.6`, meaning that ~2/3 of all chunk validators need to be online for the chunk state witness distribution mechanism to work smoothly.

One thing to note here is that the redundancy and upkeep requirement of 2/3 refers to the *number* of chunk validators and not to the *stake* of the chunk validators.

### PartialEncodedStateWitness structure

The partial encoded state witness has the following fields:

* `(epoch_id, shard_id, height_created)`: these three fields together uniquely determine the chunk associated with the partial witness. Since the chunk and chunk header distribution mechanism is independent of the partial witness, we rely on this triplet to uniquely identify which chunk a part is associated with.
* `part_ord`: the index or id of the part in the array of partial witnesses.
* `part`: the data associated with the part.
* `encoded_length`: the total length of the state witness. This is required in the Reed-Solomon decoding process to reconstruct the state witness.
* `signature`: each part is signed by the chunk producer. This way, the validity of the partial witness can be verified by the chunk validators receiving the parts.

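The size math above can be sketched as follows (assumed example values; the actual parameters live in the protocol configuration):

```python
# Back-of-the-envelope sketch of the erasure-coding sizes. With N parts,
# D data parts and ratio r = D/N, each part has size S / D, and all N parts
# together take S' = S * N / D = S / r bytes.
def part_sizes(witness_size_bytes, num_validators, ratio=0.6):
    data_parts = max(1, int(num_validators * ratio))
    part_size = witness_size_bytes / data_parts   # size of one part
    total_size = part_size * num_validators       # S' = S / r
    return data_parts, part_size, total_size

# Example: a 16 MiB witness with 20 chunk validators.
D, part, total = part_sizes(16 * 1024 * 1024, 20)
# D = 12 parts suffice to reconstruct; total transferred ~= 26.7 MiB ~= S / 0.6.
```
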
The `PartialEncodedStateWitnessTracker` module is responsible for the storage and decoding of partial witnesses. It has an LRU cache that stores all partial witnesses with the `(epoch_id, shard_id, height_created)` triplet as the key.
We reconstruct the state witness as soon as we have `D` of the `N` parts and forward it to the validation module.

### Network tradeoffs

To get a sense of the network requirements for validators with and without the partial state witness distribution mechanism, we can do some quick back-of-the-envelope calculations. Let `N` be the number of chunk validators, `S` the size of the chunk state witness, and `r` the ratio of data parts to total parts for Reed-Solomon erasure encoding.

Without the partial state witness distribution, each chunk producer would have to send the state witness to all chunk validators, which would require a bandwidth `B` of `B = N * S`. For the worst case of ~16 validators and ~16 MiB of state witness size, this can be a burst requirement of 2 Gbps.

Partial state witness distribution takes this load off the chunk producer and distributes it evenly among all the chunk validators. However, we get an additional factor of `1/r` of extra data transferred for redundancy. Each partial witness has a size of `P = S' / N`, or `P = S / r / N`. Each chunk producer or validator needs a bandwidth `B` of `B = P * N`, or `B = S / r`, to forward its owned part to all `N` chunk validators. For the worst case of ~16 MiB of state witness size and an encoding ratio of `0.6`, this works out to ~214 Mbps, which is much more reasonable.

### Future work

In the Reed-Solomon erasure encoding section we discussed that the chunk state witness distribution mechanism relies on 2/3 of the *number* of chunk validators being available/non-malicious, not 2/3 of the *total stake* of the chunk validators. This creates a potential issue: more than 1/3 of the chunk validators, holding a small enough total stake, could be unavailable and cause chunk production to stall. We would like to address this problem in the future.

## Validator Role Change

Currently, there are two different types of validators, and their responsibilities are as follows:

| | Top ~50% validators | Remaining validators (chunk-only producers) |
|-----|:-----:|:----:|
| block production | Y | N |
| chunk production | Y | Y |
| block validation | Y | N |

### Protocol upgrade

A good property of the approach taken is that the protocol upgrade happens almost seamlessly.

If the main transition and all implicit transitions fully belong to the protocol version before the upgrade to stateless validation, chunk validator endorsements are not distributed and chunk validators are not sampled, but the protocol is safe because of all-shards tracking, as described in "Design before NEP-509".

If at least one transition belongs to the protocol version after the upgrade, the chunk header height also belongs to an epoch after the upgrade, so it has chunk validators assigned and the requirement of 2/3 endorsements is enabled.

One minor subtlety is that state transition proofs have to be generated and saved one epoch in advance, so that we won't have to re-apply chunks to generate proofs once stateless validation is enabled. Since the protocol version of a new epoch is determined by the finalization of the **previous previous epoch**, this is fine.

It also assumes that each epoch has at least two chunks; if that is not the case, the chain is experiencing a major disruption of a kind that has never happened before.

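The gating rule can be sketched as follows (hypothetical names and version number, not the actual nearcore check):

```python
STATELESS_VALIDATION_VERSION = 80  # hypothetical version number, for illustration

def endorsements_required(transition_protocol_versions):
    # `transition_protocol_versions`: protocol versions of the main and
    # implicit state transitions covered by a chunk's witness. If every
    # transition predates the upgrade, the pre-upgrade all-shards-tracking
    # rules apply; otherwise chunk validators are sampled and 2/3
    # endorsements are required.
    return any(v >= STATELESS_VALIDATION_VERSION
               for v in transition_protocol_versions)
```
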
## Security Implications

Block validators are no longer required to track any shard, which means they don't have to validate the state transitions proposed by the chunks in a block.
Instead, they trust the chunk endorsements included in the block to certify the validity of the state transitions.
This makes the correctness of the chunk validator selection algorithm critical for the security of the whole protocol, and that guarantee is probabilistic by nature, unlike the current stricter requirement of 2/3 non-malicious validators.

It is also worth mentioning that a large state witness makes witness distribution slow, which could result in a missing chunk because the block producer won't get the chunk endorsements in time. This design tries to address that by meticulously limiting the maximum witness size (see [this doc](https://github.com/near/nearcore/blob/master/docs/misc/state_witness_size_limits.md)).

## Alternatives

The only real alternative that was considered is the original Nightshade proposal. A full overview of the differences can be found in the revised Nightshade whitepaper at https://near.org/papers/nightshade.

## Future possibilities

* Integration with ZK, allowing us to get rid of large state witness distribution. If we treat the state witness as a proof and ZK-ify it, anyone can validate that the state witness indeed proves the new chunk header with much lower effort. The complexity of actual proof generation and computation does increase, but it can be distributed among chunk producers, and we can have a separate concept of finality while allowing generic users to query optimistic chunks.
* Integration with resharding to further increase the number of shards and the total throughput.
* Sharding of non-validating nodes and services. There are a number of services that may benefit from tracking only a subset of shards. Some examples include the RPC, archival and read-RPC nodes.

## Consequences

### Positive

* Validator nodes will need to track at most one shard.
* The state will be held in memory, making chunk application much faster.
* The disk space hardware requirement will decrease. The top 100 nodes will need to store at most 2 shards at a time, and the remaining nodes will not need to store any shards.
* Thanks to the above, in the future it will be possible to reduce gas costs and thereby increase the throughput of the system.

### Neutral

* The current approach to resharding will need to be revised to support generating state witnesses.
* The security assumptions will change. Responsibility will move from block producers to chunk validators, and the security will become probabilistic.

### Negative

* The network bandwidth and memory hardware requirements will increase.
  * The top 100 validators will need to store up to 2 shards in memory and participate in state witness distribution.
  * The remaining validators will need to participate in state witness distribution.
* Additional limits will be put on the size of transactions, receipts and, more generally, cross-shard communication.
* The dependency on cloud state sync will increase the centralization of the blockchain. This will be resolved separately by decentralized state sync.

### Backwards Compatibility

[All NEPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The author must explain how they propose to deal with these incompatibilities. Submissions without a sufficient backwards compatibility treatise may be rejected outright.]

## Unresolved Issues (Optional)

[Explain any issues that warrant further discussion.
Considerations:

* What parts of the design do you expect to resolve through the NEP process before this gets merged?
* What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
* What related issues do you consider out of scope for this NEP that could be addressed in the future independently of the solution that comes out of this NEP?]

## Changelog

[The changelog section provides historical context for how the NEP developed over time. The initial NEP submission should start with version 1.0.0, and all subsequent NEP extensions must follow [Semantic Versioning](https://semver.org/). Every version should have the benefits and concerns raised during the review. The author does not need to fill out this section for the initial draft. Instead, the assigned reviewers (Subject Matter Experts) should create the first version during the first technical review. After the final public call, the author should then finalize the last version of the decision context.]

### 1.0.0 - Initial Version

> Placeholder for the context about when and who approved this NEP version.

#### Benefits

> List of benefits filled by the Subject Matter Experts while reviewing this version:

* Benefit 1
* Benefit 2

#### Concerns

> Template for Subject Matter Experts review for this version:
> Status: New | Ongoing | Resolved

| # | Concern | Resolution | Status |
| --: | :------ | :--------- | -----: |
| 1 | | | |
| 2 | | | |

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).