
[WIP] Parallel IBD #52

Closed · wants to merge 3 commits

Conversation

@jaspervdm (Contributor) commented Apr 21, 2020

Still WIP. I'm working on the implementation side of the p2p messages in parallel, so details might change.

Link to rendered text

@jaspervdm changed the title from "Parallel IBD" to "[WIP] Parallel IBD" on Apr 21, 2020
@lehnberg (Contributor) left a comment

Nice work @jaspervdm! Took a very high-level pass at it with some questions.


A node joining the Grin network does not need the complete block history in order to fully verify the chain state. The block headers, the unspent output set and the complete kernel set are sufficient. The output and kernel sets are stored as leaves in a Merkle Mountain Range (MMR), and the block headers commit to the roots of these trees. Prior to HF2, the block headers only committed to the output commitments and not their unspent/spent status. This meant that output and kernel data could only be verified after it had been completely downloaded, which forced nodes to download the full data from a single peer. The downsides of this are apparent: the download speed is bottlenecked by the bandwidth of the other peer, and if that peer goes offline during the download or provides malicious data (which can only be detected after the fact), the process has to be restarted from scratch with another peer.

However, due to a consensus change in HF2, the headers now also commit to the unspent/spent status. The output spent status is stored in a bitmap, which is split up into chunks of 1024 bits and stored in a separate MMR. Block headers commit to the root of this MMR. This means that chunks of outputs can be downloaded and verified independently, by providing along with the data two merkle proofs that prove the inclusion of the unspent outputs and of the output bitmap chunk in their respective roots. It allows the data to be downloaded in parallel and verified as it comes in, which greatly improves the bootstrap time of a fresh node.
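To make the "download and verify independently" idea concrete, here is a minimal sketch of how a single downloaded chunk could be checked against a header-committed root via a merkle proof. The types, the non-cryptographic hash and the proof layout are illustrative placeholders, not Grin's actual data structures.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Placeholder hash type; the real node uses Blake2b-based MMR hashes.
type NodeHash = u64;

fn hash_pair(left: NodeHash, right: NodeHash) -> NodeHash {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// A downloaded chunk's locally computed subtree root plus the sibling hashes
/// needed to climb from that subtree up to the root committed in the header.
struct ChunkProof {
    subtree_root: NodeHash,
    /// (sibling hash, true if the sibling sits on the left)
    siblings: Vec<(NodeHash, bool)>,
}

/// A chunk is accepted only if recombining it with its proof reproduces the
/// root the block header commits to.
fn verify_chunk(proof: &ChunkProof, header_root: NodeHash) -> bool {
    let mut acc = proof.subtree_root;
    for &(sibling, sibling_is_left) in &proof.siblings {
        acc = if sibling_is_left {
            hash_pair(sibling, acc)
        } else {
            hash_pair(acc, sibling)
        };
    }
    acc == header_root
}
```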
Contributor:

I would try to make this less technical, more accessible to someone who has no idea what "commit", "bitmap", "MMR", "roots", etc. are.

Before, we were downloading a huge zip file from one peer. Now, we can download multiple data streams from multiple peers in parallel. Something like that.

Member:

(github falling over, apologies if I'm commenting multiple times...)

Agreed. This section can probably be moved as-is into the reference explanation.

Comment on lines +20 to +21
# Community-level explanation
[community-level-explanation]: #community-level-explanation
Contributor:

Something I feel is missing from this section is a small paragraph of what exactly is being outlined in this RFC, i.e. what are we describing here, what is the change that will be triggered, and how will it work?

Here `TOTAL_SUPPLY` is the total supply of Grins at the point of sync, which is equal to `(HEIGHT + 1)*COINBASE_REWARD`.

If all of this is successful, the node has now downloaded and verified the state up to the sync horizon point. Fully updating the chain state to the latest head involves downloading the full blocks after the sync horizon, verifying them and applying their contents consecutively, which is possible because all nodes are expected to store the full blocks past the compaction horizon.
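As a small illustration of the supply formula quoted above, assuming Grin's 60-coin block subsidy expressed in nanogrin; a sketch, not consensus code:

```rust
/// Block subsidy in nanogrin (60 Grin per block); illustrative constant.
const COINBASE_REWARD: u64 = 60 * 1_000_000_000;

/// Total supply at `height`, counting the genesis block (hence the `+ 1`),
/// matching TOTAL_SUPPLY = (HEIGHT + 1) * COINBASE_REWARD above.
fn total_supply(height: u64) -> u64 {
    (height + 1) * COINBASE_REWARD
}

fn main() {
    // Supply after roughly one week of one-minute blocks.
    println!("{}", total_supply(10_080));
}
```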

Contributor:

What happens when the checks fail? Process is restarted? Different peers? What's the peer selection process?

Contributor (author):

There are a bunch of implementation details that I have intentionally left open, because I haven't fully decided on the scope of the RFC yet. Up to now I have limited it to mostly the p2p messages and how they should be verified, which is something we can comfortably get into the next hard fork release. But I could extend it to fully describe the new sync process, including how intermediary state should be tracked, how we can deal with malicious peers and what the plan would be around running the parallel sync side by side with the old sync at first, before we completely remove the old sync.

@antiochp (Member):

We should think through how rangeproofs will be handled here.

We maintain separate MMRs for outputs and rangeproofs.
The output MMR actually stores OutputIdentifier entries (which are outputs without their rangeproofs).
I think we should consider requesting chunks of rangeproofs separately from chunks of outputs.

So three different requests -

  • chunked outputs
  • chunked rangeproofs
  • chunked kernels

Each is used to incrementally build a full MMR.
These can all be requested in parallel.
Each MMR root can be verified separately.

Additional UTXO verification to ensure the output MMR and rangeproof MMR can be used to reconstruct the full valid UTXO set.

For every unspent output in the utxo bitmap there must exist -

  • output in the output MMR
  • corresponding rangeproof in the rangeproof MMR

Then rangeproofs are batch verified, given the associated output commitments.

Chunk size of 1024 may potentially be too large for rangeproofs?
So maybe these should be requested with a smaller chunk size?
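For illustration, a rough sketch of the cross-check described above, once the three chunk streams have arrived; all types and field names are placeholders, not the node's actual API:

```rust
use std::collections::HashSet;

/// Output identifier (commitment + features) at a given MMR leaf position.
struct OutputIdentifier { pos: u64 /* , commitment, features, ... */ }
/// Rangeproof stored at the same leaf position in the rangeproof MMR.
struct RangeProof { pos: u64 /* , proof bytes, ... */ }

/// For every position marked unspent in the UTXO bitmap there must be both an
/// output in the output MMR and a matching rangeproof in the rangeproof MMR.
fn cross_check_utxo_set(
    unspent_positions: &HashSet<u64>,
    outputs: &[OutputIdentifier],
    rangeproofs: &[RangeProof],
) -> bool {
    let out_pos: HashSet<u64> = outputs.iter().map(|o| o.pos).collect();
    let rp_pos: HashSet<u64> = rangeproofs.iter().map(|p| p.pos).collect();
    unspent_positions
        .iter()
        .all(|p| out_pos.contains(p) && rp_pos.contains(p))
    // After this check, the (output, rangeproof) pairs can be batch verified.
}
```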

@antiochp (Member):

Also an additional edge case that we need to consider.

A chunk with a full 1024 leaves may be entirely pruned. In this situation the root itself will also be pruned.

We would need to use a parent higher in the subtree beneath the peak (potentially ending up at the peak itself).
It's not clear to me whether this is sufficient to prove the existence of the pruned subtree within this larger subtree.
But there must be some way of proving that a full 1024-leaf subtree was in there but fully pruned.

Maybe in this case we cannot prove anything about the 1024 subtree and we need to aggregate it with the next one somehow?
Two adjacent 1024-leaf subtrees can be aggregated into a single 2048-leaf subtree at height+1, so maybe chunks can be variable-sized based on how pruned they are?

Eventually we aggregate up to a subtree with either -

  • at least a single unpruned leaf, or
  • a fully pruned subtree resulting in a single MMR peak

This is all just thinking out loud, but this is kind of a nasty edge case.
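One possible shape of the variable-sized chunk idea, purely as a thinking-out-loud sketch; `is_fully_pruned` and the starting height are hypothetical:

```rust
/// Grow a chunk from the default 1024-leaf subtree (height 10), doubling it
/// while the whole subtree is fully pruned, until it either contains an
/// unpruned leaf or reaches the MMR peak.
fn chunk_height(
    first_leaf: u64,
    peak_height: u8,
    is_fully_pruned: impl Fn(u64, u8) -> bool,
) -> u8 {
    let mut height = 10u8;
    while height < peak_height && is_fully_pruned(first_leaf, height) {
        height += 1; // 1024 -> 2048 -> 4096 ... leaves
    }
    height
}
```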

@jaspervdm (author):

> We should think through how rangeproofs will be handled here. [..]
> Chunk size of 1024 may potentially be too large for rangeproofs?
> So maybe these should be requested with a smaller chunk size?

Yes, this is something I've been thinking about as well. Since we're keeping an MMR for them, at the very least we need to add a merkle proof for them as well. Adding a separate p2p message might make sense, but this does complicate the verification process of these messages a bit, since we can only verify them after receiving the output commitments and bitmap from the other messages. Unless we make sure to only request the rangeproof chunks after we have received the corresponding output chunk.
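A minimal sketch of that ordering, with hypothetical message types: output chunks are requested up front in parallel, and a rangeproof chunk is only requested once the matching output chunk has arrived and been verified.

```rust
#[derive(Clone, Copy)]
struct ChunkId(u64);

/// Hypothetical outgoing p2p requests.
enum Request {
    OutputChunk(ChunkId),
    RangeProofChunk(ChunkId),
}

/// Kick off sync: output chunks can be fetched from many peers at once.
fn start_sync(chunk_ids: &[ChunkId], outbox: &mut Vec<Request>) {
    for &id in chunk_ids {
        outbox.push(Request::OutputChunk(id));
    }
}

/// Once an output chunk verifies against the header roots, ask for the
/// rangeproofs covering the same leaf positions.
fn on_output_chunk_verified(id: ChunkId, outbox: &mut Vec<Request>) {
    outbox.push(Request::RangeProofChunk(id));
}
```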

> A chunk with a full 1024 leaves may be entirely pruned. In this situation the root itself will also be pruned.
> We would need to use a parent higher in the subtree beneath the peak (potentially ending up at the peak itself).
> It's not clear to me if this is sufficient to prove existence of subtree in this larger subtree?

Good catch, I hadn't thought about this edge case yet. I have to consider it a bit longer, but if we have multiple fully pruned chunks next to each other, we could have pruned the hashes up to an arbitrary height above 10 (the height of the root of a full chunk subtree). A "proof" for such a chunk would give the hashes from the first unpruned nodes up to the peak of the tree. A node could lie about the number of levels that are pruned and give a false proof, which would prevent us from reconstructing the PMMR with the necessary intermediary hashes. But this is something we can only verify after receiving all the relevant chunks. An easy way out would be to not prune above height 10, but that feels like a hack (and it may already be impossible if we have fully pruned chunks, I didn't check). The proper way of treating this is probably to have an additional verification step at the end that walks up from the fully pruned chunks and compares the proofs, filling in any MMR entries that are missing.

@antiochp (Member) commented Apr 27, 2020

Related to IBD is the case where a node has been offline for more than 7 days and needs to sync to catch up.
Might be worth considering this, at least as a high-level overview, in the RFC?

We currently do this in a very sub-optimal way, by just downloading a new full txhashset.

With this PIBD work we can be significantly smarter than this -

  • download missing recent kernels (we already have most of them)
  • download the new utxo set
    • recent chunks of MMR
    • maybe the full utxo bitmap? (so we know what previous outputs have since been spent?)

The node would not need to re-verify existing kernels and would not need to re-verify any pre-existing unspent utxo rangeproofs.

Catching up with a couple of weeks of missing data like this will be relatively compact and likely pretty fast.
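Sketched as a plan, with all step names hypothetical, the catch-up flow above might look roughly like this:

```rust
/// Minimal view of a previously synced node that went offline.
struct NodeState {
    synced_height: u64,
}

/// Hypothetical catch-up steps; compact compared to a full txhashset download.
enum CatchUpStep {
    /// Kernels appended since the height we last synced to.
    FetchKernelsSince(u64),
    /// Fresh unspent-output bitmap, so we learn which outputs we already
    /// have were spent while we were offline.
    FetchUtxoBitmap,
    /// Only the output/rangeproof MMR chunks added since we went offline.
    FetchRecentChunksSince(u64),
}

fn plan_catch_up(state: &NodeState) -> Vec<CatchUpStep> {
    vec![
        CatchUpStep::FetchKernelsSince(state.synced_height),
        CatchUpStep::FetchUtxoBitmap,
        CatchUpStep::FetchRecentChunksSince(state.synced_height),
    ]
}
```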

@lehnberg added the "node dev" label on Apr 27, 2020
@jaspervdm (author):

I had only briefly considered the case where the node has synced in the past but is offline for longer than the horizon, and assumed a bit naively that it could simply download the missing chunks. But you are right, it needs the updated bitmap as well. This actually makes me reconsider the design of the output chunk message: should we perhaps split off the bitmap into a separate message? It is needed in all sync situations, and we would save the effort/bandwidth of including a merkle proof for it in every chunk message.
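For what splitting the bitmap off might look like on the wire; message names and fields below are hypothetical:

```rust
/// Placeholder merkle proof against a root committed to in a block header.
struct MerkleProof {
    hashes: Vec<[u8; 32]>,
}

/// One 1024-output segment of the unspent bitmap, proven against the bitmap
/// MMR root. Needed in every sync scenario, so it carries its own proof once.
struct BitmapSegmentMsg {
    segment_index: u64,
    bitmap: [u8; 128], // 1024 bits
    proof: MerkleProof,
}

/// The output chunk message no longer needs to embed bitmap data or its proof.
struct OutputChunkMsg {
    chunk_index: u64,
    outputs: Vec<Vec<u8>>, // serialized output identifiers (placeholder)
    proof: MerkleProof,
}
```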

@phyro (Member) commented May 1, 2020

This could be a non-issue, but I have a question about how this mixes with the #47 proposal. Since NRD kernels are a kernel type that is not completely isolated, they need to be validated relative to some moving window of maximum relative lock distance. If I'm understanding this correctly, it means that a kernel chunk that contains an NRD kernel can't be fully validated on its own. Could this be an issue, or is it easily solvable?

@lehnberg assigned jaspervdm and j01tz and unassigned jaspervdm on May 5, 2020
@tromp (Contributor) commented Jul 6, 2020

We should make sure that the IBD size remains linear in the UTXO set size rather than in the TXO set size.
That will require switching the representation of (segments of) the spent bitmap from a bitmap to a list of unspent indices when the bitmap becomes sufficiently sparse.
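A sketch of that switch for a single 1024-output segment; the break-even point and the encoding are illustrative only:

```rust
/// Two ways to encode which of a segment's 1024 outputs are unspent.
enum SegmentEncoding {
    /// Raw bitmap: always 128 bytes.
    Bitmap([u8; 128]),
    /// Positions of unspent outputs within the segment: 2 bytes each.
    UnspentIndices(Vec<u16>),
}

/// Pick whichever representation is smaller; with fewer than 64 unspent
/// outputs (64 * 2 = 128 bytes) the index list wins.
fn encode_segment(unspent: &[u16]) -> SegmentEncoding {
    if unspent.len() < 64 {
        SegmentEncoding::UnspentIndices(unspent.to_vec())
    } else {
        let mut bits = [0u8; 128];
        for &pos in unspent {
            bits[(pos / 8) as usize] |= 1u8 << (pos % 8);
        }
        SegmentEncoding::Bitmap(bits)
    }
}
```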

@lehnberg (Contributor):

Closed in favour of a smaller RFC: #68

@lehnberg closed this on Oct 14, 2020