
History Network: How to deal with the Ephemeral-ness of header canonicalness proofs #239

Open
pipermerriam opened this issue Nov 7, 2023 · 4 comments

Comments

@pipermerriam
Member

What is the problem

In the history network, we use the double batched log accumulator to produce proofs that anchor the canonicalness of headers. The construction of double batched log accumulators is such that the overall proof is composed of two proofs:

  • the proof within the epoch accumulator itself
  • the proof anchoring the epoch accumulator root to the master accumulator
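
For reference, here is a rough sketch of the two accumulator shapes and the two-part proof. This is illustrative only; the field names and the epoch size follow the general shape of the double batched log accumulator, but treat them as assumptions rather than the normative SSZ definitions.

```python
from typing import List, NamedTuple

EPOCH_SIZE = 8192  # headers per epoch accumulator (assumed for illustration)

class HeaderRecord(NamedTuple):
    block_hash: bytes        # 32-byte header hash
    total_difficulty: int

class EpochAccumulator(NamedTuple):
    # Up to EPOCH_SIZE records; its hash tree root changes with every header
    # appended until the epoch is full.
    records: List[HeaderRecord]

class MasterAccumulator(NamedTuple):
    # Roots of all completed epoch accumulators plus the in-progress epoch.
    historical_epochs: List[bytes]
    current_epoch: EpochAccumulator

class HeaderProof(NamedTuple):
    # Part 1: Merkle branch proving the header's record within its epoch accumulator.
    epoch_proof: List[bytes]
    # Part 2: Merkle branch anchoring that epoch accumulator root in the master accumulator.
    master_proof: List[bytes]
```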

The problem arises for headers that are anchored within an epoch accumulator that is not yet full. In this case, each newly added header changes the epoch accumulator's root, and therefore the proof. Once the epoch accumulator is full, this proof no longer changes and can be considered static (ignoring re-orgs around epoch boundaries).

In addition to these proofs being unstable while the epoch accumulator is filling up, we also encounter a secondary problem once the epoch has been completed. While an epoch accumulator is still being filled, a proof made at block N can still be reconstructed from the epoch accumulator at block N+1, N+2 and so on, as long as we are still within the boundary of the epoch: the epoch accumulator can effectively be "rolled back" by discarding the blocks added after block N in order to verify a proof constructed at block N. The problem arises at epoch boundaries, where the epoch accumulator is hashed, discarded, and added to the master accumulator. At that point, the contents of the epoch accumulator cannot be reliably known without fully reconstructing it from all of the blocks of that epoch, a strategy that is not viable for Portal.
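
To make the "roll back" concrete, here is a minimal sketch continuing the illustrative types above (the helper name and epoch arithmetic are assumptions, not a client API): while the epoch is still open, a verifier holding the accumulator at block N+k can truncate it back to block N and recompute the root the proof was built against; once the epoch is sealed, only its root survives and this reconstruction is no longer possible without replaying every header of the epoch.

```python
def rollback_epoch_accumulator(acc: EpochAccumulator, block_number: int) -> EpochAccumulator:
    # Discard records appended after `block_number` within the same, still-open epoch.
    # Only meaningful before the epoch accumulator is hashed and discarded.
    epoch_start = block_number - (block_number % EPOCH_SIZE)
    keep = block_number - epoch_start + 1
    return EpochAccumulator(records=acc.records[:keep])

# A proof built at block N then verifies against the SSZ hash tree root of
# rollback_epoch_accumulator(current_epoch, N) (root computation not shown here).
```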

Within the history network we need a strategy to handle the mutability of these proofs.

  • We need to inject headers into the network as soon as they are available as it is not viable to wait until these proofs are stable.
  • We need to anchor newly added headers as canonical to protect the network from DoS spam.
  • We need the canonicalness proofs of headers in the network to be stable over long periods of time.
@pipermerriam
Member Author

One approach to solving this:

  • Introduce a new content type that I'll call EphemeralHeaderWithProof
  • This new content type simply uses a canonicalness proof constructed at block N (the same height as the header) to prove that it is canonical.
  • At each epoch boundary, all headers from the just-completed epoch are re-gossiped with a newly constructed proof from the now "full" and stable epoch accumulator.
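
A sketch of that epoch-boundary re-gossip step, with the proof-building and gossip helpers left as hypothetical placeholders:

```python
def on_epoch_sealed(sealed_epoch: EpochAccumulator, build_stable_proof, gossip) -> None:
    # Once the epoch accumulator is full and its root is anchored in the master
    # accumulator, rebuild a now-stable proof for every header of that epoch and
    # re-gossip it under the stable header content type, superseding the
    # EphemeralHeaderWithProof payloads gossiped while the epoch was filling.
    for index, record in enumerate(sealed_epoch.records):
        proof = build_stable_proof(sealed_epoch, index)  # hypothetical proof builder
        gossip(record.block_hash, proof)                 # hypothetical gossip hook
```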

We construct these such that the content keys of the two different header types are different, but they map to the same content-id. This way, nodes that are storing an EphemeralHeaderWithProof for a block at height N can garbage collect this content upon receiving a new payload with a stable proof.
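
A minimal sketch of that key-to-id mapping, assuming hypothetical selector bytes and a content-id derived from the block hash alone (rather than from the full content key):

```python
import hashlib

# Hypothetical selector bytes, chosen here only for illustration.
HEADER_WITH_PROOF = b"\x00"
EPHEMERAL_HEADER_WITH_PROOF = b"\x05"

def header_content_key(selector: bytes, block_hash: bytes) -> bytes:
    return selector + block_hash

def header_content_id(content_key: bytes) -> bytes:
    # Hash only the block hash, so the ephemeral and stable payloads for the
    # same block land on the same content-id and the stable one can simply
    # replace the ephemeral one in storage.
    block_hash = content_key[1:]
    return hashlib.sha256(block_hash).digest()
```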

@kdeme
Collaborator

kdeme commented Apr 22, 2024

The context around this issue has changed a bit, but it is still very much applicable for the BeaconState historical_summaries accumulator + recently created headers.

We construct these such that the content keys of the two different header types are different, but they map to the same content-id. This way, nodes that are storing an EphemeralHeaderWithProof for a block at height N can garbage collect this content upon receiving a new payload with a stable proof.

How did you see this working exactly? I think having different content keys and the same content-id only has an effect here if a content database is used where the content key is also stored as metadata?

If not, it would not really help, and an implementation would probably have to set some additional field or keep track of these headers without a stable proof in some other table.
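
One way to read this, as a sketch: if the store is keyed purely by content-id, the stable payload silently replaces the ephemeral one, but explicitly purging or auditing ephemeral headers requires the content key (or an equivalent flag) to be kept as metadata. A hypothetical SQLite layout, purely for illustration:

```python
import sqlite3

conn = sqlite3.connect("portal_history.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS content (
        content_id   BLOB PRIMARY KEY,   -- shared by the ephemeral and stable header payloads
        content_key  BLOB NOT NULL,      -- kept as metadata so the key's selector is recoverable
        payload      BLOB NOT NULL,
        is_ephemeral INTEGER NOT NULL    -- alternatively derived from the content_key selector
    )
    """
)

def store(content_id: bytes, content_key: bytes, payload: bytes, is_ephemeral: bool) -> None:
    # INSERT OR REPLACE keyed on content_id means a stable header payload
    # naturally evicts the ephemeral one stored for the same block.
    conn.execute(
        "INSERT OR REPLACE INTO content VALUES (?, ?, ?, ?)",
        (content_id, content_key, payload, int(is_ephemeral)),
    )
    conn.commit()
```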

@pipermerriam
Member Author

pipermerriam commented Apr 26, 2024

I think that we need to make some clear rules about how content should be looked up in our respective client content databases... Up until now I think it has been acceptable for a client to treat the database as a dumb key/value store, to look things up by either content-key or content-id as the primary key, and to simply return the data...

I think we need to move away from this, to open up the option for us to have more complex data types in our network. The use case I'm familiar with is for block bodies and individual transaction retrieval.

  • block body content key is: /<identifier>/<block-hash>
  • individual transaction would be /<identifier>/<block-hash>/<transaction-index>

Both of these would map to the same content-id (that of the block body) so that the agent on the network that is storing the body can just return the single transaction rather than the full body payload.
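
A rough sketch of how such a lookup might work on the serving side. The selector bytes, key layout, and decoding hook are all assumptions made for illustration, not part of any spec:

```python
import hashlib
from typing import Callable, List, Optional

BLOCK_BODY = b"\x01"   # hypothetical selector for /<identifier>/<block-hash>
TRANSACTION = b"\x06"  # hypothetical selector for /<identifier>/<block-hash>/<transaction-index>

def content_id_for(content_key: bytes) -> bytes:
    # Both selectors derive the content-id from the block body's key, so the
    # node storing the body is also the node that answers for any of its transactions.
    block_hash = content_key[1:33]
    return hashlib.sha256(BLOCK_BODY + block_hash).digest()

def serve(
    db_get: Callable[[bytes], Optional[bytes]],           # content_id -> stored body payload
    decode_transactions: Callable[[bytes], List[bytes]],  # body payload -> list of encoded txs
    content_key: bytes,
) -> Optional[bytes]:
    body = db_get(content_id_for(content_key))
    if body is None or content_key[:1] == BLOCK_BODY:
        return body
    # Transaction request: return only the indexed transaction from the stored body.
    tx_index = int.from_bytes(content_key[33:], "big")
    transactions = decode_transactions(body)
    return transactions[tx_index] if tx_index < len(transactions) else None
```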

I'll pull this discussion out to its own issue soon.

@kdeme
Collaborator

kdeme commented May 22, 2024

Both of these would map to the same content-id (that of the block body) so that the agent on the network that is storing the body can just return the single transaction rather than the full body payload.

This would require the client to have a custom db.get() per content key selector. This is already required for the portal-beacon network, and could perfectly well be done for portal-history. It doesn't require a change to the usage of the dumb key/value store (though you could change it, of course).
But it does require an additional content type to be added (the transaction one). It gets a bit tricky, as the individual transaction type should never get gossiped.

For me, this does fall under the category of "later optimizations", and it doesn't really solve the issue of "how to track the headers without a proof for later purging".

But I do agree that we should have a discussion on how much we might or might not want to define in the specification regarding the content db in order to solve things like that.
