Skip to content
This repository has been archived by the owner on Mar 24, 2023. It is now read-only.

Define Data Availability Sampling (DAS) #110

Open
adlerjohn opened this issue Jan 18, 2021 · 6 comments
Open

Define Data Availability Sampling (DAS) #110

adlerjohn opened this issue Jan 18, 2021 · 6 comments
Labels
consensus documentation Improvements or additions to documentation enhancement New feature or request

Comments

@adlerjohn
Copy link
Member

adlerjohn commented Jan 18, 2021

Define the process and parameters (number of samples) for doing Data Availability Sampling (DAS).

@adlerjohn adlerjohn added documentation Improvements or additions to documentation enhancement New feature or request consensus labels Jan 18, 2021
@adlerjohn adlerjohn self-assigned this Jan 18, 2021
@musalbas
Copy link
Member

I'm not sure the number of samples has to be a part of the spec because clients can decide this for themselves, however the spec can make secure recommendations.

@liamsi
Copy link
Member

liamsi commented Mar 16, 2021

Just for my understanding: will this include how we use ipfs?

@adlerjohn
Copy link
Member Author

will this include how we use ipfs?

Eventually, yes. But it will probably start with more conservative goals, e.g. just defining what to sample. Then move on to how to sample as that gets iterated on in the implementation.

@adlerjohn
Copy link
Member Author

A recent topic that came up is "do we need to sample from both rows and columns for DAS?" (cf. celestiaorg/celestia-core#270)

Mechanism 1 (from paper)

The mechanism from the paper (celestiaorg/celestia-core#270 (comment)) involves sampling shares randomly from both rows and columns. In other words, each share in the EDS (extended data square) has two possible addresses: the pairs (row, col) (i.e. column index against a row root) and (col, row) (i.e. a row index against a column root).

A fraud proof consists of Merkle proofs against a single row/column root for at least k (the width of the ODS (original data square)) shares.

Mechanism 2 (potential alternative)

We note that having two addresses for shares introduces duplication without any additional security. Instead, we can only sample from rows. The space of shares is identical to above, so each individual share has exactly the same probability of being sampled. The only difference is that only Merkle proofs against row roots are downloaded.

A fraud proof consists of Merkle proofs against a single row root (for a fraudulent row), or a Merkle proof against multiple row roots (for a fraudulent column) for at least k shares.

Discussion

Pros of alternative:

  • If all nodes by default configuration only sample from rows, there's twice as much bandwidth/capacity available for the same shares (conversely, sampling from both rows and columns halves the bandwidth/capacity available).

Cons of alternative:

  • Fraud proofs for columns are slightly larger.

I don't immediately see any security issues with the alternative, but maybe I'm missing something.

@liamsi
Copy link
Member

liamsi commented Apr 13, 2021

Fraud proofs for columns are slightly larger.

The question is how much larger in the worst-case and how often do we expect this to happen (and who is impacted by this/who needs to download and process these fraud proofs)?
As far as I understand, it will only be (some particular kind of) full-nodes who will be generating these fraud proofs (and it is expected to be a really, really rare event). What kind of clients will be validating these exactly?

@adlerjohn
Copy link
Member Author

The question is how much larger in the worst-case

Without deduplication of sibling nodes, Merkle proofs against a single root vs Merkle proofs against k roots is basically the same size. Nodes are expected to have all the row and column roots, so those don't need to be included in the proof explicitly.

who is impacted by this/who needs to download and process these fraud proofs
What kind of clients will be validating these exactly?

This would affect light nodes mostly in terms of processing fraud proofs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
consensus documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants