Define Data Availability Sampling (DAS) #110

adlerjohn · 2021-01-18T19:45:47Z

Define the process and parameters (number of samples) for doing Data Availability Sampling (DAS).

musalbas · 2021-01-19T12:21:14Z

I'm not sure the number of samples has to be a part of the spec because clients can decide this for themselves, however the spec can make secure recommendations.

liamsi · 2021-03-16T19:58:32Z

Just for my understanding: will this include how we use ipfs?

adlerjohn · 2021-03-16T21:10:32Z

will this include how we use ipfs?

Eventually, yes. But it will probably start with more conservative goals, e.g. just defining what to sample. Then move on to how to sample as that gets iterated on in the implementation.

adlerjohn · 2021-04-13T14:34:58Z

A recent topic that came up is "do we need to sample from both rows and columns for DAS?" (cf. celestiaorg/celestia-core#270)

Mechanism 1 (from paper)

The mechanism from the paper (celestiaorg/celestia-core#270 (comment)) involves sampling shares randomly from both rows and columns. In other words, each share in the EDS (extended data square) has two possible addresses: the pairs (row, col) (i.e. column index against a row root) and (col, row) (i.e. a row index against a column root).

A fraud proof consists of Merkle proofs against a single row/column root for at least k (the width of the ODS (original data square)) shares.

Mechanism 2 (potential alternative)

We note that having two addresses for shares introduces duplication without any additional security. Instead, we can only sample from rows. The space of shares is identical to above, so each individual share has exactly the same probability of being sampled. The only difference is that only Merkle proofs against row roots are downloaded.

A fraud proof consists of Merkle proofs against a single row root (for a fraudulent row), or a Merkle proof against multiple row roots (for a fraudulent column) for at least k shares.

Discussion

Pros of alternative:

If all nodes by default configuration only sample from rows, there's twice as much bandwidth/capacity available for the same shares (conversely, sampling from both rows and columns halves the bandwidth/capacity available).

Cons of alternative:

Fraud proofs for columns are slightly larger.

I don't immediately see any security issues with the alternative, but maybe I'm missing something.

liamsi · 2021-04-13T14:59:26Z

Fraud proofs for columns are slightly larger.

The question is how much larger in the worst-case and how often do we expect this to happen (and who is impacted by this/who needs to download and process these fraud proofs)?
As far as I understand, it will only be (some particular kind of) full-nodes who will be generating these fraud proofs (and it is expected to be a really, really rare event). What kind of clients will be validating these exactly?

adlerjohn · 2021-04-13T16:21:08Z

The question is how much larger in the worst-case

Without deduplication of sibling nodes, Merkle proofs against a single root vs Merkle proofs against k roots is basically the same size. Nodes are expected to have all the row and column roots, so those don't need to be included in the proof explicitly.

who is impacted by this/who needs to download and process these fraud proofs
What kind of clients will be validating these exactly?

This would affect light nodes mostly in terms of processing fraud proofs.

adlerjohn added documentation Improvements or additions to documentation enhancement New feature or request consensus labels Jan 18, 2021

adlerjohn self-assigned this Jan 18, 2021

adlerjohn mentioned this issue Apr 7, 2021

ValidateAvailability implementation celestiaorg/celestia-core#270

Merged

liamsi unassigned adlerjohn May 31, 2021

liamsi mentioned this issue Sep 19, 2022

Further clarify proofs of invalid erasure coding celestiaorg/celestia-node#1134

Open

4 tasks

adlerjohn mentioned this issue May 12, 2021

Investigate the implications of the client choosing whether to sample from row or column celestiaorg/celestia-node#3198

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define Data Availability Sampling (DAS) #110

Define Data Availability Sampling (DAS) #110

adlerjohn commented Jan 18, 2021 •

edited by liamsi

Loading

musalbas commented Jan 19, 2021

liamsi commented Mar 16, 2021

adlerjohn commented Mar 16, 2021

adlerjohn commented Apr 13, 2021

liamsi commented Apr 13, 2021 •

edited

Loading

adlerjohn commented Apr 13, 2021

Define Data Availability Sampling (DAS) #110

Define Data Availability Sampling (DAS) #110

Comments

adlerjohn commented Jan 18, 2021 • edited by liamsi Loading

musalbas commented Jan 19, 2021

liamsi commented Mar 16, 2021

adlerjohn commented Mar 16, 2021

adlerjohn commented Apr 13, 2021

Mechanism 1 (from paper)

Mechanism 2 (potential alternative)

Discussion

liamsi commented Apr 13, 2021 • edited Loading

adlerjohn commented Apr 13, 2021

adlerjohn commented Jan 18, 2021 •

edited by liamsi

Loading

liamsi commented Apr 13, 2021 •

edited

Loading