
Investigate canonical encoding scheme #365

Closed
sync-by-unito bot opened this issue May 27, 2021 · 1 comment

Comments


sync-by-unito bot commented May 27, 2021

Currently, a canonical variant of protobuf3 is used to define data structures and do serialization/deserialization. This is great for portability, but it isn't ideal for blockchain applications. For example, there are no fixed-size byte arrays (32-byte hashes are used all over the place in blockchains) and no optional fields. Robust protobuf implementations for embedded devices (e.g. hardware wallets) or blockchain smart contracts (e.g. Solidity) are non-existent and would be prohibitively expensive to develop.

The crux is that protobuf was designed for client-server communication across different software versions (i.e. both forwards and backwards compatibility are key features). This is unneeded for core blockchain data structures (e.g. blocks, votes, or transactions), but may be useful for node-to-node communication (e.g. messages that wrap block data, vote data, or transaction data).

We should investigate the feasibility of using a simpler serialization scheme for core data structures.

Desiderata:

  • fully deterministic (specifically, bijective)
  • binary, not text
  • native support for basic blockchain data types (esp. fixed-sized arrays)
  • typedefs / type aliases (i.e. zero-cost abstractions)
  • no requirements on backwards or forwards compatibility
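To illustrate the "fully deterministic (specifically, bijective)" requirement: standard protobuf-style base-128 varints admit "overlong" encodings, so the decoding map is not injective unless a decoder explicitly rejects non-minimal forms. The sketch below (not from the issue; function names are illustrative) shows two distinct byte strings decoding to the same value, and a canonicality check that restores bijectivity:

```python
def encode_varint(n: int) -> bytes:
    """Minimal-length base-128 varint (protobuf wire format)."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit set
        else:
            out.append(b)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode a varint, accepting overlong encodings (as many decoders do)."""
    n = 0
    for i, b in enumerate(data):
        n |= (b & 0x7F) << (7 * i)
        if not b & 0x80:
            return n
    raise ValueError("truncated varint")

def decode_canonical(data: bytes) -> int:
    """Reject any encoding that is not the minimal one."""
    value = decode_varint(data)
    if encode_varint(value) != data:
        raise ValueError("non-canonical encoding")
    return value

# Two distinct byte strings decode to the same value:
assert decode_varint(b"\x01") == 1
assert decode_varint(b"\x81\x00") == 1  # overlong encoding of 1
```

A bijective scheme bakes this rejection into the format itself rather than relying on every implementation to add the check.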

Comparison of Different Schemes

Protobuf

https://developers.google.com/protocol-buffers/docs/overview

https://github.com/lazyledger/protobuf3-solidity-lib

https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-027-deterministic-protobuf-serialization.md

cosmos/cosmos-sdk#7488

protocolbuffers/protobuf#3521
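The ADR-027 approach linked above keeps the protobuf wire format but constrains the encoder. A hedged sketch of the flavour of rules involved (fields serialized in ascending field-number order, default/empty values omitted): the message layout and helper names here are illustrative, not real cosmos-sdk code.

```python
def encode_field(field_number: int, wire_type: int, payload: bytes) -> bytes:
    # Protobuf tag byte: (field_number << 3) | wire_type.
    # Assumes field_number < 16 so the tag fits in one byte.
    tag = (field_number << 3) | wire_type
    return bytes([tag]) + payload

def encode_message(fields: dict) -> bytes:
    """Deterministic encoder: sorted field order, defaults omitted."""
    out = b""
    for num in sorted(fields):          # rule: ascending field numbers
        payload = fields[num]
        if payload:                     # rule: skip empty (default) values
            # wire type 2 = length-delimited; single-byte length for brevity
            out += encode_field(num, 2, bytes([len(payload)]) + payload)
    return out

# The same logical message always yields the same bytes,
# regardless of the order fields were populated in:
a = encode_message({1: b"hash", 3: b"", 2: b"sig"})
b = encode_message({2: b"sig", 1: b"hash"})
assert a == b
```

The trade-off, as the linked discussions note, is that determinism then depends on every implementation following the extra rules, not on the format itself.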

XDR

https://tools.ietf.org/html/rfc4506

https://developers.stellar.org/docs/glossary/xdr

Veriform

https://github.com/iqlusioninc/veriform

SimpleSerialize (SSZ)

https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md



sync-by-unito bot commented May 27, 2021

➤ Ismail Khoffi commented:

> One such scheme is Veriform, which can compute a canonical commitment to some data

My understanding is that the hashing can easily be made optional there. Other than that, the encoding part of Veriform shares a lot with vanilla protobuf encoding. The only changes I can see are: restricting it to fewer types (and specifying string encoding) to achieve determinism. Additionally, it squeezes in a critical bit ( https://github.com/iqlusioninc/veriform/blob/develop/spec/draft-veriform-spec.md#critical-bit ) to indicate that a field is not allowed to be missing, and uses a different varint encoding ( https://github.com/iqlusioninc/veriform/blob/develop/spec/draft-veriform-spec.md#little-endian-prefixed-variable-width-integers ). It actually tries to enable backwards (and forward?) compatibility, or "schema evolution". The main motivation ( iqlusioninc/usbarmory.rs#13 ) for not using protobuf and developing Veriform instead seems to be the lack of a (Rust) implementation that can be used in a heavily restricted environment.
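For context on the "different varint" mentioned above: the general idea of length-prefixed little-endian varints is that the total encoded length is determined by the first byte alone (here, by its count of trailing 1 bits), so a decoder never scans byte-by-byte for a continuation bit. The sketch below shows one common construction of this idea; it is an illustration under that assumption, not necessarily Veriform's exact bit layout, which the linked spec pins down.

```python
def encode_levarint(n: int) -> bytes:
    """Prefixed varint: length-1 trailing 1 bits, then a 0 bit, then the
    value, all little-endian. 9-byte case uses a 0xFF marker byte."""
    length = 1
    while length < 9 and n >= (1 << (7 * length)):
        length += 1
    if length == 9:
        return b"\xff" + n.to_bytes(8, "little")
    prefix = (1 << (length - 1)) - 1          # length-1 trailing ones
    encoded = (n << length) | prefix           # value above prefix + 0 bit
    return encoded.to_bytes(length, "little")

def decode_levarint(data: bytes) -> int:
    first = data[0]
    if first == 0xFF:                          # 9-byte form
        return int.from_bytes(data[1:9], "little")
    length = 1
    while first & (1 << (length - 1)):         # count trailing 1 bits
        length += 1
    encoded = int.from_bytes(data[:length], "little")
    return encoded >> length                   # drop prefix + 0 bit

# Minimal-length encoding is forced by construction, so each value has
# exactly one valid encoding (the bijectivity property discussed above):
assert decode_levarint(encode_levarint(300)) == 300
```

Because the length choice is deterministic (always minimal), overlong encodings simply cannot be produced, unlike with continuation-bit varints.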

I feel like I should try to set aside the concerns I share with adlerjohn and try writing a defence of just using protobuf with additional rules ( https://github.com/regen-network/canonical-proto3 ) some time soon.

I think protobuf with additional rules (almost) ticks all boxes of:

Desiderata:

  • fully deterministic
  • binary, not text
  • native support for basic blockchain data types (esp. fixed-sized arrays)
  • typedefs / type aliases
  • no requirements on backwards or forwards compatibility

BTW, while typedefs are really useful in the spec, I don't see why they should be a hard requirement for the serialization format. Regarding fixed-size arrays, I feel like checking the length could be done by the core types rather than by the generated proto types.
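The suggestion above (enforce fixed sizes in hand-written core types rather than in generated serialization code) can be sketched as follows; the `Hash32` name is hypothetical, used only to illustrate the pattern:

```python
class Hash32(bytes):
    """A bytes subclass that only accepts exactly 32 bytes.

    Generated (de)serialization code can keep using plain variable-length
    bytes; the length invariant is enforced once, at the type boundary.
    """
    SIZE = 32

    def __new__(cls, data: bytes) -> "Hash32":
        if len(data) != cls.SIZE:
            raise ValueError(f"expected {cls.SIZE} bytes, got {len(data)}")
        return super().__new__(cls, data)

h = Hash32(b"\x00" * 32)   # ok: well-formed 32-byte hash
try:
    Hash32(b"\x00" * 31)   # rejected when constructing the core type
except ValueError:
    pass
```

The trade-off is that malformed lengths are only caught after decoding, when the core type is constructed, rather than being unrepresentable in the wire format itself.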

@liamsi closed this as completed May 27, 2021