
Investigate canonical encoding scheme #365

Closed
sync-by-unito bot opened this issue May 27, 2021 · 1 comment

Comments


sync-by-unito bot commented May 27, 2021

Currently, a canonical variant of protobuf3 is used to define data structures and do serialization/deserialization. This is great for portability, but it isn't ideal for blockchain applications. For example, there are no fixed-size byte arrays (32-byte hashes are used all over the place in blockchains) and no optional fields. Robust protobuf implementations for embedded devices (e.g. hardware wallets) or blockchain smart contracts (e.g. Solidity) are non-existent and would be prohibitively expensive to develop.

The crux is that protobuf was designed for client-server communication across different software versions (i.e. both forwards and backwards compatibility are key features). This is unneeded for core blockchain data structures (e.g. blocks, votes, or transactions), but may be useful for node-to-node communication (e.g. messages that wrap block data, vote data, or transaction data).

We should investigate the feasibility of using a simpler serialization scheme for core data structures.

Desiderata:

  • fully deterministic (specifically, bijective)
  • binary, not text
  • native support for basic blockchain data types (esp. fixed-sized arrays)
  • typedefs / type aliases (i.e. zero-cost abstractions)
  • no requirements on backwards or forwards compatibility
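To illustrate the "fully deterministic (specifically, bijective)" requirement: standard protobuf-style base-128 varints admit "overlong" encodings, so the decoding map is not injective unless a decoder explicitly rejects non-minimal forms. The sketch below (not from the issue; function names are illustrative) shows two distinct byte strings decoding to the same value, and a canonicality check that restores bijectivity:

```python
def encode_varint(n: int) -> bytes:
    """Minimal-length base-128 varint (protobuf wire format)."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit set
        else:
            out.append(b)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode a varint, accepting overlong encodings (as many decoders do)."""
    n = 0
    for i, b in enumerate(data):
        n |= (b & 0x7F) << (7 * i)
        if not b & 0x80:
            return n
    raise ValueError("truncated varint")

def decode_canonical(data: bytes) -> int:
    """Reject any encoding that is not the minimal one."""
    value = decode_varint(data)
    if encode_varint(value) != data:
        raise ValueError("non-canonical encoding")
    return value

# Two distinct byte strings decode to the same value:
assert decode_varint(b"\x01") == 1
assert decode_varint(b"\x81\x00") == 1  # overlong encoding of 1
```

A bijective scheme bakes this rejection into the format itself rather than relying on every implementation to add the check.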

Comparison of Different Schemes

Protobuf

https://developers.google.com/protocol-buffers/docs/overview

https://github.com/lazyledger/protobuf3-solidity-lib

https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-027-deterministic-protobuf-serialization.md

cosmos/cosmos-sdk#7488

protocolbuffers/protobuf#3521
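The ADR-027 approach linked above keeps the protobuf wire format but constrains the encoder. A hedged sketch of the flavour of rules involved (fields serialized in ascending field-number order, default/empty values omitted): the message layout and helper names here are illustrative, not real cosmos-sdk code.

```python
def encode_field(field_number: int, wire_type: int, payload: bytes) -> bytes:
    # Protobuf tag byte: (field_number << 3) | wire_type.
    # Assumes field_number < 16 so the tag fits in one byte.
    tag = (field_number << 3) | wire_type
    return bytes([tag]) + payload

def encode_message(fields: dict) -> bytes:
    """Deterministic encoder: sorted field order, defaults omitted."""
    out = b""
    for num in sorted(fields):          # rule: ascending field numbers
        payload = fields[num]
        if payload:                     # rule: skip empty (default) values
            # wire type 2 = length-delimited; single-byte length for brevity
            out += encode_field(num, 2, bytes([len(payload)]) + payload)
    return out

# The same logical message always yields the same bytes,
# regardless of the order fields were populated in:
a = encode_message({1: b"hash", 3: b"", 2: b"sig"})
b = encode_message({2: b"sig", 1: b"hash"})
assert a == b
```

The trade-off, as the linked discussions note, is that determinism then depends on every implementation following the extra rules, not on the format itself.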

XDR

https://tools.ietf.org/html/rfc4506

https://developers.stellar.org/docs/glossary/xdr

Veriform

https://github.com/iqlusioninc/veriform

SimpleSerialize (SSZ)

https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md



sync-by-unito bot commented May 27, 2021

➤ Ismail Khoffi commented:

> One such scheme is Veriform, which can compute a canonical commitment to some data

My understanding is that the hashing can easily be made optional there. Other than that, the encoding part of Veriform shares a lot with vanilla protobuf encoding. The only changes I can see are: restricting it to fewer types (and specifying string encoding) to achieve determinism. Additionally, it squeezes in a critical bit ( https://github.com/iqlusioninc/veriform/blob/develop/spec/draft-veriform-spec.md#critical-bit ) to indicate that a field is not allowed to be missing, and uses a different varint encoding ( https://github.com/iqlusioninc/veriform/blob/develop/spec/draft-veriform-spec.md#little-endian-prefixed-variable-width-integers ). It actually tries to enable backwards (and forward?) compatibility, or "schema evolution". The main motivation ( iqlusioninc/usbarmory.rs#13 ) for not using protobuf and developing Veriform instead seems to be the lack of a (Rust) implementation that can be used in a heavily restricted environment.
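For context on the "different varint" mentioned above: the general idea of length-prefixed little-endian varints is that the total encoded length is determined by the first byte alone (here, by its count of trailing 1 bits), so a decoder never scans byte-by-byte for a continuation bit. The sketch below shows one common construction of this idea; it is an illustration under that assumption, not necessarily Veriform's exact bit layout, which the linked spec pins down.

```python
def encode_levarint(n: int) -> bytes:
    """Prefixed varint: length-1 trailing 1 bits, then a 0 bit, then the
    value, all little-endian. 9-byte case uses a 0xFF marker byte."""
    length = 1
    while length < 9 and n >= (1 << (7 * length)):
        length += 1
    if length == 9:
        return b"\xff" + n.to_bytes(8, "little")
    prefix = (1 << (length - 1)) - 1          # length-1 trailing ones
    encoded = (n << length) | prefix           # value above prefix + 0 bit
    return encoded.to_bytes(length, "little")

def decode_levarint(data: bytes) -> int:
    first = data[0]
    if first == 0xFF:                          # 9-byte form
        return int.from_bytes(data[1:9], "little")
    length = 1
    while first & (1 << (length - 1)):         # count trailing 1 bits
        length += 1
    encoded = int.from_bytes(data[:length], "little")
    return encoded >> length                   # drop prefix + 0 bit

# Minimal-length encoding is forced by construction, so each value has
# exactly one valid encoding (the bijectivity property discussed above):
assert decode_levarint(encode_levarint(300)) == 300
```

Because the length choice is deterministic (always minimal), overlong encodings simply cannot be produced, unlike with continuation-bit varints.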

I feel like I should try to set aside the concerns I share with adlerjohn and try writing a defence of just using protobuf with additional rules ( https://github.com/regen-network/canonical-proto3 ) some time soon.

I think protobuf with additional rules (almost) ticks all boxes of:

Desiderata:

  • fully deterministic
  • binary, not text
  • native support for basic blockchain data types (esp. fixed-sized arrays)
  • typedefs / type aliases
  • no requirements on backwards or forwards compatibility

BTW, while typedefs are really useful in the spec, I don't see why they should be a hard requirement for the serialization format. Regarding fixed-size arrays, I feel like checking the length could be done by the core types rather than by the generated proto types.
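The suggestion above (enforce fixed sizes in hand-written core types rather than in generated serialization code) can be sketched as follows; the `Hash32` name is hypothetical, used only to illustrate the pattern:

```python
class Hash32(bytes):
    """A bytes subclass that only accepts exactly 32 bytes.

    Generated (de)serialization code can keep using plain variable-length
    bytes; the length invariant is enforced once, at the type boundary.
    """
    SIZE = 32

    def __new__(cls, data: bytes) -> "Hash32":
        if len(data) != cls.SIZE:
            raise ValueError(f"expected {cls.SIZE} bytes, got {len(data)}")
        return super().__new__(cls, data)

h = Hash32(b"\x00" * 32)   # ok: well-formed 32-byte hash
try:
    Hash32(b"\x00" * 31)   # rejected when constructing the core type
except ValueError:
    pass
```

The trade-off is that malformed lengths are only caught after decoding, when the core type is constructed, rather than being unrepresentable in the wire format itself.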

@liamsi closed this as completed May 27, 2021