
Provide availability guarantees for IBC Packet data #74

Closed
ethanfrey opened this issue Feb 16, 2021 · 8 comments

Comments

@ethanfrey (Contributor)

Summary

Although the actual packet data is not required to be accessible in the MultiStore of the sending chain for the application logic to function, IBC needs extremely high availability guarantees for that data in order to be reliable. This information is currently only available via the event system, which has numerous issues and cannot provide such guarantees.

Thus, I propose adding the original (unhashed) packet and acknowledgement data to the MultiStore to ensure a reliable IBC implementation. This data should also be cleaned up at the same time as the commitments (hashed values).

This is requested for inclusion in 0.42.0, and I would like a timely review and discussion.

Problem Definition

Much of this came as a surprise to me while working on IBC integration tests, and now the ts-relayer. Until recently, I thought I just didn't fully understand the dozens of gRPC endpoints, but then I discovered that on SendPacket, the packet data is never written to the store; it is only written to the event system.

The event system has been woefully unmaintained for years, and there is now a tracking issue in Tendermint to start working on an overhaul. In particular, there are two main issues I highlighted that directly affect its suitability for IBC:

1. Events emitted outside of normal transaction execution (e.g. packets sent in EndBlock) are not reliably delivered to subscribers or indexed.
2. When one transaction contains multiple messages, the resulting events cannot be reliably attributed to and parsed per message.

Before this is dismissed as a purely theoretical complaint, I want to note that Band Protocol hit problems in integration because they send packets in EndBlock, which is blocked by the first issue. While working on the relayer, I figured that sending multiple ReceivePacket messages in one tx is a nice way to include them in one block without worrying about sequence issues and tx ordering, except that the relayer wouldn't be able to parse the acknowledgement data from the resulting events due to the second issue. I am sure there are more cases (and there will be many more in the coming months of IBC development), but I want to highlight that this has become a blocker even before IBC is live on the Cosmos Hub.

While there is some motion to upgrade the event system in Tendermint, this will not happen overnight, and we do need a reliable solution for sending IBC packets very soon.

Proposal

Store the data:

To the ibc/core/04-channel keeper, add SetPacketData(ctx, data) and SetAcknowledgementData(ctx, data). Both store the data in a lookup keyed by the hashed value (the commitment), which is already stored in the system. In SendPacket, call SetPacketData() just after SetPacketCommitment. In WriteAcknowledgement, call SetAcknowledgementData() just after SetPacketAcknowledgement. A sketch follows below.
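For concreteness, a minimal sketch of what these setters could look like. The stubbed-down Keeper, the `packetData/` and `ackData/` key prefixes, and the exact signatures are illustrative assumptions, not the existing channel keeper API:

```go
package keeper

import (
	sdk "github.com/cosmos/cosmos-sdk/types"
)

// Keeper is a stub standing in for the ibc/core/04-channel Keeper;
// only the pieces needed for this sketch are shown.
type Keeper struct {
	storeKey sdk.StoreKey
}

// packetDataKey and ackDataKey are hypothetical key prefixes for this
// proposal; a real implementation would define them in the types package.
func packetDataKey(commitment []byte) []byte {
	return append([]byte("packetData/"), commitment...)
}

func ackDataKey(commitment []byte) []byte {
	return append([]byte("ackData/"), commitment...)
}

// SetPacketData stores the raw (unhashed) packet data under the
// commitment hash that SendPacket already writes via SetPacketCommitment.
func (k Keeper) SetPacketData(ctx sdk.Context, commitment, data []byte) {
	store := ctx.KVStore(k.storeKey)
	store.Set(packetDataKey(commitment), data)
}

// SetAcknowledgementData stores the raw acknowledgement, to be called
// just after SetPacketAcknowledgement in WriteAcknowledgement.
func (k Keeper) SetAcknowledgementData(ctx sdk.Context, commitment, ack []byte) {
	store := ctx.KVStore(k.storeKey)
	store.Set(ackDataKey(commitment), ack)
}
```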

Clean up the data:

When deleting the packet commitment (here and here), ensure we also delete the PacketData at the same time (sketched below). NB: it seems that acknowledgement commitments are currently never deleted, but cleanup of the data should happen whenever the commitment is removed.
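Under the same assumptions as the sketch above, the cleanup is a one-line deletion invoked wherever the commitment is deleted today:

```go
// deletePacketData removes the stored raw data for a commitment. It would
// be called at the same spots where the packet commitment is deleted today
// (acknowledgement and timeout handling), so the two never diverge.
func (k Keeper) deletePacketData(ctx sdk.Context, commitment []byte) {
	store := ctx.KVStore(k.storeKey)
	store.Delete(packetDataKey(commitment))
}
```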

Expose gRPC endpoints:

The current gRPC endpoints allow us to get the commitments for a channel. Add two more endpoints, QueryPacketByCommitment and QueryAcknowledgementByCommitment, to look up the data that we wrote above. Once this is working, it becomes usable by the clients. Adding more gRPC endpoints to make the queries simpler can be done in the future in a non-breaking manner. A sketch of one handler follows below.
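A sketch of one such handler, continuing the hypothetical package above. The request/response types are placeholders standing in for messages that would be generated from new channel query proto definitions:

```go
package keeper

import (
	"context"

	sdk "github.com/cosmos/cosmos-sdk/types"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// QueryPacketByCommitmentRequest/Response are hypothetical types,
// standing in for generated proto messages.
type QueryPacketByCommitmentRequest struct {
	Commitment []byte
}

type QueryPacketByCommitmentResponse struct {
	Data []byte
}

// PacketByCommitment returns the raw packet data stored under a
// commitment hash (see the setter sketch above).
func (k Keeper) PacketByCommitment(c context.Context, req *QueryPacketByCommitmentRequest) (*QueryPacketByCommitmentResponse, error) {
	if req == nil {
		return nil, status.Error(codes.InvalidArgument, "empty request")
	}
	ctx := sdk.UnwrapSDKContext(c)
	store := ctx.KVStore(k.storeKey)
	data := store.Get(packetDataKey(req.Commitment))
	if data == nil {
		return nil, status.Error(codes.NotFound, "no packet data for the given commitment")
	}
	return &QueryPacketByCommitmentResponse{Data: data}, nil
}
```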

Backwards compatibility

I write this assuming that it is included in v0.42.0. If not, the issue may become more difficult as people start working on custom workarounds to enable their chains' use cases.

While the Cosmos Hub will be upgraded to v0.41.0 of the Cosmos SDK, it is highly unlikely that other mainnets will use that version, as there is no support for slashing on double-signing if they don't use the "cosmos" bech32 prefix (cosmos/cosmos-sdk#8461). So we can assume that all other mainnets will be based on Cosmos SDK v0.42.0 or higher, and the Cosmos Hub will likely upgrade when it adds more advanced functionality.

In order to maintain compatibility, I propose that we continue to emit the same events in v0.42.0, but include these new endpoints in addition. All existing relayers will continue to work without problems, and when they wish, they can add an optional flag (per chain) to use the new gRPC endpoints instead of the event system for these packet queries.

An upgrade from v0.41 -> v0.42 would add this new storage and index all packets sent after the upgrade. The fact that previous packets are not indexed there is not an issue for the relayer (but must be handled gracefully by the cleanup code). The relayer can continue to run in "legacy" event mode for some days after an upgrade before switching on the new "use gRPC" flag.

Review requested by @cwgoes @ebuchman @aaronc (and @clevinson for 0.42 release planning)


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@ethanfrey changed the title from "Store packet data in MultiStore" to "Provide availability guarantees for IBC Packet data" on Feb 16, 2021
@cwgoes (Contributor) commented Feb 16, 2021

Acknowledgements are optional, so stored packet data may not necessarily be deleted (e.g. in expected successful case).

In part for this reason, and in part because it's more in keeping, design-wise, with how events and the store should be used, I have a pretty strong preference for fixing the events system instead of storing the full packet data in the SDK store, at least long-term. I don't know what the timelines are on upstream Tendermint fixes, but if there's a way that work can be prioritised, I'd rather avoid introducing workarounds in the SDK. That said, what you propose is quite reasonable and easy to remove later on.

@ethanfrey (Contributor, Author)

> Acknowledgements are optional, so stored packet data may not necessarily be deleted (e.g. in expected successful case).

There are plenty of ways to guarantee cleanup without acknowledgements. It just takes a few proofs that never make it to the modules but are handled internally in the ICS modules. It works great for ordered channels (if I can show packet 123 was received, then I can delete all packets 1-123; see the sketch below), and even for unordered channels with a max timeout it can be quite efficient. Guaranteed cleanup was a section in my 2017 whitepaper (thanks to Jae for many of the ideas) that deserves to be revisited.

I think this would be a good area of focus for ICS 1.1.
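For illustration, a rough sketch of the ordered-channel case. `GetPacketCommitment` mirrors the existing channel keeper getter; `deletePacketData` is the cleanup sketch from the proposal, and `DeletePacketCommitment` is a hypothetical exported variant of the deletion already performed on acknowledgement and timeout:

```go
// cleanupOrderedChannel: on an ordered channel, a single proof that
// sequence provenSeq was received implies all earlier sequences were
// received too, so their commitments and raw data can be safely removed.
func (k Keeper) cleanupOrderedChannel(ctx sdk.Context, portID, channelID string, provenSeq uint64) {
	for seq := uint64(1); seq <= provenSeq; seq++ {
		commitment := k.GetPacketCommitment(ctx, portID, channelID, seq)
		if len(commitment) == 0 {
			continue // nothing stored, or already cleaned up
		}
		k.deletePacketData(ctx, commitment)                   // raw data (this proposal)
		k.DeletePacketCommitment(ctx, portID, channelID, seq) // hypothetical exported deletion helper
	}
}
```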

> I don't know what the timelines are on upstream Tendermint fixes, but if there's a way that work can be prioritised, I'd rather avoid introducing workarounds in the SDK.

I have been pushing for major event system fixes for two years. It seems to have finally gotten some traction about a week ago. Given that almost no one besides Anton has put much thought into the event system or really knows that code, I don't expect this to land quickly, and it would be in Tendermint v0.35 at the earliest, as a breaking change.

I actually don't think storing the data in the SDK is a workaround; rather, using events is a workaround for not having a "private store" implemented in the Cosmos SDK. The data is stored on the same disk, so placing it in "events" doesn't really save space.

@colin-axner (Contributor) commented Feb 16, 2021

My unsolicited 2 uatoms:

The event system can function for IBC (golang relayer + hermes show this), but it is just currently a big pain point and limits greater functionality.

Introducing fully written packets and acknowledgements is easy to do as a short-term workaround.

Introducing cleanup functionality is non-trivial, imo. While conceptually easy, a bug in these functions would be security-critical: we do not want to delete packet commitments that shouldn't be deleted.

I side with @cwgoes in having a strong preference for fixing the event system as opposed to storing full packet data. I fully expect IBC applications to create large packets, and given that relayers already need to listen to events to relay, it makes sense to output that information in the event system as well. An event system that doesn't work is still going to cause problems even if we can query the full packet.

If we decide to introduce full packet/acknowledgement storage, I'd prefer not to introduce cleanup functions, in order to incentivize removing it once the event system is brought up to speed.

Edit: I guess if the cleanup functions are just to delete the fully written packet (and not the packet commitment), that could be OK.

@ethanfrey (Contributor, Author)

> Edit: I guess if the cleanup functions are just to delete the fully written packet (and not the packet commitment), that could be OK.

You currently delete the packet commitment on acknowledgement or timeout. I only proposed deleting the full data at the same spot.

@colin-axner (Contributor)

A packet which is received but whose application does not write an acknowledgement cannot be cleaned up without more functionality. It just so happens that ICS 20 writes acknowledgements for every packet.

There's no incentive to clean up commitments from successful acknowledgements. Also, I don't think there exists functionality to clean up acknowledgements.

@colin-axner transferred this issue from cosmos/cosmos-sdk on Mar 5, 2021
@crodriguezvega (Contributor)

@ethanfrey Is this issue still relevant now that issue #5963 in Tendermint is completed?

@adizere commented Dec 6, 2022

The indexer in Tendermint is undergoing repairs again (tendermint/tendermint#9712). The Tendermint team plans to overhaul storage and indexing over the next quarters; see tendermint/tendermint#9437 for a great discussion. I suspect this issue can be closed and we can follow up in Tendermint issues/PRs.

@ethanfrey (Contributor, Author)

Yeah. You can close this.
But we definitely need a more reliable and scalable event system.
Glad this issue helped nudge that forward.
