Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return minimal info on errors returned to contracts #759

Closed
ethanfrey opened this issue Feb 20, 2022 · 14 comments · Fixed by #775
Closed

Return minimal info on errors returned to contracts #759

ethanfrey opened this issue Feb 20, 2022 · 14 comments · Fixed by #775
Milestone

Comments

@ethanfrey
Copy link
Member

When discussing what information that can be returned in Interchain Accounts, we realized that error messages and events are not guaranteed to be the same on every system (they are not part of consensus).

We can safely return data and error code. They have changed events in the sdk in minor upgrades, but for the same version, it will be the same on all systems (regardless of build system).

I would start limiting what we return to the contract and for the error returned in the reply section, not include error strings, rather the error code. This will make it harder to debug, but it will remove an attack vector that has popped up unexpectedly time after time. (Not in attacks, but in issues with different build systems)

@webmaster128 what do you think about this? I know this was also a concern of yours for windows and arm64 builds.

@alpe I wonder if there is a subset of info we could safely whitelist, like messages generated from x/wasm itself that don't include strings that come from other parts of the sdk.

@ethanfrey
Copy link
Member Author

This would not change the contract interface, just the actual error strings returned by wasmd (which are not guaranteed in any case). It would be a breaking change for wasmd (no migration needed, but x/upgrade for sure) but no effects on CosmWasm.

@alpe
Copy link
Contributor

alpe commented Feb 21, 2022

The error message is helpful for debugging issues but should not be used to build logic on. We could follow the ABCI model and reduce any error to their ABCI code id, indeed. That should be straight forward. Unknown errors would be redacted to "internal". With tracing flag set, we can still print the full error to the console for debugging.
If we want to be more verbose then the root error message could be used but it is likely not super helpful without more context.

@ethanfrey
Copy link
Member Author

When contract A sends a message to contract B and it fails, we usually get an error message that is quite helpful for debugging (e.g message missing some field, etc). If we just replace that with ExecutionFailed, this would make life for developers much harder.

We should think about this and pick one of these:

A. Don't care, just lock everything down. Every error that gets sent to a contract in reply is redacted.
B. Allow some "debug" flag to show these messages. This would be set on a node and meant for testnets. Kind of like the --trace message. Not sure if this would really help.
C. Go through all error message sources, try to figure out which ones are surely deterministic and don't filter those (whitelist them) and redact the rest.

Note that in any case, we don't redact the error messages returned over the tendermint abci interface, this is just about the errors sent into the contracts in the reply block. Also, if we don't catch the error case, these messages would probably propagate the the users anyway without passing through reply.

I prefer (A) for its simplicity.

@alpe
Copy link
Contributor

alpe commented Feb 22, 2022

I am in favour of B) as it could be very helpful for local development.
We can use the --trace flag as well and just log the raw error on info before we redact it for the contract.

@ethanfrey
Copy link
Member Author

I like the idea of logging this info with the --trace flag.

Same behavior for the contract (redact error), but we output to the log (maybe warn level?) Since this is only for local development, they should have access to the logs.

Now, if only tendermint wasn't so verbose, and you could actually find anything in those logs

@webmaster128
Copy link
Member

I wonder which call or code route we are talking about here. There are at least the two cases: 1. synconous queries where we have two levels of errors (pub type QuerierResult = SystemResult<ContractResult<Binary>>) and 2. submessage replies where we get back

pub struct Reply {
    pub id: u64,
    pub result: ContractResult<SubMsgExecutionResponse>, // what errors end up here? Just erros from the contract as the type indicates? Or host errors?
}

pub struct SubMsgExecutionResponse {
    pub events: Vec<Event>, // potential problem here as mentioned above
    pub data: Option<Binary>,
}

Are there other cases in IBC?

The problem I see with removing the error messages emitted by other contracts is we have no error code to replace those.

@ethanfrey
Copy link
Member Author

Good pont about the Querier interface, I forgot that one.

The problem I see with removing the error messages emitted by other contracts is we have no error code to replace those.

They all have some sdk error code, maybe it is just ExecuteFailed from x/wasm, but that is some number.

@ethanfrey
Copy link
Member Author

pub struct SubMsgExecutionResponse {
    pub events: Vec<Event>, // potential problem here as mentioned above
    pub data: Option<Binary>,
}

Yes, events are a potential issue. They have bumped those in patch releases (I think 0.42.6 -> 0.42.7 for example). But they are guaranteed to be the same on all build systems for the same version of the sdk, where error messages have been proven to be different on different OS/libc versions. I would investigate those more later. And many cases they are essential for contract functioning, while errors are only for developer debugging and not contract execution.

@ethanfrey
Copy link
Member Author

IBC doesn't pass in errors.
Actually, the error was passed on a failed ics20, but they are removing this too for the same reasoning, using some standard error.

The lemma "never rely not the error string" really coming home

@ethanfrey ethanfrey added this to the v0.25.0 milestone Feb 22, 2022
@webmaster128
Copy link
Member

Let's break this down into the very specific cases before starting to implement it.

E.g. while events are potentially dangerous, people already use them e.g. to get the contract address of an instantiation in a submessage and it would not be very nice to break all thise code:

Bildschirmfoto 2022-02-23 um 15 36 47

@ethanfrey
Copy link
Member Author

Let's break this down into the very specific cases before starting to implement it.

E.g. while events are potentially dangerous, people already use them e.g. to get the contract address of an instantiation in a submessage and it would not be very nice to break all thise code:

I agree and would leave events as is. Just focus on error string, which provide small value and large problem error.

Steps:

  1. Redact all error messages in reply block by replacing them with the sdk code
  2. Redact errors when contracts make queries, replacing them with a set of fixed error string (we can use the enum type ideally)

1 is pretty straight forward and minimal impact on dev experience.
2 is a easy if we don't care about dev ex. If we try to preserve some info, this gets more tricky.
I would tackle 1 in a PR and we can discuss 2 more

@apollo-sturdy
Copy link
Contributor

@ethanfrey Can we get some more insight on what kind of non-determinism this is trying to avoid? I have been spending the past several days trying to track down a smart contract bug, and it would have taken only minutes if the contract error from a SmartQuery were forwarded all the way to the transaction rather than being redacted to codespace: wasm, code: 9. If at all possible, I think fixing the propagation of simple SmartQuery errors would be the single most high impact fix for CosmWasm developer quality of life. I only did brief reading of the link in the first post here, but it seems that the non-determinism has to do with Interchain Accounts, so could regular smart contract query errors be passed on without issues? If not, could we at least make it so that the address of the queried contract is included or even the QueryMsg? Even this would speed up debugging by orders of magnitude.

@webmaster128
Copy link
Member

Could you please open a new issue for that. See also https://twitter.com/simon_warta/status/1615679335008030720.
By default we do not get any determinism guarantees for SDK error strings, no matter where the issue popped up. We have to work on those guarantees for special cases.

@apollo-sturdy
Copy link
Contributor

Could you please open a new issue for that. See also https://twitter.com/simon_warta/status/1615679335008030720. By default we do not get any determinism guarantees for SDK error strings, no matter where the issue popped up. We have to work on those guarantees for special cases.

Thanks for the quick reply. Opened a new issue here: #1160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants