Node watcher app #4689

Closed
4 tasks
b00f opened this issue Jul 5, 2019 · 7 comments

@b00f

b00f commented Jul 5, 2019

Summary

Node watcher app for updating a node

Problem Definition

The cosmos-sdk is currently changing heavily, which may force applications built on it to go through soft forks. Updating a node is not an easy task for a blockchain. The classic approach is to keep the fork history inside the code, something like this:

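// FORK_N is the height at which the new rules take effect.
if height < FORK_N {
   do_that(); // pre-fork behavior
} else {
   do_this(); // post-fork behavior
}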

That makes the code messier and probably more bug-prone.

Proposal

I would like to share an idea and open a discussion about it.
A node-watcher app could be developed. The node-watcher is responsible for running the blockchain node. It reads a repo (e.g. a GitHub page) to check for upcoming changes. The page can simply hold this information:

height  node_version
0       version_1.0
1000    version_2.0

The node-watcher downloads and runs the node version given by this information. When a soft fork is about to happen, we update this page so that every node-watcher knows to restart its node automatically at the listed height.
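The lookup a node-watcher would need against such a table can be sketched as follows. This is only an illustration of the proposal, not existing cosmos-sdk code: the `Upgrade` type and the `VersionAt` function are hypothetical names, and fetching the table from a repo is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Upgrade maps an activation height to the node version that must run from
// that height onward. The type and field names are hypothetical.
type Upgrade struct {
	Height  int64
	Version string
}

// VersionAt returns the version scheduled for the given height: the entry
// with the greatest activation height that is <= height.
// It returns "" if no entry is at or below height.
func VersionAt(schedule []Upgrade, height int64) string {
	// Work on a sorted copy so the caller's slice is left untouched.
	sorted := append([]Upgrade(nil), schedule...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Height < sorted[j].Height })

	version := ""
	for _, u := range sorted {
		if u.Height > height {
			break
		}
		version = u.Version
	}
	return version
}

func main() {
	// The two rows from the table in the proposal.
	schedule := []Upgrade{
		{Height: 0, Version: "version_1.0"},
		{Height: 1000, Version: "version_2.0"},
	}
	for _, h := range []int64{0, 999, 1000, 2500} {
		fmt.Printf("height %d -> %s\n", h, VersionAt(schedule, h))
	}
}
```

A watcher built this way would compare the version returned for the next block height with the version it is currently running, and restart the node only when the two differ.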


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@alexanderbez
Contributor

alexanderbez commented Jul 5, 2019

Hey @b00f, it is true that the SDK is changing constantly and will continue to do so for quite some time, but this sort of solution isn't needed for "soft-fork" upgrades afaict.

All point releases are non-breaking and can be upgraded to safely at any time without needing to have height conditionals or a sidecar process under most circumstances.

That being said, the idea of a sidecar process was initially introduced by @ebuchman with regards to live breaking major upgrades (hard-forks). There are already a few threads in which this topic is discussed.

There has been discussion with the Regen team on how to integrate our migration work with their upgrade module. This work coupled with the halt-height config and a script would allow us to perform such upgrades.

ref: #4471
ref: #4409
ref: #4233

I suggest discussion continue on those threads instead of this one.

@b00f
Author

b00f commented Jul 8, 2019

An upgrade proposal is a good idea for hard forks, and halting the node makes sense. (However, I still don't understand why genesis should be updated for a live chain; see #4409.)
What I suggested here is an external application that runs the node, downloads the new version and re-runs the node. Since this app owns the node process, it can stop it and start it again. It can upgrade the node for simple updates, bug fixes, or soft forks.

@ethanfrey
Contributor

ethanfrey commented Jul 8, 2019

#4233 is an interesting approach that should match your needs. This is very similar to your idea, but runs inside the node code itself, and allows upgrade hooks to auto-migrate. Can you review this and maybe add a feature request or two? I just started working on this PR to help finish it up. It would be great if it covered the needs of all projects.

And yes, I agree. Stopping a blockchain, and starting from a new genesis doesn't seem like the most elegant solution unless absolutely necessary in an emergency.

From x/upgrade/doc.go:


General Workflow

Let's assume we are running v0.34.0 of our software in our testnet and want to upgrade to v0.36.0.
How would this look in practice? First of all, we want to finalize the v0.36.0 release candidate
and there install a specially named upgrade handler (eg. "testnet-v2" or even "v0.36.0"). Once
this code is public, we can have a governance vote to approve this upgrade at some future blocktime
or blockheight (known as an upgrade.Plan). The v0.34.0 code will not know of this handler, but will
continue to run until block 200000, when the plan kicks in at BeginBlock. It will check for existence
of the handler, and finding it missing, know that it is running the obsolete software, and kill itself.
Shortly before killing itself, it will check if there is a script in <home_dir>/config/do-upgrade
and run it if present.

Generally the gaiad/regend/etc binary will restart on crash, but then will execute this BeginBlocker
again and crash, causing a restart loop. Either the operator can manually install the new software,
or you can make use of the do-upgrade script to eg. dump state to json (backup), download new binary
(from a location I trust - script written by operator), install binary.

When the binary restarts with the upgraded version (here v0.36.0), it will detect we have registered the
"testnet-v2" upgrade handler in the code, and realize it is the new version. It then will run the script
and migrate the database in-place. Once finished, it marks the upgrade as done, and continues processing the rest of the block as normal. Once 2/3 of the voting power has upgraded, the blockchain will immediately resume the consensus mechanism. If the majority of operators add a custom do-upgrade script, this should be a matter of minutes and not even require them to be awake at that time.

@b00f
Author

b00f commented Jul 10, 2019

This is very similar to your idea, but runs inside the node code itself, and allows upgrade hooks to auto-migrate

There are two ways to update or upgrade an application: the application can update itself, or a watcher can update it. Both approaches have good and bad points. For a blockchain application I believe the second is more elegant, although so far I haven't seen any blockchain adopt it.
If users run the node with docker, how will the do-upgrade script work? Inside docker?

BTW, Thanks for your full explanation.

@b00f
Author

b00f commented Jul 10, 2019

Closing in favor of #4233

@b00f b00f closed this as completed Jul 10, 2019
@ethanfrey
Contributor

Thank you, @b00f, and I think you have very good ideas on what is needed to make the blockchain more usable. I have been working on some of these solutions as well, and happy to have more input on them.

The general idea is you have some supervisor (systemd?) that restarts the app when it crashes. It runs from a known location. Maybe it runs docker run cosmos/gaia:deploy ... where you have a local deploy tag that points to v0.34.0, v0.36.0 or whatever.

When we plan the upgrade, the admin can download the v0.36.0 docker image beforehand, without running it yet. Then add a do-upgrade script like:

docker tag cosmos/gaia:v0.36.0 cosmos/gaia:deploy

The current (v0.34.0) app will panic and die. When the supervisor restarts, it will now run the v0.36.0 version, which is now linked to the deploy tag. Does this make sense? I have not yet done this in a production system, just explaining the design we consider. Happy for any suggestion on how to make that cleaner.

@b00f
Author

b00f commented Jul 11, 2019

Yup, it makes perfect sense. Unfortunately, upgradability is the last thing developers think about. The devil always hides inside the code.

@ethanfrey Thank you for your time and explanation. I am still learning but I am trying my best! Let me know if you need any help.

3 participants