
Implement single node downgrades #13405

Merged (6 commits into main from downgrade-b, Oct 29, 2021)
Conversation

serathius (Member):

This PR implements single node downgrades as proposed in https://docs.google.com/document/d/1yD0GDkxqWBPAax6jLZ97clwAz2Gp0Gux6xaTrtJ6wHE/edit?usp=sharing, with the goal of introducing e2e tests that can confirm that storage versioning properly validates WAL entries during downgrade.

This doesn't mean that with this PR etcd supports downgrades; there is still a lot of testing, and there are small problems that we need to fix, before we can say that downgrades are safe. This is meant to allow us to expand testing of downgrades with different scenarios to confirm their reliability.

Problems detected during implementation that will need to be fixed:

  • Since etcd v3.5 immediately panics on a SetClusterVersion entry with version "3.6" in the WAL, I added it to the MinEtcdVersion logic (see the sketch after this list). We should consider adding logic to etcdutl migrate to drop this entry.
  • Sometimes setting ClusterVersion after an upgrade times out; we should debug why.
  • I didn't implement snapshotting the WAL after lowering the cluster version (required to remove non-backward-compatible entries). As a workaround I use a very low snapshot count in tests.
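
To illustrate the MinEtcdVersion idea from the first bullet, here is a minimal, self-contained Go sketch; walEntry and its helpers are hypothetical stand-ins for illustration, not etcd's actual API. The minimum etcd version required to replay a WAL is the highest minimum version demanded by any single entry:

package main

import (
	"fmt"

	"github.com/coreos/go-semver/semver"
)

// walEntry is a simplified stand-in for a decoded WAL entry.
type walEntry struct {
	desc       string
	minVersion *semver.Version // nil: readable by any supported version
}

// minEtcdVersion returns the highest minimum version demanded by any
// entry, i.e. the oldest etcd release that can safely replay this WAL.
func minEtcdVersion(entries []walEntry) *semver.Version {
	var min *semver.Version
	for _, e := range entries {
		if e.minVersion == nil {
			continue
		}
		if min == nil || min.LessThan(*e.minVersion) {
			min = e.minVersion
		}
	}
	return min
}

func main() {
	entries := []walEntry{
		{desc: "Put"},
		// A ClusterVersionSet("3.6") entry raises the floor, because
		// etcd v3.5 panics when it replays it.
		{desc: `ClusterVersionSet("3.6")`, minVersion: semver.New("3.6.0")},
	}
	if min := minEtcdVersion(entries); min != nil {
		fmt.Printf("WAL requires etcd >= %s\n", min)
	}
}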

cc @ptabor @lilic

@serathius requested a review from ptabor, October 8, 2021 12:31
@serathius force-pushed the downgrade-b branch 5 times, most recently from 83c7227 to d3d264f, October 11, 2021 10:51
@serathius mentioned this pull request, October 11, 2021
@serathius force-pushed the downgrade-b branch 3 times, most recently from 0e958f3 to c2d1582, October 11, 2021 14:30
@serathius changed the title from "Implement single node downgrades" to "Implement single node downgrades tests", October 11, 2021
@serathius force-pushed the downgrade-b branch 2 times, most recently from 7d59020 to 530ad01, October 14, 2021 12:14
server/etcdserver/api/membership/cluster.go (review thread, resolved)
tests/e2e/cluster_downgrade_test.go (review thread, outdated, resolved)
server/etcdserver/version/monitor.go (review thread, resolved)
walsnap.Term = sn.Metadata.Term
walsnap.ConfState = &sn.Metadata.ConfState
}
w, err := st.w.Reopen(st.lg, walsnap)

Contributor:

Why does this call need to reopen 'w' while the other calls keep working on the same WAL?

It's counterintuitive that a getter-like method performs mutations.

Member Author:

Fixed the getter.

This is tricky, so let me know what would be the simplest way to implement it. Based on the documentation in comments, a WAL can be either in read or write mode: it starts in read mode, and once all entries are read it switches to write mode. The problem is that during etcd runtime the WAL is in write mode, but to verify that a downgrade is possible we need to switch it back to read mode.

What I did here is basically lock access to the WAL, reopen it from the last snapshot, and read all entries to make it writable again. Please let me know if there is a better way to reread entries in the WAL.
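
For illustration, here is a minimal sketch of that lock/reopen/drain pattern, using hypothetical stand-in types rather than etcd's real WAL API:

package walsketch

import "sync"

// wal is a hypothetical stand-in for a WAL handle: it starts in read
// mode, and draining every entry flips it into write (append) mode.
type wal struct {
	entries []string
}

// Reopen returns a fresh read-mode handle over the same on-disk log,
// starting from the given snapshot index.
func (w *wal) Reopen(snapIndex int) *wal {
	return &wal{entries: w.entries[snapIndex:]}
}

// ReadAll drains the log; only afterwards is the handle appendable.
func (w *wal) ReadAll() []string {
	return w.entries
}

// walStore serializes access so nothing appends while we reread.
type walStore struct {
	mu sync.Mutex
	w  *wal
}

// rereadEntries locks out writers, reopens the WAL from the last
// snapshot, reads everything back, and swaps in the new handle so the
// WAL ends up writable again.
func (st *walStore) rereadEntries(snapIndex int) []string {
	st.mu.Lock()
	defer st.mu.Unlock()
	w := st.w.Reopen(snapIndex)
	ents := w.ReadAll()
	st.w = w
	return ents
}

The swap at the end matters: the reopened handle has already been drained, so it is back in write mode and can keep serving appends.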

Contributor:

This seems to be a similar problem to this method:

func Verify(lg *zap.Logger, walDir string, snap walpb.Snapshot) (*raftpb.HardState, error) {

Maybe we can generalize it to take a 'Listener' interface (visitor-like pattern) that either performs verification or computes the minimal version? A rough sketch of that idea follows.
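
As a hypothetical illustration of that visitor-style generalization (all names and signatures here are assumptions, not etcd's API), a single shared walk over WAL entries takes pluggable per-entry logic, so verification and minimal-version computation can reuse the same loop:

package walsketch

import (
	"fmt"

	"github.com/coreos/go-semver/semver"
)

// EntryVisitor is invoked once per decoded WAL entry.
type EntryVisitor interface {
	Visit(index uint64, data []byte) error
}

// walkWAL drives one pass over the entries (decoding elided) and hands
// each to the visitor; Verify and version computation share this loop.
func walkWAL(entries [][]byte, v EntryVisitor) error {
	for i, data := range entries {
		if err := v.Visit(uint64(i), data); err != nil {
			return fmt.Errorf("entry %d: %w", i, err)
		}
	}
	return nil
}

// versionVisitor tracks the highest minimal version any entry demands;
// a verifying visitor would instead just attempt to decode each entry.
type versionVisitor struct {
	min *semver.Version
}

func (vv *versionVisitor) Visit(index uint64, data []byte) error {
	// Per-entry version rules elided: a real implementation would decode
	// the entry, look up its minimal version, and raise vv.min if needed.
	return nil
}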

Member Author:

I agree with that, and I was already experimenting with it while working on static analysis of WAL annotations. However, I would definitely want to keep this PR focused on downgrades and do this refactor as a separate PR.

ptabor (Contributor) left a comment:

It looks good to me. Thank you.
A few clarification questions in the comments.

We should stabilize tests (as the situation looks worse than usual) before submitting such logical changes.

Please also modify the PR description, as it's not only about the tests.

@serathius changed the title from "Implement single node downgrades tests" to "Implement single node downgrades", October 14, 2021
@serathius force-pushed the downgrade-b branch 2 times, most recently from 87874b8 to ad31f7b, October 15, 2021 14:34
serathius (Member Author):

I found a deadlock in the current downgrade implementation and fixed it, so the tests should pass.

@serathius force-pushed the downgrade-b branch 8 times, most recently from 307ff26 to 7a5e622, October 21, 2021 15:29
server/etcdserver/version/monitor.go (review thread, resolved)
hexfusion (Contributor):

Could I please have the weekend to review this before it merges? It looks great in general; I just have not had the time to look through it completely. Thanks again for the hard work.

ptabor (Contributor) commented Oct 22, 2021:

I wonder whether flakes of TestEndpointSwitchResolvesViolation are correlated with the change
https://github.com/etcd-io/etcd/runs/3966166854?check_suite_focus=true
or independent...

The test fails with:

...
    ordering_util_test.go:77: While speaking to partitioned leader, we should get ErrNoGreaterRev error
...

serathius (Member Author):

@hexfusion Did you have time to take a look?

hexfusion (Contributor) left a comment:

One question, otherwise lgtm.

server/etcdserver/version/monitor.go (review thread, resolved)
Commit messages:

The problem with the old code was that during downgrade only members with the downgrade target version were allowed to join. This is unrealistic, as it doesn't handle members that disconnect/rejoin.

…her version

This is because etcd v3.5 will panic when it encounters a ClusterVersionSet entry with version >3.5.0. For downgrades to v3.5 to work, we need to make sure this entry is snapshotted.

By validating that the WAL doesn't include any incompatible entries, we can implement storage downgrades.
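
To make the snapshotting point concrete, here is a hypothetical sketch (stand-in types, not etcd's real server API) of forcing a snapshot once the cluster version has been lowered, so entries the target release cannot parse fall behind the snapshot index and are never replayed after the downgrade:

package walsketch

import "github.com/coreos/go-semver/semver"

// snapshotter is a hypothetical stand-in for the server's snapshot trigger.
type snapshotter interface {
	TriggerSnapshot() error
}

// maybeSnapshotAfterDowngrade forces a snapshot when the cluster version
// has been lowered below the version that wrote the WAL tail, so entries
// such as ClusterVersionSet("3.6") are compacted out of what a v3.5
// member would replay on restart.
func maybeSnapshotAfterDowngrade(s snapshotter, walVersion, clusterVersion *semver.Version) error {
	if clusterVersion.LessThan(*walVersion) {
		return s.TriggerSnapshot()
	}
	return nil
}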
serathius (Member Author):

> I wonder whether flakes of TestEndpointSwitchResolvesViolation are correlated with the change https://github.com/etcd-io/etcd/runs/3966166854?check_suite_focus=true or independent...

I ran the test alone 10 times without any failures. I don't think there is a correlation, but maybe it's also correlated with other test parameters (parallel execution with --cpu etcd).

serathius (Member Author):

The gRPC failure looks like a flake:

tests $ go test   go.etcd.io/etcd/tests/v3/integration/clientv3/lease --run TestLeaseWithRequireLeader -timeout=5m -tags cluster_proxy --race=true --cpu=4 --count 10
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease     6.798s
tests $ go test   go.etcd.io/etcd/tests/v3/integration/clientv3/lease  -timeout=5m -tags cluster_proxy --race=true --cpu=4 
ok      go.etcd.io/etcd/tests/v3/integration/clientv3/lease     118.937s

ptabor (Contributor) commented Oct 29, 2021:

Thank you. Merging.

@ptabor merged commit 6c2f5dc into etcd-io:main, Oct 29, 2021
@serathius deleted the downgrade-b branch, June 15, 2023 20:39