Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: address snapshot/pruning refactor comments upstream #184

Merged
merged 46 commits into from
Apr 25, 2022

Conversation

p0mvn
Copy link
Member

@p0mvn p0mvn commented Apr 8, 2022

Description

Addressing functional and security comments from review upstream:
cosmos#11496

  • All relevant commits were backported from the PR above

Testing

Tested this on Osmosis v7.x: #206

  • had to backport to v7.x branches because the main is unstable
  • all commits in chore: backport snapshot / pruning refactor #206 are squashed into 1
  • pruning = everything and snapshot interval = 100
    • On prune-everything the interval and keep-recent are both 10

Following the Logs

If we specify a small snapshot-interval (such as 100) and snapshots overlap, the new requests to snapshot will be skipped if there is one that is already in progress. Therefore, I split the snapshot height tests into 2 cases:

  • snapshot performed
  • snapshot skipped
Snapshot Heights - Snapshot Performed

Verified that it works as expected by exposing some debug logs:

When we would normally prune height 4080000 (at 4080010), the snapshot height does not show up anymore because it is kept until after a snapshot is done:

9:49PM INF pruning heights heights=[4080001,4080002,4080003,4080004,4080005,4080006,4080007,4080008,4080009]
  • note that we have only 9 heights when the interval is 10 (4080000 is a snapshot height)

Eventually we see the following log message signifying that a snapshot height was pruned away:

HandleHeightSnapshot height=4080000
10:19PM INF completed state snapshot format=1 height=4080000
10:20PM INF pruning heights heights=[4080280,4080281,4080282,4080283,4080284,4080000,4080285,4080286,4080287,4080288,4080289]
Snapshot Heights - Skipped

Snapshot heights that overlap with an already in progress snapshot are skipped and pruned almost immediately:

9:58PM INF creating state snapshot height=4080100
9:58PM INF HandleHeightSnapshot height=4080100
9:58PM ERR failed to create state snapshot err="a snapshot operation is in progress: conflict" height=4080100
9:58PM INF indexed block height=4080100 module=txindex
10:00PM INF pruning heights heights=[4080101,4080100,4080102,4080103,4080104,4080105,4080106,4080107,4080108,4080109]
  • note that we have 10 heights when the interval is 10, snapshot is skipped
  • also note that 4080100 is out of order because it was added to the list in a separate goroutine after we checked for an in progress snapshot.
Regular Heights

For heights that are not multiples of snapshot-interval are removed right away according to the pruning strategy:

9:50PM INF pruning heights heights=[4080010,4080011,4080012,4080013,4080014,4080015,4080016,4080017,4080018,4080019]
  • note that we have 10 heights when the interval is 10 (none of these heights are snapshot heights)

Querying

I attempted to query epochs at a specific height. As soon as the queried height went beyond keep-recent, I would get an error. Ths indicates that pruning functions correctly.

Flow of events:

osmosisd query epochs current-epoch day --height 4080135
current_epoch: "306"
  • Observe 4080145 reached in the block explorer (4080135 + 10)
  • Observe 10:03PM INF pruning heights heights=[4080130,4080131,4080132,4080133,4080134,4080135,4080136,4080137,4080138,4080139] in the logs
osmosisd query epochs current-epoch day --height 4080135
Error: rpc error: code = InvalidArgument desc = not available identifier: invalid request

@p0mvn p0mvn changed the title refactor: address snapshot/pruning refactor upstream comments refactor: address snapshot/pruning refactor comments upstream Apr 8, 2022
@p0mvn p0mvn marked this pull request as ready for review April 8, 2022 21:20
@p0mvn p0mvn requested a review from alexanderbez as a code owner April 8, 2022 21:20
@p0mvn p0mvn requested review from alexanderbez and ValarDragon and removed request for alexanderbez April 8, 2022 21:23
pruning/export_test.go Show resolved Hide resolved
pruning/manager.go Outdated Show resolved Hide resolved
@@ -314,6 +314,11 @@ func (m *Manager) shouldTakeSnapshot(height int64) bool {
func (m *Manager) snapshot(height int64) {
m.logger.Info("creating state snapshot", "height", height)

if height < 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be <= 0 (I asked based on the error message)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I will get that changed

pruning/export_test.go Show resolved Hide resolved
@@ -59,13 +62,14 @@ func (m *Manager) GetPruningHeights() []int64 {

// ResetPruningHeights resets the heights to be pruned.
func (m *Manager) ResetPruningHeights() {
m.pruneHeights = make([]int64, 0)
// reuse previously allocated memory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// reuse previously allocated memory.
// reuse previously allocated memory

}
}

// TODO: convert to a generic version with Go 1.18.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This this really necessary? How would this change?

If we must do this, let's reference (and create) an issue. I'd like to avoid "naked" TODOs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are right. It's not really necessary unless we start reusing this functionality with other types. I removed the comment

@@ -0,0 +1,11 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err, what's this file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, I've been switching frequently between this repo and the SDK repo, likely merged by accident

will ve removed

@@ -0,0 +1,5 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this file too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be removed

@p0mvn
Copy link
Member Author

p0mvn commented Apr 11, 2022

@alexanderbez thanks for the review. Will get back to your comments. I might have to update this PR with more changes from the comments in the upstream so marking this as a draft for now

@p0mvn p0mvn marked this pull request as ready for review April 14, 2022 02:55
@p0mvn
Copy link
Member Author

p0mvn commented Apr 14, 2022

All comments are now addressed. I backported all changes from the upstream up until: #187

I will do everything including and after it in a separate PR to limit the scope here

Copy link
Collaborator

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing both of these PRs in the SDK and our fork is not tenable IMO. Can we wait till the SDK one is merged or vice versa? I left a bunch of comments in the core repo.

snapshotInterval uint64
pruneHeights []int64
pruneSnapshotHeights *list.List
mx sync.Mutex
}

const (
pruneHeightsKey = "s/pruneheights"
pruneSnapshotHeightsKey = "s/pruneSnheights"
errNegativeHeightsFmt = "failed to get pruned heights: %d"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a string. It should not start with err

@p0mvn
Copy link
Member Author

p0mvn commented Apr 14, 2022

Reviewing both of these PRs in the SDK and our fork is not tenable IMO. Can we wait till the SDK one is merged or vice versa? I left a bunch of comments in the core repo.

Sounds good. I agree. Let's wait until the core is accepted

@p0mvn p0mvn marked this pull request as draft April 20, 2022 15:44
@p0mvn p0mvn force-pushed the roman/refactor-fixes branch 2 times, most recently from 4fcdd02 to cf03c8e Compare April 20, 2022 20:35
@p0mvn
Copy link
Member Author

p0mvn commented Apr 21, 2022

Hi @alexanderbez This is a backport of all commits in cosmos#11496
It's caught up to all the changes. All comments in this PR should also be addressed

Can we merge this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants