Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Next Field in Paginated GetSnapshots Response #74236

Merged

Conversation

original-brownbear
Copy link
Member

Follow up to #73952 adding documentation for the after query parameter
and the related next response field.

Follow up to elastic#73952 adding documentation for the `after` query parameter
and the related `next` response field.
@original-brownbear original-brownbear added >enhancement :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.14.0 labels Jun 17, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Jun 17, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

}
}
],
"next": "snapshot_2,snapshot_2"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just not allow pagination when doing a query across multiple repos. The current approach of doing pagination per repo becomes really confusing when multiple next values are returned but only one after can be set. Maybe this doesn't matter too much in the short-run as it's an 8.x only issue anyway but for `8 we should have something nicer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that does seem suboptimal, I think it would be useful to have pagination even if multiple repos are involved. Why do we need to return multiple next values? Since we're using the same sort order for all of them can we not use the minimum of the next values? Relatedly, does this mean that the limit applies to each repository separately rather than across all of them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly, does this mean that the limit applies to each repository separately rather than across all of them?

Yes, currently that's how it's implemented (mainly because of the focus on getting this out the door for 7.x where this is a non-concern because we only work with a single repo at a time and I didn't want to create different logic for master and 7.x for now to keep the backport reasonably simple).

Since we're using the same sort order for all of them can we not use the minimum of the next values?
Why do we need to return multiple next values?

We can implement pagination globally across all repos but then the logic for master will diverge a lot from 7.x which I'd rather not do right now. If we don't do that though, I don't see an alternative to either forbidding pagination with multiple repos in the query or having multiple next values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, forbidding it seems the most future-proof answer today, it's easy to become more permissive in future but much harder to change something that works into something else that works differently.

I think it'd be better for the next field to be at the top level of the response tho rather than having it inside the (single) snapshot object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problems/confusion here has been resolved last week in #74451. We now have just a single next field at the top level and snapshots are paginated across repositories :)

@original-brownbear
Copy link
Member Author

@DaveCTurner @henningandersen sorry for the noise in this one, now that #74451 has been merged this one should be good to review. I added the next parameter at the top level now and it works across repositories (admittedly not yet in a super efficient manner but the API should behave well :)).

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, looking good otherwise.

final String[] parts = param.split(",");
if (parts.length != 2) {
throw new IllegalArgumentException(
"after param must be of the form ${sort_value},${snapshot_name} but was [" + param + "]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not include the repo as a tiebreaker too? I.e., the format is sort_value,repo_name,snapshot_name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ will do

this.snapshots = List.copyOf(snapshots);
this.failures = failures == null ? Map.of() : Map.copyOf(failures);
this.next = next;
}

public GetSnapshotsResponse(StreamInput in) throws IOException {
this.snapshots = in.readList(SnapshotInfo::readFrom);
if (in.getVersion().onOrAfter(GetSnapshotsRequest.MULTIPLE_REPOSITORIES_SUPPORT_ADDED)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you will backport both this and the multiple repo PR in one go then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on whether this goes in today I guess with FF incoming :) If not I'll adjust the PR and backport separately.

NOTE: The pagination parameters `size`, `order`, and `sort` are not supported when using `verbose=false` and the sort order for
`after`::
(Optional, string)
Offset identifier to start pagination from as returned by the `next` field in the response body.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth spending a few lines explaining that the underlying list can change and how pagination works in case of deletes and additions during pagination (i.e., you are sure to not miss any that were in the list originally and have not been deleted during pagination and you may see snapshots added during pagination).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ added a few lines on this

@@ -119,7 +119,7 @@ private GetSnapshotsResponse createTestInstance() {
failures.put(repository, new ElasticsearchException(randomAlphaOfLength(10)));
}

return new GetSnapshotsResponse(responses, failures);
return new GetSnapshotsResponse(responses, failures, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we sometimes need to add a random next value here? Also, assertEqualInstances should be updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

}

private final List<SnapshotInfo> snapshots;

private final Map<String, ElasticsearchException> failures;

public GetSnapshotsResponse(List<SnapshotInfo> snapshots, Map<String, ElasticsearchException> failures) {
@Nullable
private final String next;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update equals and hashCode to account for the new field too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still reviewing this properly but had one comment straight away regarding encoding.


[source,console]
----
GET /_snapshot/my_repository/snapshot*?size=2&sort=name&after=snapshot_2,snapshot_2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry we will have users running into encoding problems with this API, since we permit characters like & and % in snapshot names and the effects of failing to encode this parameter properly might be pretty subtle. Could we base64-encode this parameter to avoid all that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ will do :)

@original-brownbear
Copy link
Member Author

Jenkins run elasticsearch-ci/rest-compatibility (git timeout)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks ok to me but I'll let Henning have the final say. I left more suggestions re. the encoding.

final String[] parts = new String(Base64.getDecoder().decode(param), StandardCharsets.UTF_8).split(",");
if (parts.length != 3) {
throw new IllegalArgumentException(
"after param must be base64 encoded and of the form ${sort_value},${repository_name},${snapshot_name} but was ["
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to spell out the format here IMO

Suggested change
"after param must be base64 encoded and of the form ${sort_value},${repository_name},${snapshot_name} but was ["
"invalid ?after parameter ["

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@original-brownbear
Copy link
Member Author

Thanks @henning and @DaveCTurner! all points addressed now I think :)

@henning

This comment has been minimized.

@original-brownbear
Copy link
Member Author

Sorry @henning wrong Henning. Ment to ping @henningandersen

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I think next as a string and after as an object in response/request is slightly inconsistent, but resolving that can be done in a followup rather than in this PR. I think after should turn into a String too, making it opaque at the transport layer too.

@original-brownbear
Copy link
Member Author

Jenkins run elasticsearch-ci/part-1 (unrelated)

@original-brownbear
Copy link
Member Author

Thanks Henning + David!

@original-brownbear original-brownbear merged commit 5f89f8b into elastic:master Jun 28, 2021
@original-brownbear original-brownbear deleted the next-field-get-snapshots branch June 28, 2021 20:28
original-brownbear added a commit that referenced this pull request Jun 29, 2021
)

Backport of the recently introduced snapshot pagination and scalability improvements listed below.
Merged as a single backport because the `7.x` and master snapshot status API logic had massively diverged between master and 7.x. With the work in the below PRs, the logic in master and 7.x once again has been aligned very closely again.

#72842
#73172
#73199
#73570 
#73952
#74236 
#74451 (this one is only partly applicable as it was mainly a change to master to align `master` and `7.x` branches)
@original-brownbear original-brownbear restored the next-field-get-snapshots branch April 18, 2023 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants