Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize SLM Policy Queries #79341

Conversation

original-brownbear
Copy link
Member

Same as #79321 but for SLM policies. Enhances RepositoryData accordingly
to enable the optimization.

Same as elastic#79321 but for SLM policies. Enhances RepositoryData accordingly
to enable the optimization.
@original-brownbear original-brownbear added >enhancement :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Oct 18, 2021
@original-brownbear original-brownbear marked this pull request as ready for review October 18, 2021 09:57
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Oct 18, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok but I think we can simplify those nested higher-order generic types a bit.

@@ -156,7 +156,8 @@ public void testEnforcedCooldownPeriod() throws IOException {
SnapshotState.SUCCESS,
SnapshotsService.SHARD_GEN_IN_REPO_DATA_VERSION.minimumCompatibilityVersion(),
0L, // -1 would refresh RepositoryData and find the real version
0L // -1 would refresh RepositoryData and find the real version
0L, // -1 would refresh RepositoryData and find the real version,
null // null would refresh RepositoryData and find the real version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment suggests that this shouldn't be null.

@@ -666,7 +666,7 @@ private static SnapshotsInRepo sortSnapshots(
}
}

private static Predicate<SnapshotInfo> filterBySLMPolicies(String[] slmPolicies) {
private static Tuple<Predicate<SnapshotInfo>, BiPredicate<SnapshotId, RepositoryData>> filterBySLMPolicies(String[] slmPolicies) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple<Predicate<SnapshotInfo>, BiPredicate<SnapshotId, RepositoryData>> is getting a bit convoluted, really it's just the following but without the names or anywhere to hang docs:

interface SomeName {
  boolean method1(SnapshotInfo snapshotInfo);
  boolean method2(SnapshotId snapshotId, RepositoryData repositoryData);
}

I think the names & docs would be useful, let's make this a thing. Maybe we should be subclassing SnapshotPredicates rather than having these higher-order fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I did this for this specific case, now but I agree, it's a worthwhile cleanup all over :) I'll do it in a follow-up after FF

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (I'm assuming we're ok with adding an extra field to RepositoryData here)

@original-brownbear
Copy link
Member Author

Thanks David!

(I'm assuming we're ok with adding an extra field to RepositoryData here)

I think that's just fine. The extra bytes in the serialized objects hardly matter and on heap we're deduplicating the strings so they don't matter either. The performance gain when filtering get-snapshots by slm policy on the other hand is quite significant (particularly for the expected case of listing hundreds of backups but maybe also having thousands of ILM created frozen tier snapshots in the repo).

@original-brownbear original-brownbear merged commit 12e2f55 into elastic:master Oct 19, 2021
@original-brownbear original-brownbear deleted the slm-store-policy-repo-data branch October 19, 2021 09:53
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 19, 2021
Same as elastic#79321 but for SLM policies. Enhances RepositoryData accordingly
to enable the optimization.
original-brownbear added a commit that referenced this pull request Oct 19, 2021
Same as #79321 but for SLM policies. Enhances RepositoryData accordingly
to enable the optimization.
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Oct 19, 2021
* upstream/master:
  Validate tsdb's routing_path (elastic#79384)
  Adjust the BWC version for the return200ForClusterHealthTimeout field (elastic#79436)
  API for adding and removing indices from a data stream (elastic#79279)
  Exposing the ability to log deprecated settings at non-critical level (elastic#79107)
  Convert operator privilege license object to LicensedFeature (elastic#79407)
  Mute SnapshotBasedIndexRecoveryIT testSeqNoBasedRecoveryIsUsedAfterPrimaryFailOver (elastic#79456)
  Create cache files with CREATE_NEW & SPARSE options (elastic#79371)
  Revert "[ML] Use a new annotations index for future annotations (elastic#79151)"
  [ML] Use a new annotations index for future annotations (elastic#79151)
  [ML] Removing legacy code from ML/transform auditor (elastic#79434)
  Fix rate agg with custom `_doc_count` (elastic#79346)
  Optimize SLM Policy Queries (elastic#79341)
  Fix execution of exists query within nested queries on field with doc_values disabled (elastic#78841)
  Stricter UpdateSettingsRequest parsing on the REST layer (elastic#79227)
  Do not release snapshot file download permit during recovery retries (elastic#79409)
  Preserve request headers in a mixed version cluster (elastic#79412)
  Adjust versions after elastic#79044 backport to 7.x (elastic#79424)
  Mute BulkByScrollUsesAllScrollDocumentsAfterConflictsIntegTests (elastic#79429)
  Fail on SSPL licensed x-pack sources (elastic#79348)

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/TimeSeriesModeTests.java
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 19, 2021
`SnapshotPredicate` and `SnapshotPredicates` do almost the same thing
and the way that the `SnapshotPredicates` is constructed is quite
convoluted. This commit combines the two classes into one and simplifies
the construction logic.

Relates elastic#79341
DaveCTurner added a commit that referenced this pull request Oct 20, 2021
`SnapshotPredicate` and `SnapshotPredicates` do almost the same thing
and the way that the `SnapshotPredicates` is constructed is quite
convoluted. This commit combines the two classes into one and simplifies
the construction logic.

Relates #79341
DaveCTurner added a commit that referenced this pull request Oct 20, 2021
`SnapshotPredicate` and `SnapshotPredicates` do almost the same thing
and the way that the `SnapshotPredicates` is constructed is quite
convoluted. This commit combines the two classes into one and simplifies
the construction logic.

Relates #79341
@original-brownbear original-brownbear restored the slm-store-policy-repo-data branch April 18, 2023 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants