
Limit concurrent snapshot file restores in recovery per node #79316

Merged

merged 5 commits into elastic:master from limit-concurrent-recovery-connections on Oct 18, 2021

Conversation

fcofdez
Contributor

@fcofdez fcofdez commented Oct 17, 2021

Today we limit the max number of concurrent snapshot file restores
per recovery. This works well when the default
node_concurrent_recoveries is used (which is 2). When this limit is
increased, it is possible to exhaust the underlying repository
connection pool, affecting other workloads.

This commit adds a new setting
`indices.recovery.max_concurrent_snapshot_file_downloads_per_node`
that limits the max number of snapshot file downloads per node
during recoveries. When a recovery starts on the target node it tries
to acquire a permit; if the permit is granted, the recovery is allowed
to download snapshot files. This is communicated to the source node in
the StartRecoveryRequest. This is a rather conservative approach, since
a recovery that gets a permit to use snapshot files might not recover
any snapshot file at all, while a concurrent recovery that didn't get a
permit could have taken advantage of recovering from a snapshot. This
should cover most cases and protects the rest of the workloads that use
the same repository when node_concurrent_recoveries is larger than the
default.

Closes #79044
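The per-node permit scheme described above can be sketched with a counting semaphore. This is a simplified illustration, not the actual Elasticsearch code: the class and method names are invented here, and each recovery reserves its whole per-recovery allowance from a node-wide pool up front.

```java
import java.util.concurrent.Semaphore;

public class SnapshotFileDownloadsThrottler {
    private final Semaphore permits;

    public SnapshotFileDownloadsThrottler(int maxConcurrentSnapshotFileDownloadsPerNode) {
        this.permits = new Semaphore(maxConcurrentSnapshotFileDownloadsPerNode);
    }

    /** Try to reserve `count` download slots for one recovery. A recovery that
     *  fails to get permits falls back to recovering from the source node. */
    public boolean tryAcquire(int count) {
        return permits.tryAcquire(count);
    }

    /** Must be called when the recovery finishes to return its slots. */
    public void release(int count) {
        permits.release(count);
    }

    public static void main(String[] args) {
        int maxPerNode = 25, maxPerRecovery = 5; // illustrative values, assumed for the example
        SnapshotFileDownloadsThrottler throttler = new SnapshotFileDownloadsThrottler(maxPerNode);
        for (int recovery = 1; recovery <= 6; recovery++) {
            // the first 5 recoveries get permits, the 6th does not
            System.out.println("recovery " + recovery + " got permits: " + throttler.tryAcquire(maxPerRecovery));
        }
    }
}
```

A recovery that is denied permits still proceeds; it simply recovers all files from the source node instead of the snapshot repository.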

@fcofdez fcofdez force-pushed the limit-concurrent-recovery-connections branch from 0514602 to 8a8b13d on October 18, 2021 06:39
@fcofdez fcofdez added the v7.16.0, Team:Distributed, and :Distributed/Recovery labels on Oct 18, 2021
@fcofdez fcofdez marked this pull request as ready for review October 18, 2021 07:35
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@DaveCTurner DaveCTurner left a comment


Looks good, I left some small comments & suggestions.

@@ -161,7 +162,7 @@
private volatile TimeValue internalActionRetryTimeout;
private volatile TimeValue internalActionLongTimeout;
private volatile boolean useSnapshotsDuringRecovery;
private volatile int maxConcurrentSnapshotFileDownloads;
private volatile int getMaxConcurrentSnapshotFileDownloads;
Contributor


nit: Looks like a rename refactoring was a bit overzealous here?

Contributor Author


🤦

@@ -138,9 +141,17 @@ public void beforeIndexShardClosed(ShardId shardId, @Nullable IndexShard indexSh
}

public void startRecovery(final IndexShard indexShard, final DiscoveryNode sourceNode, final RecoveryListener listener) {
final Releasable snapshotFileDownloadsPermit =
recoverySnapshotFileDownloadsThrottler.tryAcquire(recoverySettings.getMaxConcurrentSnapshotFileDownloads());
Contributor


If we fail to acquire permits then we should log a warning, indicating that the user should reduce cluster.routing.allocation.node_concurrent_recoveries to be at most indices.recovery.max_concurrent_snapshot_file_downloads_per_node / indices.recovery.max_concurrent_snapshot_file_downloads.

Relatedly it doesn't make sense for indices.recovery.max_concurrent_snapshot_file_downloads_per_node to be less than indices.recovery.max_concurrent_snapshot_file_downloads, should we validate that?

Also this change would let us respect indices.recovery.use_snapshots on the target, simply by not even trying to acquire permits if indices.recovery.use_snapshots is false.

(also the Javadoc for indices.recovery.use_snapshots indicates that it defaults to false but it actually defaults to true).

Contributor Author


Addressed in ed9c4ef
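The cross-setting validation suggested above could look roughly like the following hypothetical sketch; the class name, method name, and error message are illustrative, not what landed in ed9c4ef:

```java
public class SnapshotDownloadSettingsValidation {
    // Reject a per-node limit smaller than the per-recovery limit, since a
    // single recovery could then never use its full per-recovery allowance.
    public static void validate(int maxConcurrentSnapshotFileDownloads, int maxConcurrentSnapshotFileDownloadsPerNode) {
        if (maxConcurrentSnapshotFileDownloadsPerNode < maxConcurrentSnapshotFileDownloads) {
            throw new IllegalArgumentException(
                "[indices.recovery.max_concurrent_snapshot_file_downloads_per_node] ("
                    + maxConcurrentSnapshotFileDownloadsPerNode
                    + ") must not be less than [indices.recovery.max_concurrent_snapshot_file_downloads] ("
                    + maxConcurrentSnapshotFileDownloads + ")");
        }
    }

    public static void main(String[] args) {
        validate(5, 25); // ok: per-node limit is larger
        try {
            validate(5, 3); // per-node limit below per-recovery limit: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```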

import org.elasticsearch.core.Releasable;
import org.elasticsearch.core.Releasables;

public class RecoverySnapshotFileDownloadsThrottler {
Contributor


Could we fold most of this class into RecoverySettings? I think it'd be ok just to have a RecoverySettings#tryAcquireSnapshotDownloadPermits method, or if you prefer you can expose a wrapper like we do with rateLimiter().

Contributor Author


Addressed in ed9c4ef

@@ -31,6 +31,7 @@
private Store.MetadataSnapshot metadataSnapshot;
private boolean primaryRelocation;
private long startingSeqNo;
private boolean hasPermitsToDownloadSnapshotFiles;
Contributor


nit: let's just call this canDownloadSnapshotFiles, there may be other reasons it can't (e.g. indices.recovery.use_snapshots is false)

Contributor Author


Addressed in ed9c4ef

@fcofdez
Contributor Author

fcofdez commented Oct 18, 2021

@elasticmachine run elasticsearch-ci/bwc
It was a known test failure

Contributor

@DaveCTurner DaveCTurner left a comment


I left a couple of comments/questions about respecting the use of snapshots on the source node too and everything else is just tiny things.

@@ -127,7 +126,7 @@

public RecoverySourceHandler(IndexShard shard, RecoveryTargetHandler recoveryTarget, ThreadPool threadPool,
StartRecoveryRequest request, int fileChunkSizeInBytes, int maxConcurrentFileChunks,
int maxConcurrentOperations, int maxConcurrentSnapshotFileDownloads, boolean useSnapshots,
int maxConcurrentOperations, int maxConcurrentSnapshotFileDownloads,
Contributor


Hmm I sort of see that it doesn't make sense to use the setting on the source node, but in the BwC case we treat the target as if it can use snapshots, is this safe?

if (snapshotFileDownloadsPermit == null) {
logger.warn(String.format(Locale.ROOT,
"Unable to acquire permit to use snapshot files during recovery, this recovery will recover from the source node. " +
"[%s] should have the same value as [%s]/[%s]",
Contributor


The limit is only an upper bound, you could have fewer concurrent recoveries, but also I'd suggest just saying the number rather than naming the settings since otherwise folk will just increase max_concurrent_snapshot_file_downloads_per_node and run into bigger problems when they run out of HTTP connections.

Suggested change
"[%s] should have the same value as [%s]/[%s]",
"Ensure snapshot files can be used during recovery by setting [%s] to be no greater than [%d]",
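As an illustration of the suggested wording, the recommended bound is the per-node download limit divided by the per-recovery download limit; the class name and the values below are made up for the example:

```java
import java.util.Locale;

public class RecoveryPermitWarning {
    // Build the suggested warning: state a concrete number rather than
    // naming the two download settings.
    public static String warningMessage(int maxSnapshotFileDownloadsPerNode, int maxSnapshotFileDownloadsPerRecovery) {
        return String.format(Locale.ROOT,
            "Ensure snapshot files can be used during recovery by setting [%s] to be no greater than [%d]",
            "cluster.routing.allocation.node_concurrent_recoveries",
            maxSnapshotFileDownloadsPerNode / maxSnapshotFileDownloadsPerRecovery);
    }

    public static void main(String[] args) {
        // 25 and 5 are illustrative values for the per-node and per-recovery limits
        System.out.println(warningMessage(25, 5));
        // prints: Ensure snapshot files can be used during recovery by setting
        // [cluster.routing.allocation.node_concurrent_recoveries] to be no greater than [5]
    }
}
```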

@@ -67,6 +67,8 @@
private final IndexShard indexShard;
private final DiscoveryNode sourceNode;
private final SnapshotFilesProvider snapshotFilesProvider;
@Nullable
Contributor


Suggested change
@Nullable
@Nullable // if we're not downloading files from snapshots in this recovery

@@ -119,5 +132,8 @@ public void writeTo(StreamOutput out) throws IOException {
metadataSnapshot.writeTo(out);
out.writeBoolean(primaryRelocation);
out.writeLong(startingSeqNo);
if (out.getVersion().onOrAfter(RecoverySettings.SNAPSHOT_FILE_DOWNLOAD_THROTTLING_SUPPORTED_VERSION)) {
out.writeBoolean(canDownloadSnapshotFiles);
}
Contributor


Is it safe to drop this value no matter whether it's true or false when dealing with an older node? I worry that we might have some trouble from this lenience, plus the fact that it defaults to true if missing and that we no longer care about the setting on the source node.

Contributor Author


That's a fair point, maybe we should keep the check for indices.recovery.use_snapshots on the source node too? That way we would keep the current behaviour in a mixed-version cluster.

Contributor


Yes I think that'd be best.
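A minimal, self-contained sketch of the version-gated read path implied by the diff and discussion above. Plain JDK streams and an int version stand in for Elasticsearch's StreamInput and Version types, and the constant value is an illustrative stand-in:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BwcFlagDemo {
    // Stand-in for RecoverySettings.SNAPSHOT_FILE_DOWNLOAD_THROTTLING_SUPPORTED_VERSION
    public static final int SUPPORTED_VERSION = 7_16_00;

    // Older nodes never put the flag on the wire, so the read side falls back
    // to a default; the lenient default of true is what the discussion above
    // worries about in mixed-version clusters.
    public static boolean readCanDownloadSnapshotFiles(DataInputStream in, int remoteVersion) throws IOException {
        if (remoteVersion >= SUPPORTED_VERSION) {
            return in.readBoolean(); // flag is on the wire
        }
        return true; // flag absent: assume snapshot files may be used
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeBoolean(false);

        DataInputStream newWire = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readCanDownloadSnapshotFiles(newWire, SUPPORTED_VERSION));     // false, read from the wire

        DataInputStream oldWire = new DataInputStream(new ByteArrayInputStream(new byte[0]));
        System.out.println(readCanDownloadSnapshotFiles(oldWire, SUPPORTED_VERSION - 1)); // true by default
    }
}
```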

@@ -714,12 +714,17 @@ protected Node(final Environment initialEnvironment,
clusterService
);
final RecoveryPlannerService recoveryPlannerService = new SnapshotsRecoveryPlannerService(shardSnapshotsService);
final SnapshotFilesProvider snapshotFilesProvider =
new SnapshotFilesProvider(repositoryService);
final SnapshotFilesProvider snapshotFilesProvider = new SnapshotFilesProvider(repositoryService);
Contributor


I think we can revert these changes now, they're only whitespace/import reordering right?

indicesClusterStateService = new IndicesClusterStateService(
settings,
indicesService,
clusterService,
threadPool,
new PeerRecoveryTargetService(threadPool, transportService, recoverySettings, clusterService, snapshotFilesProvider),
Contributor


Likewise here, this change isn't needed any more.

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM

@fcofdez
Contributor Author

fcofdez commented Oct 18, 2021

@elasticmachine run elasticsearch-ci/part-1
Unrelated failure

@fcofdez fcofdez added the auto-backport Automatically create backport pull requests when merged label Oct 18, 2021
@fcofdez fcofdez merged commit 2b4fe8f into elastic:master Oct 18, 2021
@fcofdez
Contributor Author

fcofdez commented Oct 18, 2021

Thanks David!

@elasticsearchmachine
Collaborator

💔 Backport failed

Branch 7.x: Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 79316`

fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Oct 18, 2021
@jtibshirani
Contributor

I just noticed a couple of test failures that could be related:

These don't reproduce for me locally.

fcofdez added a commit that referenced this pull request Oct 19, 2021
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Oct 19, 2021
fcofdez added a commit that referenced this pull request Nov 29, 2021
If we don't cancel the relocation of the index to the same target
node, it is possible that the recovery is retried, meaning that the
available permit could be granted to indexRecoveredFromSnapshot1
instead of to indexRecoveredFromSnapshot2.

Relates #79316
Closes #79420
Labels
auto-backport: Automatically create backport pull requests when merged
:Distributed/Recovery: Anything around constructing a new shard, either from a local or a remote source.
>enhancement
Team:Distributed: Meta label for distributed team
v7.16.0
v8.0.0-beta1

Successfully merging this pull request may close these issues.

Limit the number of connections used by snapshot file downloads during recoveries
7 participants