
Limit concurrent snapshot file restores in recovery per node #79316

Merged

merged 5 commits into elastic:master from limit-concurrent-recovery-connections on Oct 18, 2021

Conversation

fcofdez
Contributor

@fcofdez fcofdez commented Oct 17, 2021

Today we limit the max number of concurrent snapshot file restores
per recovery. This works well when the default
node_concurrent_recoveries is used (which is 2). When this limit is
increased, it is possible to exhaust the underlying repository
connection pool, affecting other workloads.

This commit adds a new setting
`indices.recovery.max_concurrent_snapshot_file_downloads_per_node`
that limits the max number of snapshot file downloads per node
during recoveries. When a recovery starts on the target node it tries
to acquire a permit; if the permit is granted, the recovery is allowed
to download snapshot files. This is communicated to the source node in
the StartRecoveryRequest. This is a rather conservative approach, since
a recovery that gets a permit to use snapshot files might not recover
any snapshot file at all, while a concurrent recovery that didn't get a
permit could have taken advantage of recovering from a snapshot. This
should cover most cases and protects the rest of the workloads that use
the same repository when node_concurrent_recoveries is larger than the
default.

Closes #79044
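The per-node permit scheme described above can be sketched with a counting semaphore. This is a simplified illustration, not the actual Elasticsearch code: the class and method names are invented here, and each recovery reserves its whole per-recovery allowance from a node-wide pool up front.

```java
import java.util.concurrent.Semaphore;

public class SnapshotFileDownloadsThrottler {
    private final Semaphore permits;

    public SnapshotFileDownloadsThrottler(int maxConcurrentSnapshotFileDownloadsPerNode) {
        this.permits = new Semaphore(maxConcurrentSnapshotFileDownloadsPerNode);
    }

    /** Try to reserve `count` download slots for one recovery. A recovery that
     *  fails to get permits falls back to recovering from the source node. */
    public boolean tryAcquire(int count) {
        return permits.tryAcquire(count);
    }

    /** Must be called when the recovery finishes to return its slots. */
    public void release(int count) {
        permits.release(count);
    }

    public static void main(String[] args) {
        int maxPerNode = 25, maxPerRecovery = 5; // illustrative values, assumed for the example
        SnapshotFileDownloadsThrottler throttler = new SnapshotFileDownloadsThrottler(maxPerNode);
        for (int recovery = 1; recovery <= 6; recovery++) {
            // the first 5 recoveries get permits, the 6th does not
            System.out.println("recovery " + recovery + " got permits: " + throttler.tryAcquire(maxPerRecovery));
        }
    }
}
```

A recovery that is denied permits still proceeds; it simply recovers all files from the source node instead of the snapshot repository.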

@fcofdez fcofdez force-pushed the limit-concurrent-recovery-connections branch from 0514602 to 8a8b13d on October 18, 2021 06:39
@fcofdez fcofdez added the v7.16.0, Team:Distributed, and :Distributed/Recovery labels on Oct 18, 2021
@fcofdez fcofdez marked this pull request as ready for review October 18, 2021 07:35
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@DaveCTurner DaveCTurner left a comment


Looks good, I left some small comments & suggestions.

@@ -161,7 +162,7 @@
private volatile TimeValue internalActionRetryTimeout;
private volatile TimeValue internalActionLongTimeout;
private volatile boolean useSnapshotsDuringRecovery;
private volatile int maxConcurrentSnapshotFileDownloads;
private volatile int getMaxConcurrentSnapshotFileDownloads;
Contributor


nit: Looks like a rename refactoring was a bit overzealous here?

Contributor Author


🤦

@@ -138,9 +141,17 @@ public void beforeIndexShardClosed(ShardId shardId, @Nullable IndexShard indexSh
}

public void startRecovery(final IndexShard indexShard, final DiscoveryNode sourceNode, final RecoveryListener listener) {
final Releasable snapshotFileDownloadsPermit =
recoverySnapshotFileDownloadsThrottler.tryAcquire(recoverySettings.getMaxConcurrentSnapshotFileDownloads());
Contributor


If we fail to acquire permits then we should log a warning, indicating that the user should reduce cluster.routing.allocation.node_concurrent_recoveries to be at most indices.recovery.max_concurrent_snapshot_file_downloads_per_node / indices.recovery.max_concurrent_snapshot_file_downloads.

Relatedly it doesn't make sense for indices.recovery.max_concurrent_snapshot_file_downloads_per_node to be less than indices.recovery.max_concurrent_snapshot_file_downloads, should we validate that?

Also this change would let us respect indices.recovery.use_snapshots on the target, simply by not even trying to acquire permits if indices.recovery.use_snapshots is false.

(also the Javadoc for indices.recovery.use_snapshots indicates that it defaults to false but it actually defaults to true).

Contributor Author


Addressed in ed9c4ef
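The cross-setting validation suggested above could look roughly like the following hypothetical sketch; the class name, method name, and error message are illustrative, not what landed in ed9c4ef:

```java
public class SnapshotDownloadSettingsValidation {
    // Reject a per-node limit smaller than the per-recovery limit, since a
    // single recovery could then never use its full per-recovery allowance.
    public static void validate(int maxConcurrentSnapshotFileDownloads, int maxConcurrentSnapshotFileDownloadsPerNode) {
        if (maxConcurrentSnapshotFileDownloadsPerNode < maxConcurrentSnapshotFileDownloads) {
            throw new IllegalArgumentException(
                "[indices.recovery.max_concurrent_snapshot_file_downloads_per_node] ("
                    + maxConcurrentSnapshotFileDownloadsPerNode
                    + ") must not be less than [indices.recovery.max_concurrent_snapshot_file_downloads] ("
                    + maxConcurrentSnapshotFileDownloads + ")");
        }
    }

    public static void main(String[] args) {
        validate(5, 25); // ok: per-node limit is larger
        try {
            validate(5, 3); // per-node limit below per-recovery limit: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```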

import org.elasticsearch.core.Releasable;
import org.elasticsearch.core.Releasables;

public class RecoverySnapshotFileDownloadsThrottler {
Contributor


Could we fold most of this class into RecoverySettings? I think it'd be ok just to have a RecoverySettings#tryAcquireSnapshotDownloadPermits method, or if you prefer you can expose a wrapper like we do with rateLimiter().

Contributor Author


Addressed in ed9c4ef

@@ -31,6 +31,7 @@
private Store.MetadataSnapshot metadataSnapshot;
private boolean primaryRelocation;
private long startingSeqNo;
private boolean hasPermitsToDownloadSnapshotFiles;
Contributor


nit: let's just call this canDownloadSnapshotFiles, there may be other reasons it can't (e.g. indices.recovery.use_snapshots is false)

Contributor Author


Addressed in ed9c4ef

@fcofdez
Contributor Author

fcofdez commented Oct 18, 2021

@elasticmachine run elasticsearch-ci/bwc
It was a known test failure

Contributor

@DaveCTurner DaveCTurner left a comment


I left a couple of comments/questions about respecting the use of snapshots on the source node too and everything else is just tiny things.

@@ -127,7 +126,7 @@

public RecoverySourceHandler(IndexShard shard, RecoveryTargetHandler recoveryTarget, ThreadPool threadPool,
StartRecoveryRequest request, int fileChunkSizeInBytes, int maxConcurrentFileChunks,
int maxConcurrentOperations, int maxConcurrentSnapshotFileDownloads, boolean useSnapshots,
int maxConcurrentOperations, int maxConcurrentSnapshotFileDownloads,
Contributor


Hmm I sort of see that it doesn't make sense to use the setting on the source node, but in the BwC case we treat the target as if it can use snapshots, is this safe?

if (snapshotFileDownloadsPermit == null) {
logger.warn(String.format(Locale.ROOT,
"Unable to acquire permit to use snapshot files during recovery, this recovery will recover from the source node. " +
"[%s] should have the same value as [%s]/[%s]",
Contributor


The limit is only an upper bound, you could have fewer concurrent recoveries, but also I'd suggest just saying the number rather than naming the settings since otherwise folk will just increase max_concurrent_snapshot_file_downloads_per_node and run into bigger problems when they run out of HTTP connections.

Suggested change
"[%s] should have the same value as [%s]/[%s]",
"Ensure snapshot files can be used during recovery by setting [%s] to be no greater than [%d]",
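As an illustration of the suggested wording, the recommended bound is the per-node download limit divided by the per-recovery download limit; the class name and the values below are made up for the example:

```java
import java.util.Locale;

public class RecoveryPermitWarning {
    // Build the suggested warning: state a concrete number rather than
    // naming the two download settings.
    public static String warningMessage(int maxSnapshotFileDownloadsPerNode, int maxSnapshotFileDownloadsPerRecovery) {
        return String.format(Locale.ROOT,
            "Ensure snapshot files can be used during recovery by setting [%s] to be no greater than [%d]",
            "cluster.routing.allocation.node_concurrent_recoveries",
            maxSnapshotFileDownloadsPerNode / maxSnapshotFileDownloadsPerRecovery);
    }

    public static void main(String[] args) {
        // 25 and 5 are illustrative values for the per-node and per-recovery limits
        System.out.println(warningMessage(25, 5));
        // prints: Ensure snapshot files can be used during recovery by setting
        // [cluster.routing.allocation.node_concurrent_recoveries] to be no greater than [5]
    }
}
```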

@@ -67,6 +67,8 @@
private final IndexShard indexShard;
private final DiscoveryNode sourceNode;
private final SnapshotFilesProvider snapshotFilesProvider;
@Nullable
Contributor


Suggested change
@Nullable
@Nullable // if we're not downloading files from snapshots in this recovery

@@ -119,5 +132,8 @@ public void writeTo(StreamOutput out) throws IOException {
metadataSnapshot.writeTo(out);
out.writeBoolean(primaryRelocation);
out.writeLong(startingSeqNo);
if (out.getVersion().onOrAfter(RecoverySettings.SNAPSHOT_FILE_DOWNLOAD_THROTTLING_SUPPORTED_VERSION)) {
out.writeBoolean(canDownloadSnapshotFiles);
}
Contributor


Is it safe to drop this value no matter whether it's true or false when dealing with an older node? I worry that we might have some trouble from this lenience, plus the fact that it defaults to true if missing and that we no longer care about the setting on the source node.

Contributor Author


That's a fair point, maybe we should keep the check for indices.recovery.use_snapshots on the source node too? That way we would keep the current behaviour in a mixed-version cluster.

Contributor


Yes I think that'd be best.
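A minimal, self-contained sketch of the version-gated read path implied by the diff and discussion above. Plain JDK streams and an int version stand in for Elasticsearch's StreamInput and Version types, and the constant value is an illustrative stand-in:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BwcFlagDemo {
    // Stand-in for RecoverySettings.SNAPSHOT_FILE_DOWNLOAD_THROTTLING_SUPPORTED_VERSION
    public static final int SUPPORTED_VERSION = 7_16_00;

    // Older nodes never put the flag on the wire, so the read side falls back
    // to a default; the lenient default of true is what the discussion above
    // worries about in mixed-version clusters.
    public static boolean readCanDownloadSnapshotFiles(DataInputStream in, int remoteVersion) throws IOException {
        if (remoteVersion >= SUPPORTED_VERSION) {
            return in.readBoolean(); // flag is on the wire
        }
        return true; // flag absent: assume snapshot files may be used
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeBoolean(false);

        DataInputStream newWire = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readCanDownloadSnapshotFiles(newWire, SUPPORTED_VERSION));     // false, read from the wire

        DataInputStream oldWire = new DataInputStream(new ByteArrayInputStream(new byte[0]));
        System.out.println(readCanDownloadSnapshotFiles(oldWire, SUPPORTED_VERSION - 1)); // true by default
    }
}
```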

@@ -714,12 +714,17 @@ protected Node(final Environment initialEnvironment,
clusterService
);
final RecoveryPlannerService recoveryPlannerService = new SnapshotsRecoveryPlannerService(shardSnapshotsService);
final SnapshotFilesProvider snapshotFilesProvider =
new SnapshotFilesProvider(repositoryService);
final SnapshotFilesProvider snapshotFilesProvider = new SnapshotFilesProvider(repositoryService);
Contributor


I think we can revert these changes now, they're only whitespace/import reordering right?

indicesClusterStateService = new IndicesClusterStateService(
settings,
indicesService,
clusterService,
threadPool,
new PeerRecoveryTargetService(threadPool, transportService, recoverySettings, clusterService, snapshotFilesProvider),
Contributor


Likewise here, this change isn't needed any more.

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM

@fcofdez
Contributor Author

fcofdez commented Oct 18, 2021

@elasticmachine run elasticsearch-ci/part-1
Unrelated failure

@fcofdez fcofdez added the auto-backport Automatically create backport pull requests when merged label Oct 18, 2021
@fcofdez fcofdez merged commit 2b4fe8f into elastic:master Oct 18, 2021
@fcofdez
Contributor Author

fcofdez commented Oct 18, 2021

Thanks David!

@elasticsearchmachine
Collaborator

💔 Backport failed

Branch 7.x: Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 79316`

fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Oct 18, 2021
@jtibshirani
Contributor

I just noticed a couple of test failures that could be related:

These don't reproduce for me locally.

fcofdez added a commit that referenced this pull request Oct 19, 2021
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Oct 19, 2021
fcofdez added a commit that referenced this pull request Nov 29, 2021
If we don't cancel the relocation of the index to the same target
node, it is possible that the recovery is retried, meaning that the
available permit could be granted to indexRecoveredFromSnapshot1
instead of to indexRecoveredFromSnapshot2.

Relates #79316
Closes #79420
Labels
auto-backport: Automatically create backport pull requests when merged
:Distributed/Recovery: Anything around constructing a new shard, either from a local or a remote source.
>enhancement
Team:Distributed: Meta label for distributed team
v7.16.0
v8.0.0-beta1

Successfully merging this pull request may close these issues.

Limit the number of connections used by snapshot file downloads during recoveries
7 participants