Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move index sealing terminology to synced flush #11336

Merged
merged 3 commits into from
May 29, 2015

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented May 25, 2015

#10032 introduced the notion of sealing an index by marking it with a special read only marker, allowing for a couple of optimization to happen. The most important one was to speed up recoveries of shards where we know nothing has changed since they were online by skipping the file based sync phase. During the implementation we came up with a light notion which achieves the same recovery benefits but without the read only aspects which we dubbed synced flush. The fact that it was light weight and didn't put the index in read only mode, allowed us to do it automatically in the background which has great advantage. However we also felt the need to allow users to manually trigger this operation.

The implementation at #11179 added the sync flush internal logic and the manual (rest) rest API. The name of the API was modeled after the sealing terminology which may end up being confusing. This commit changes the API name to match the internal synced flush naming, namely `{index}/_flush/synced'.

On top of that it contains a couple other changes:

  • Remove all java client API. This feature is not supposed to be called programtically by applications but rather by admins.
  • Improve rest responses making structure similar to other (flush) API
  • Change IndexShard#getOperationsCount to exclude the internal +1 on open shard . it's confusing to get 1 while there are actually no ongoing operations
  • Some minor other clean ups

Closes #11251

elastic#10032 introduced the notion of sealing an index by marking it with a special read only marker, allowing for a couple of optimization to happen. The most important one was to speed up recoveries of shards where we know nothing has changed since they were online by skipping the file based sync phase. During the implementation we came up with a light notion which achieves the same recovery benefits but without the read only aspects which we dubbed synced flush. The fact that it was light weight and didn't put the index in read only mode, allowed us to do it automatically in the background which has great advantage. However we also felt the need to allow users to manually trigger this operation.

 The implementation at elastic#11179 added the sync flush internal logic and the manual (rest) rest API. The name of the API was modeled after the sealing terminology which may end up being confusing. This commit changes the API name to match the internal synced flush naming, namely `{index}/_flush/synced'.

  On top of that it contains a couple other changes:
   - Remove all java client API. This feature is not supposed to be called programtically by applications but rather by admins.
   - Improve rest responses making structure similar to other (flush) API
   - Change IndexShard#getOperationsCount to exclude the internal +1 on open shard . it's confusing to get 1 while there are actually no ongoing operations
   - Some minor other clean ups
@bleskes
Copy link
Contributor Author

bleskes commented May 25, 2015

@brwe @clintongormley can you please review?

1. Synced flush is a best effort operation. Any ongoing indexing operations will cause
the synced flush to fail. This means that some shards may be synced flushed while others aren't. See below for more.
2. The `sync_id` marker is removed as soon as the shard is flushed again. Uncommitted
operations in the transaction log do not remove the marker. That is because the marker is store as part

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the marker is storeD

@clintongormley
Copy link

Docs look great! (two tiny changes)

[float]
=== Synced Flush API

The Synced Flush API allows an administrator to initiate a synced flush manually. This can particularly useful for
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can BE particularly useful for

@nik9000
Copy link
Member

nik9000 commented May 26, 2015

I don't mind the "seal" name - I just stopped thinking of it as a hermetic seal and started thinking of it as a wax seal on an envelope. You break it when you stuff more documents in it. I forget the other half of the metaphor about having to break the seal to read the documents.

builder.endObject();
return new BytesRestResponse(response.status(), builder);
return new BytesRestResponse(RestStatus.OK, builder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now but we need to figure out what to return (#11251)

@nik9000
Copy link
Member

nik9000 commented May 26, 2015

I don't mind the "seal" name - I just stopped thinking of it as a hermetic seal and started thinking of it as a wax seal on an envelope. You break it when you stuff more documents in it. I forget the other half of the metaphor about having to break the seal to read the documents.

I take it back - after reviewing the documentation this way makes more sense to me. No fun metaphor though.

@bleskes bleskes mentioned this pull request May 26, 2015
@bleskes
Copy link
Contributor Author

bleskes commented May 27, 2015

@brwe @clintongormley @nik9000 thx for all the feedback. I pushed a new commit. Also assumed will end up with a 409 for #11251 ..

[[indices-synced-flush]]
=== Synced Flush

Elasticsearch tracks the indexing activity of each shards. Shards that have not

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of each shards^H

GET /twitter/_stats/commit?level=shards
--------------------------------------------------
// AUTOSENSE

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps an example of the output?

@clintongormley
Copy link

Minor doc comments, but looking good!

@brwe
Copy link
Contributor

brwe commented May 27, 2015

this is tagged for 2.0 but should it not also go in 1.6?

@nik9000
Copy link
Member

nik9000 commented May 27, 2015

this is tagged for 2.0 but should it not also go in 1.6?

Yeah, I was just about to ask that.

@bleskes
Copy link
Contributor Author

bleskes commented May 27, 2015

pushed another update. I was planning the back port PR (which is likely to happen) with 1.6., but can mark this one as well...

@bleskes bleskes added the v1.6.0 label May 27, 2015
@brwe
Copy link
Contributor

brwe commented May 28, 2015

LGTM but not sure if that counts as a go

@@ -140,26 +147,7 @@ public void testSyncedFlush() throws ExecutionException, InterruptedException, I
}

@TestLogging("indices:TRACE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the trace here

@s1monw
Copy link
Contributor

s1monw commented May 28, 2015

LGTM too

@brwe brwe merged commit 37bdbe0 into elastic:master May 29, 2015
@kevinkluge kevinkluge removed the review label May 29, 2015
brwe added a commit that referenced this pull request May 29, 2015
brwe added a commit to brwe/elasticsearch that referenced this pull request May 29, 2015
@clintongormley clintongormley added >feature release highlight :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed >non-issue labels May 29, 2015
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jun 3, 2015
To better distribute the memory allocating to indexing, the IndexingMemoryController periodically checks the different shard for their last indexing activity. If no activity has happened for a while, the controller marks the shards as in active and allocated it's memory buffer budget (but a small minimal budget)  to other active shards.  The recently added synced flush feature (elastic#11179, elastic#11336) uses this inactivity trigger to attempt as  a trigger to attempt adding a sync id marker (which will speed up future recoveries).

We wait for 30m before declaring a shard inactive. However, these days the operation just requires a refresh and is light. We can be stricter (and 5m) increase the chance a synced flush will be triggered.
bleskes added a commit that referenced this pull request Jun 3, 2015
To better distribute the memory allocating to indexing, the IndexingMemoryController periodically checks the different shard for their last indexing activity. If no activity has happened for a while, the controller marks the shards as in active and allocated it's memory buffer budget (but a small minimal budget)  to other active shards.  The recently added synced flush feature (#11179, #11336) uses this inactivity trigger to attempt as  a trigger to attempt adding a sync id marker (which will speed up future recoveries).

We wait for 30m before declaring a shard inactive. However, these days the operation just requires a refresh and is light. We can be stricter (and 5m) increase the chance a synced flush will be triggered.

Closes #11479
bleskes added a commit that referenced this pull request Jun 3, 2015
To better distribute the memory allocating to indexing, the IndexingMemoryController periodically checks the different shard for their last indexing activity. If no activity has happened for a while, the controller marks the shards as in active and allocated it's memory buffer budget (but a small minimal budget)  to other active shards.  The recently added synced flush feature (#11179, #11336) uses this inactivity trigger to attempt as  a trigger to attempt adding a sync id marker (which will speed up future recoveries).

We wait for 30m before declaring a shard inactive. However, these days the operation just requires a refresh and is light. We can be stricter (and 5m) increase the chance a synced flush will be triggered.

Closes #11479
bleskes added a commit that referenced this pull request Jun 3, 2015
To better distribute the memory allocating to indexing, the IndexingMemoryController periodically checks the different shard for their last indexing activity. If no activity has happened for a while, the controller marks the shards as in active and allocated it's memory buffer budget (but a small minimal budget) to other active shards. The recently added synced flush feature (#11179, #11336) uses this inactivity trigger to attempt as a trigger to attempt adding a sync id marker (which will speed up future recoveries).

We wait for 30m before declaring a shard inactive. However, these days the operation just requires a refresh and is light. We can be stricter (and 5m) increase the chance a synced flush will be triggered.

Closes #11479
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >feature release highlight v1.6.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Response codes for index sealing
6 participants