Check and repair index under the store metadata lock #27768

Merged
dnhatn merged 9 commits into elastic:master from the lock-checkindex branch
Dec 20, 2017

Conversation

dnhatn
Member

@dnhatn dnhatn commented Dec 12, 2017

Today, when we get a metadata snapshot directly from a store directory, we acquire the metadata lock and then the IndexWriter lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This can cause a recovery to fail because the IndexWriter lock may still be held by the method snapshotStoreMetadata. This commit makes sure that CheckIndex is created under the metadata lock.

Closes #24481
Closes #27731
Relates #24787
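
The lock ordering described above can be illustrated with a small, self-contained model. This is only a sketch and not the Elasticsearch code: `StoreLockSketch`, its method names other than `snapshotStoreMetadata`, the fail-fast `AtomicBoolean` standing in for the IndexWriter lock, and the choice of read mode for the snapshot and write mode for the check are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of a store with a metadata lock and a fail-fast directory lock. */
final class StoreLockSketch {
    private final ReentrantReadWriteLock metadataLock = new ReentrantReadWriteLock();
    // Stand-in for the IndexWriter lock: obtaining it fails immediately if it is already held.
    private final AtomicBoolean directoryLockHeld = new AtomicBoolean();

    private void obtainDirectoryLock() {
        if (!directoryLockHeld.compareAndSet(false, true)) {
            throw new IllegalStateException("directory lock already held");
        }
    }

    private void releaseDirectoryLock() {
        directoryLockHeld.set(false);
    }

    /** Snapshot path: metadata lock first, then the directory lock. The latches let a caller pause it. */
    void snapshotStoreMetadata(CountDownLatch entered, CountDownLatch mayFinish) {
        metadataLock.readLock().lock();
        try {
            obtainDirectoryLock();
            try {
                entered.countDown();
                mayFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                releaseDirectoryLock();
            }
        } finally {
            metadataLock.readLock().unlock();
        }
    }

    /** Behaviour before the fix: goes straight for the directory lock. */
    void checkIndexWithoutMetadataLock() {
        obtainDirectoryLock();
        releaseDirectoryLock(); // a real check would run between these two calls
    }

    /** Behaviour after the fix: waits on the metadata lock, so the directory lock is free by then. */
    void checkIndexUnderMetadataLock() {
        metadataLock.writeLock().lock();
        try {
            checkIndexWithoutMetadataLock();
        } finally {
            metadataLock.writeLock().unlock();
        }
    }

    /** Runs both variants against an in-progress snapshot and reports what happened. */
    static String demo() {
        StoreLockSketch store = new StoreLockSketch();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch mayFinish = new CountDownLatch(1);
        AtomicBoolean guardedOk = new AtomicBoolean();
        String unguarded = "ok";
        Thread snapshot = new Thread(() -> store.snapshotStoreMetadata(entered, mayFinish));
        snapshot.start();
        try {
            entered.await(); // the snapshot now holds both locks
            try {
                store.checkIndexWithoutMetadataLock();
            } catch (IllegalStateException e) {
                unguarded = "failed";
            }
            Thread checker = new Thread(() -> {
                store.checkIndexUnderMetadataLock(); // blocks until the snapshot releases the metadata lock
                guardedOk.set(true);
            });
            checker.start();
            mayFinish.countDown();
            snapshot.join();
            checker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "unguarded=" + unguarded + ", guarded=" + (guardedOk.get() ? "ok" : "failed");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "unguarded=failed, guarded=ok"
    }
}
```

In this model the unguarded check fails because it races with the snapshot for the directory lock, while the guarded check simply waits its turn on the metadata lock.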

Today, when we get a metadata snapshot directly from a store directory,
we acquire the metadata lock and then the IW lock. However, we create
a CheckIndex in IndexShard without acquiring the metadata lock first.
This can cause a recovery to fail because the IW lock may still be held by
`snapshotStoreMetadata`. This commit makes sure that CheckIndex is created
under the metadata lock.

Closes elastic#24481
Relates elastic#24787
@dnhatn dnhatn added the :Distributed/Recovery, :Distributed/Store, >bug, review, v6.2.0, and v7.0.0 labels Dec 12, 2017
Contributor

@s1monw s1monw left a comment

I left one comment. LGTM otherwise.

if ("fix".equals(checkIndexOnStartup)) {
if (logger.isDebugEnabled()) {
logger.debug("fixing index, writing new segments file ...");
store.runUnderMetadataLock(() -> {
Contributor

Instead of adding this runUnderMetadataLock method, would it make sense to just move the checkIndex method into Store, like `public CheckIndex.Status checkIndex()`? Then we can interpret the result here, and Store can run the check under the appropriate locks.
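
A rough sketch of the shape this suggestion points at, under stated assumptions: the store owns the locking and returns a status, and the caller only interprets it. The `Status` class, the `corrupted` flag, `checkOnStartup`, and the returned strings are hypothetical stand-ins; only the names `checkIndex`, `exorciseIndex`, and the `"fix"` setting value come from this conversation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CheckStatusSketch {
    /** Hypothetical, minimal stand-in for a check result. */
    static final class Status {
        final boolean clean;

        Status(boolean clean) {
            this.clean = clean;
        }
    }

    /** Hypothetical store that hides its locking behind checkIndex and exorciseIndex. */
    static final class Store {
        private final ReentrantReadWriteLock metadataLock = new ReentrantReadWriteLock();
        private boolean corrupted;

        Store(boolean corrupted) {
            this.corrupted = corrupted;
        }

        /** Runs the check while holding the metadata lock and reports the result. */
        Status checkIndex() {
            metadataLock.writeLock().lock();
            try {
                return new Status(!corrupted);
            } finally {
                metadataLock.writeLock().unlock();
            }
        }

        /** Repairs the index described by the given status, again under the metadata lock. */
        void exorciseIndex(Status status) {
            metadataLock.writeLock().lock();
            try {
                if (!status.clean) {
                    corrupted = false;
                }
            } finally {
                metadataLock.writeLock().unlock();
            }
        }
    }

    /** Caller side: no lock handling, only interpretation of the returned status. */
    static String checkOnStartup(Store store, String checkIndexOnStartup) {
        Status status = store.checkIndex();
        if (status.clean) {
            return "clean";
        }
        if ("fix".equals(checkIndexOnStartup)) {
            store.exorciseIndex(status);
            return "fixed";
        }
        throw new IllegalStateException("index check failed and fixing was not requested");
    }
}
```

Compared with a generic `runUnderMetadataLock(callback)` helper, this keeps every lock acquisition inside the store, so callers cannot forget or reorder it.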

@dnhatn
Member Author

dnhatn commented Dec 14, 2017

@s1monw I've moved the checkIndex and exorciseIndex methods to the Store class. Could you please take another look? Thank you.

@dnhatn
Member Author

dnhatn commented Dec 14, 2017

@elasticmachine please retest this.

@dnhatn dnhatn changed the title Check index under the store metadata lock Repair index under the store metadata lock Dec 16, 2017
Contributor

@s1monw s1monw left a comment

I left one comment. LGTM otherwise

*/
public CheckIndex.Status checkIndex(PrintStream out) throws IOException {
    // We don't need to lock the directory here as we are not changing the index files.
    final Lock noDirectoryLock = new Lock() {
Contributor

I wonder if we should still acquire the lock. I don't think we should execute this while the IW is open?

Member Author

I pushed f231042

@dnhatn
Member Author

dnhatn commented Dec 18, 2017

please retest this.

@dnhatn dnhatn changed the title Repair index under the store metadata lock Check and repare index under the store metadata lock Dec 19, 2017
@dnhatn dnhatn changed the title Check and repare index under the store metadata lock Check and repair index under the store metadata lock Dec 19, 2017
@dnhatn
Member Author

dnhatn commented Dec 20, 2017

@s1monw, I've moved the checkIndex under the metadata lock; would you please take a look? Thank you.

Contributor

@s1monw s1monw left a comment

LGTM. Is this also fixing #27731?

@dnhatn
Member Author

dnhatn commented Dec 20, 2017

@s1monw, I've removed the direct usages of CheckIndex in MockFSDirectoryService and in another test. This should also fix #27731.

@dnhatn dnhatn merged commit 54b6885 into elastic:master Dec 20, 2017
@dnhatn
Member Author

dnhatn commented Dec 20, 2017

Thanks @s1monw for reviewing.

@dnhatn dnhatn deleted the lock-checkindex branch December 20, 2017 16:26
dnhatn added a commit that referenced this pull request Dec 20, 2017
martijnvg added a commit that referenced this pull request Dec 21, 2017
* es/master: (45 commits)
  Adapt scroll rest test after backport. relates #27842
  Move early termination based on index sort to TopDocs collector (#27666)
  Upgrade beats templates that we use for bwc testing. (#27929)
  ingest: upgraded ingest geoip's geoip2's dependencies.
  [TEST] logging for update by query test #27820
  Add elasticsearch-nio jar for base nio classes (#27801)
  Use full profile on JDK 10 builds
  Require Gradle 4.3
  Enable grok processor to support long, double and boolean (#27896)
  Add unreleased v6.1.2 version
  TEST: reduce blob size #testExecuteMultipartUpload
  Check index under the store metadata lock (#27768)
  Fixes DocStats to not report index size < -1 (#27863)
  Fixed test to be up to date with the new database files.
  Upgrade to Lucene 7.2.0. (#27910)
  Disable TestZenDiscovery in cloud providers integrations test
  Use `_refresh` to shrink the version map on inactivity (#27918)
  Make KeyedLock reentrant (#27920)
  ingest: Upgraded the geolite2 databases.
  [Test] Fix IndicesClientDocumentationIT (#27899)
  ...
martijnvg added a commit that referenced this pull request Dec 21, 2017
* es/6.x: (43 commits)
  ingest: upgraded ingest geoip's geoip2's dependencies.
  [TEST] logging for update by query test #27820
  Use full profile on JDK 10 builds
  Require Gradle 4.3
  Add unreleased v6.1.2 version
  TEST: reduce blob size #testExecuteMultipartUpload
  Check index under the store metadata lock (#27768)
  Upgrade to Lucene 7.2.0. (#27910)
  Fixed test to be up to date with the new database files.
  Use `_refresh` to shrink the version map on inactivity (#27918)
  Make KeyedLock reentrant (#27920)
  Fixes DocStats to not report index size < -1 (#27863)
  Disable TestZenDiscovery in cloud providers integrations test
  ingest: Upgraded the geolite2 databases.
  [Issue-27716]: CONTRIBUTING.md IntelliJ configurations settings are confusing. (#27717)
  [Test] Fix IndicesClientDocumentationIT (#27899)
  Move uid lock into LiveVersionMap (#27905)
  Mute testRetentionPolicyChangeDuringRecovery
  Increase Gradle heap space to 1536m
  Move GlobalCheckpointTracker and remove SequenceNumbersService (#27837)
  ...
@clintongormley clintongormley added the :Distributed/Engine label and removed the :Distributed/Store label Feb 13, 2018
@jpountz jpountz removed the :Distributed/Engine label Jan 29, 2019
Labels
>bug, :Distributed/Recovery, v6.2.0, v7.0.0-beta1
5 participants