Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't test corruption detection within CFS checksum #33911

Merged
merged 4 commits into from
Sep 22, 2018

Conversation

vladimirdolzhenko
Copy link
Contributor

@vladimirdolzhenko vladimirdolzhenko commented Sep 20, 2018

It is known that Lucene does not check the tail of CFS file (CompoundFileS, like an archive) - see note at https://github.com/apache/lucene-solr/blob/60e8ee717d9cd8d2425c100c21168d6a43b78516/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50CompoundReader.java#L82-L85

Therefore if we'd like Lucene checkIndex to detect corruption we should not corrupt checksum (last 4 bytes) of cfs file.

Closes #33881

@vladimirdolzhenko vladimirdolzhenko added :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. v7.0.0 v6.5.0 labels Sep 20, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@vladimirdolzhenko
Copy link
Contributor Author

With

./gradlew :server:test -Dtests.seed=714504A126B124F7 -Dtests.class=org.elasticsearch.index.shard.IndexShardTests -Dtests.method="testIndexCheckOnStartup"

and manual position adjustment:

CheckIndex does not detect corruption on flipping at position 2517..2520 on file: _0.cfs, length: 2521

CheckIndex does detect corruption on flipping at position 2516 file: _0.cfs, length: 2521

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Could we have a test that fails if this issue is fixed in Lucene? Otherwise it could be fixed without anyone noticing and then we'd have a gap in our coverage.

I've run a few thousand iterations of these tests with this fix in place without finding an issue, but it also seems tricky to reproduce without the fix, so I'm keeping it running for a while.


if (fileToCorrupt.getFileName().toString().endsWith(".cfs") && maxPosition > Integer.BYTES) {
// TODO: it is known that Lucene does not check the tail of CFS file (CompoundFileS, like an archive), see note at
// https://github.com/apache/lucene-solr/blob/branch_7_5/lucene/core/src/java/org/apache/lucene/codecs/lucene50/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this link into the following so that it doesn't rot as the code changes:

https://github.com/apache/lucene-solr/blob/60e8ee717d9cd8d2425c100c21168d6a43b78516/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50CompoundReader.java#L82-L85

In fact it might be better to just put this link in the body of the first post of this PR, and make the code comment be a link to this PR, as it'll be much shorter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated link in PR description (it was pointed to a single line)

the problem with the link that it violates checkstyle 140 chars max per line, will add link to this PR as it way shorter 💯

// https://github.com/apache/lucene-solr/blob/branch_7_5/lucene/core/src/java/org/apache/lucene/codecs/lucene50/
// Lucene50CompoundReader.java#L82
// so far, don't corrupt checksum (last 4 bytes) of cfs file
maxPosition -= Integer.BYTES;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing: the checksum is actually the last 8 bytes of the file, but the first 4 of those should be 0 (and we verify that). Could (a) the comment be more detailed and (b) the Integer.BYTES just say 4 to avoid this confusion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@DaveCTurner DaveCTurner added the >test Issues or PRs that are addressing/adding tests label Sep 21, 2018
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. NB PR builds are not currently running so please wait until they're back up (and this PR has passed) before merging.

@DaveCTurner DaveCTurner changed the title don't corrupt checksum of cfs file Don't test corruption detection within CFS checksum Sep 21, 2018
@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 21, 2018

The magic words are "test this please" :) (but not magic enough to fix PR builds it seems)

@vladimirdolzhenko
Copy link
Contributor Author

@elasticmachine test this please

@vladimirdolzhenko vladimirdolzhenko merged commit 477391d into elastic:master Sep 22, 2018
vladimirdolzhenko added a commit that referenced this pull request Sep 22, 2018
@vladimirdolzhenko vladimirdolzhenko deleted the fix/33881 branch September 22, 2018 08:22
@vladimirdolzhenko
Copy link
Contributor Author

thanks @DaveCTurner for the comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. >test Issues or PRs that are addressing/adding tests v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants