Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sequence numbers to identify out of order delivery in replicas & recovery #24060

Merged
merged 7 commits into from
Apr 14, 2017

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Apr 12, 2017

Internal indexing requests in Elasticsearch may be processed out of order and repeatedly. This is important during recovery and due to concurrency in replicating requests between primary and replicas. As such, a replica/recovering shard needs to be able to identify that an incoming request contains information that is old and thus need not be processed. The current logic is based on external version. This is sadly not sufficient. This PR moves the logic to rely on sequences numbers and primary terms which give the semantics we need.

Relates to #10708

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I left some minors that I would love to be addressed but no need for another round on my end.

import static org.elasticsearch.common.lucene.uid.Versions.NOT_FOUND;

/** Utility class to resolve the Lucene doc ID, version, seqNo and primaryTerms for a given uid. */
public class VersionsAndSeqNoResolver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this class final? and does it need to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

* </ul>
*/
public static DocIdAndVersion loadDocIdAndVersion(IndexReader reader, Term term) throws IOException {
assert term.field().equals(UidFieldMapper.NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh please give us a message here what field it is... it will save some CPU cycles if shit hits the fan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good one.

* </ul>
*/
public static DocIdAndSeqNo loadDocIdAndSeqNo(IndexReader reader, Term term) throws IOException {
assert term.field().equals(UidFieldMapper.NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

if (versionValue != null) {
if (op.seqNo() > versionValue.getSeqNo() ||
(op.seqNo() == versionValue.getSeqNo() && op.primaryTerm() > versionValue.getTerm()))
status = OpVsLuceneDocStatus.OP_NEWER;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general I find it odd that we have OP_STALE_OR_EQUAL instead of OP_NEWER_OR_EQUAL which is more natural then you have only one state where indexing would be fatal but I can see that this is an optimization to not index the doc again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes indeed. In the case of equality it should be identical and we can skip tokenization etc.

@@ -601,7 +640,14 @@ private IndexingStrategy planIndexingAsNonPrimary(Index index) throws IOExceptio
// unlike the primary, replicas don't really care to about creation status of documents
// this allows to ignore the case where a document was found in the live version maps in
// a delete state and return false for the created flag in favor of code simplicity
final OpVsLuceneDocStatus opVsLucene = compareOpToLuceneDocBasedOnVersions(index);
final OpVsLuceneDocStatus opVsLucene;
if (index.seqNo() != SequenceNumbersService.UNASSIGNED_SEQ_NO) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please leave a comment here when this unassigned seq no can occur... and when it can go away

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left an extra comment on the assertion in the else clause:

                // This can happen if the primary is still on an old node and send traffic without seq# or we recover from translog
                // created by an old version.
                assert config().getIndexSettings().getIndexVersionCreated().before(Version.V_6_0_0_alpha1_UNRELEASED) :
                    "index is newly created but op has no sequence numbers. op: " + index;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@@ -32,8 +32,15 @@
/** the version of the document. used for versioned indexed operations and as a BWC layer, where no seq# are set yet */
private final long version;

VersionValue(long version) {
/** the seq number of the operation that last changed the associated uuid */
private final long seqNo;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also just use the members no need to getters... I like it when it's struct like for internal things like this. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me. Less fluff is good.

@bleskes
Copy link
Contributor Author

bleskes commented Apr 13, 2017

@s1monw all your feedback is addressed. Thx.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still LGTM

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was very easy to read; I'm glad that it was factored out of the first PR for this reason. I left some very minor comments. It LGTM and I don't need to do another review.

@@ -43,15 +46,18 @@
* in more than one document! It will only return the first one it
* finds. */

final class PerThreadIDAndVersionLookup {
final class PerThreadIDAndVersionSeqNoLookup {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "and" is in an awkward place now, maybe PerThreadIDVersionAndSeqNoLookup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed and fixed.

@@ -113,4 +121,21 @@ private int getDocID(BytesRef id, Bits liveDocs) throws IOException {
return DocIdSetIterator.NO_MORE_DOCS;
}
}

/** Return null if id is not found. */
public DocIdAndSeqNo lookupSequenceNo(BytesRef id, Bits liveDocs, LeafReaderContext context) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lookupSeqNo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this method can be package private? And maybe the other lookup methods in this class too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed on all accounts.

CloseableThreadLocal<PerThreadIDAndVersionSeqNoLookup> other = lookupStates.putIfAbsent(key, ctl);
if (other == null) {
// Our CTL won, we must remove it when the
// core is closed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you can put this comment all on one line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it was like that before, but we are here now. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

reader.addCoreClosedListener(removeLookupState);
} else {
// Another thread beat us to it: just use
// their CTL:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you can put this comment all on one line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it was like that before, but we are here now. 😄

}
}

/** returns 0 if the primary term is not found */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add link to IndexMetaData#primaryTerm as it has a note about the primary term on an operational shard always being strictly positive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good one.

@@ -416,6 +417,43 @@ public GetResult get(Get get, Function<String, Searcher> searcherFactory) throws
LUCENE_DOC_NOT_FOUND
}

private OpVsLuceneDocStatus compareOpToLuceneDocBasedOnSeqNo(final Operation op) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so nice to finally see this arriving.

@@ -416,6 +417,43 @@ public GetResult get(Get get, Function<String, Searcher> searcherFactory) throws
LUCENE_DOC_NOT_FOUND
}

private OpVsLuceneDocStatus compareOpToLuceneDocBasedOnSeqNo(final Operation op) throws IOException {
assert op.seqNo() != SequenceNumbersService.UNASSIGNED_SEQ_NO : "resolving ops based seq# but no seqNo is found";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: based -> based on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Thx.

}

/** returns 0 if the primary term is not found */
public long lookUpPrimaryTerm(int docID) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be package private?

public final long version;
public final LeafReaderContext context;

public DocIdAndVersion(int docId, long version, LeafReaderContext context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this constructor can be package private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

public final long seqNo;
public final LeafReaderContext context;

public DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this constructor can be package private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Punch it.

@bleskes bleskes merged commit ecf8168 into elastic:master Apr 14, 2017
@bleskes bleskes deleted the seq_no_as_version_2 branch April 14, 2017 19:46
@bleskes
Copy link
Contributor Author

bleskes commented Apr 14, 2017

Thx @jasontedor @s1monw

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 15, 2017
* master:
  Use sequence numbers to identify out of order delivery in replicas & recovery (elastic#24060)
  Remove customization of ES_USER and ES_GROUP
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants