[CI] testRecoveryAfterPrimaryPromotion failed #28209

Closed
dnhatn opened this issue Jan 13, 2018 · 0 comments
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >test-failure Triaged test failures from CI

Comments

@dnhatn
Member

dnhatn commented Jan 13, 2018

The testRecoveryAfterPrimaryPromotion test expected a file-based sync to happen, but a sequence-number-based one occurred instead. This test has been failing since the replica rollback change (#28181) was merged.

Some instances (these do not reproduce locally):

REPRODUCE WITH: ./gradlew :server:test \
  -Dtests.seed=1C1099CE2A03DA5C \
  -Dtests.class=org.elasticsearch.index.replication.RecoveryDuringReplicationTests \
  -Dtests.method="testRecoveryAfterPrimaryPromotion" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-BO \
  -Dtests.timezone=Pacific/Port_Moresby

Log: testRecoveryAfterPrimaryPromotion-1.txt

REPRODUCE WITH: ./gradlew :server:test \
  -Dtests.seed=3F415A2110368AA7 \
  -Dtests.class=org.elasticsearch.index.replication.RecoveryDuringReplicationTests \
  -Dtests.method="testRecoveryAfterPrimaryPromotion" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-BO \
  -Dtests.timezone=Antarctica/Mawson
@dnhatn dnhatn added >test-failure Triaged test failures from CI :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jan 13, 2018
@dnhatn dnhatn self-assigned this Jan 13, 2018
@dnhatn dnhatn changed the title [CI] RecoveryDuringReplicationTests#testRecoveryAfterPrimaryPromotion [CI] testRecoveryAfterPrimaryPromotion failed Jan 13, 2018
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Jan 13, 2018
dnhatn added a commit that referenced this issue Jan 13, 2018
@dnhatn dnhatn closed this as completed in fbb840b Jan 14, 2018
dnhatn added a commit that referenced this issue Jan 14, 2018
Because a replica always keeps a safe commit and starts peer recovery
with that commit, file-based recovery only happens if new operations are
added to the primary and the required translog is not fully retained. In
the test, we tried to produce this condition by flushing a new commit in
order to trim the whole translog. However, if the new global checkpoint
has not been persisted yet, we keep two commits and do not trim the
translog. This commit tightens the file-based condition in the test by
waiting for the global checkpoint to be properly persisted on the new
primary before flushing.

Close #28209
Relates #28181
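The reasoning in the commit message can be illustrated with a toy model. This is a minimal sketch, not Elasticsearch's implementation: the class `SafeCommitModel`, the `Commit` record, and all method names are hypothetical. It assumes, as a simplification, that the safe commit is the newest commit whose max sequence number is at most the persisted global checkpoint, that every newer commit is also kept, and that the translog is retained from just above the safe commit's local checkpoint. Under those assumptions, flushing before the global checkpoint is persisted keeps the old commit and its translog, so the replica can still recover by sequence numbers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of commit/translog retention; not Elasticsearch code. */
public class SafeCommitModel {

    /** A commit summarized by its highest seq-no and its local checkpoint (assumed fields). */
    record Commit(long maxSeqNo, long localCheckpoint) {}

    /** Newest commit whose maxSeqNo <= persisted global checkpoint; falls back to the oldest. */
    static int safeCommitIndex(List<Commit> commits, long persistedGlobalCheckpoint) {
        for (int i = commits.size() - 1; i >= 0; i--) {
            if (commits.get(i).maxSeqNo() <= persistedGlobalCheckpoint) {
                return i;
            }
        }
        return 0;
    }

    /** The safe commit and every newer commit are kept. */
    static List<Commit> retainedCommits(List<Commit> commits, long persistedGlobalCheckpoint) {
        int safe = safeCommitIndex(commits, persistedGlobalCheckpoint);
        return new ArrayList<>(commits.subList(safe, commits.size()));
    }

    /** Assumption: the translog is kept for every operation above the safe commit's local checkpoint. */
    static long minRetainedSeqNo(List<Commit> commits, long persistedGlobalCheckpoint) {
        Commit safe = commits.get(safeCommitIndex(commits, persistedGlobalCheckpoint));
        return safe.localCheckpoint() + 1;
    }

    /** File-based recovery is needed when the primary no longer holds ops from the replica's start. */
    static boolean requiresFileBasedRecovery(long replicaStartingSeqNo, long primaryMinRetainedSeqNo) {
        return primaryMinRetainedSeqNo > replicaStartingSeqNo;
    }

    public static void main(String[] args) {
        // Replica's safe commit covers seq-nos 0..9, so it asks for operations from 10 onwards.
        long replicaStartingSeqNo = 10;
        // New primary: an older commit covering 0..9 plus the flushed commit covering 0..19.
        List<Commit> commits = List.of(new Commit(9, 9), new Commit(19, 19));

        // Flush before the global checkpoint (19) is persisted: persisted value is still 9.
        System.out.println("retained commits: " + retainedCommits(commits, 9).size()
                + ", file-based: " + requiresFileBasedRecovery(replicaStartingSeqNo, minRetainedSeqNo(commits, 9)));
        // prints "retained commits: 2, file-based: false"

        // Flush after waiting for the global checkpoint to be persisted at 19.
        System.out.println("retained commits: " + retainedCommits(commits, 19).size()
                + ", file-based: " + requiresFileBasedRecovery(replicaStartingSeqNo, minRetainedSeqNo(commits, 19)));
        // prints "retained commits: 1, file-based: true"
    }
}
```

In this model the only difference between the two runs is the persisted global checkpoint at flush time, which mirrors why the fix waits for it before flushing.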