[Segment Replication] Fix bug where replica shows stale doc count during engine reset. #9495
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@ Coverage Diff @@
## main #9495 +/- ##
============================================
- Coverage 71.15% 71.11% -0.04%
+ Complexity 57536 57451 -85
============================================
Files 4781 4781
Lines 271197 271227 +30
Branches 39595 39599 +4
============================================
- Hits 192975 192891 -84
- Misses 62011 62119 +108
- Partials 16211 16217 +6
Compatibility status:
Checks if related components are compatible with change d1678ba
Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/security-analytics.git]
Skipped components:
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
server/src/test/java/org/opensearch/index/engine/NRTReplicationEngineTests.java
because the shard temporarily flips to a ReadOnly engine and initializes its reader from disk.
Would it be feasible to add a unit test that mimics this state?
@mch2 I copied this PR into my local repo, and it does solve my problem. Search results when bumping the replica no longer contain stale data. https://github.com/maosuhan/OpenSearch/tree/fix_sr
Added a better test to SegmentReplicationIndexShardTests.
This change fixes an issue where replica shards can temporarily return stale results while converting to a RO engine during an engine reset. This is possible because NRTReplicationEngine did not previously implement flush and the freshest data is only active on the reader. Fixed by implementing flush and also honoring acquireLatestCommit's flushFirst parameter.
Signed-off-by: Marc Handalian <handalm@amazon.com>
Pushed a rebase; the test failures look unrelated.
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9495-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 012c4fa4ec8b719cecd4e163c5fdc0e4f42679d3
# Push it to GitHub
git push --set-upstream origin backport/backport-9495-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…ing engine reset. (opensearch-project#9495)

* Fix bug where replica shows stale doc count during engine reset.

  This change fixes an issue where replica shards can temporarily return stale results while converting to a RO engine during an engine reset. This is possible because NRTReplicationEngine did not previously implement flush and the freshest data is only active on the reader. Fixed by implementing flush and also honoring acquireLatestCommit's flushFirst parameter.

  Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add changelog entry.

  Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add unit test for search during engine reset.

  Signed-off-by: Marc Handalian <handalm@amazon.com>

* Remove useless test.

  Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
…ing engine reset. (#9495) (#9595)

* Fix bug where replica shows stale doc count during engine reset.

  This change fixes an issue where replica shards can temporarily return stale results while converting to a RO engine during an engine reset. This is possible because NRTReplicationEngine did not previously implement flush and the freshest data is only active on the reader. Fixed by implementing flush and also honoring acquireLatestCommit's flushFirst parameter.

* Add changelog entry.

* Add unit test for search during engine reset.

* Remove useless test.

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
Description
This change fixes an issue where replica shards can temporarily return stale results during promotion to primary. This happens in resetEngineToGlobalCheckpoint because the shard temporarily flips to a ReadOnly engine and initializes its reader from disk. This is possible because NRTReplicationEngine did not previously implement flush and the freshest data is only active on the reader. Fixed by implementing flush and also honoring acquireLatestCommit's flushFirst parameter.
Related Issues
related #8985
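For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the idea behind the fix. It is a toy model, not the actual OpenSearch code: the class ToyReplicaEngine, its fields, and the helper openReadOnlyDocCount are hypothetical names invented for illustration, and the real NRTReplicationEngine tracks segment infos rather than a bare doc count. The sketch only assumes what the description states: the freshest data lives on the in-memory reader, a read-only engine initializes from the last commit on disk, and flushing first makes that commit current.

```java
class EngineResetSketch {

    /** Hypothetical stand-in for a replica engine; NOT the real NRTReplicationEngine. */
    static class ToyReplicaEngine {
        // Doc count visible through the in-memory reader (advanced by replication).
        private long readerDocCount = 0;
        // Doc count recorded in the last commit point "on disk".
        private long committedDocCount = 0;

        /** Simulates a replication event that refreshes the reader only. */
        void onNewSegmentsReceived(long docCount) {
            readerDocCount = docCount;
        }

        /** Persists the reader's current state as the latest commit. */
        void flush() {
            committedDocCount = readerDocCount;
        }

        /** Returns the latest commit, flushing first when asked to. */
        long acquireLatestCommit(boolean flushFirst) {
            if (flushFirst) {
                flush();
            }
            return committedDocCount;
        }
    }

    /** Simulates a read-only engine that initializes its reader from the commit. */
    static long openReadOnlyDocCount(ToyReplicaEngine engine, boolean flushFirst) {
        return engine.acquireLatestCommit(flushFirst);
    }

    public static void main(String[] args) {
        ToyReplicaEngine engine = new ToyReplicaEngine();
        engine.onNewSegmentsReceived(10); // reader now sees 10 docs, commit still has 0

        // flushFirst ignored (old behavior in this toy model): stale count.
        System.out.println("without flush: " + openReadOnlyDocCount(engine, false)); // prints 0

        // flushFirst honored: the commit matches the reader.
        System.out.println("with flush: " + openReadOnlyDocCount(engine, true)); // prints 10
    }
}
```

In this toy model the stale read disappears as soon as the commit is brought up to date before the read-only engine opens, which mirrors the stated fix of implementing flush and honoring flushFirst.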
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.