Introduce global checkpoint background sync #26591

Merged: 42 commits, Sep 21, 2017

Commits on Sep 11, 2017

  1. Introduce global checkpoint background sync

    It is the exciting return of the global checkpoint background
    sync. Long, long ago, in a snapshot version far, far away, we had
    only a global checkpoint background sync. This sync would fire
    periodically and send the global checkpoint from the primary shard to
    the replicas so that they could update their local knowledge of the
    global checkpoint. Later, as we sped ahead towards finalizing the
    initial version of sequence IDs, we realized that we needed the global
    checkpoint updates to be inline. This means that the primary shard
    would piggyback the global checkpoint on each replication operation
    sent to the replicas. The replicas would update
    their local knowledge of the global checkpoint and reply with their
    local checkpoint. However, these replies could allow the global
    checkpoint on the primary to advance again, leaving the replicas behind
    in their local knowledge of the global checkpoint. If another
    replication operation never fired, the replicas would be permanently
    behind. To account for this, we added one more sync that would fire
    when the primary shard fell idle. However, this has problems:
     - the shard idle timer defaults to five minutes, a long time to wait
       for the replicas to learn of the new global checkpoint
     - if a replica missed the sync, there was no follow-up sync to catch
       it up
     - there is an inherent race condition where the primary shard could
       fall idle mid-operation (after having sent the replication request to
       the replicas); in this case, there would never be a background sync
       after the operation completes
     - tying the global checkpoint sync to the idle timer was never natural
    
    To fix this, we add back a global checkpoint background sync that fires
    on a timer. This timer fires every thirty seconds, and is not
    configurable (for simplicity). This background sync is smarter in the
    sense that it only sends a sync if the global checkpoint on at least one
    replica is lagging behind that of the primary. This requires the
    primary shard to track its knowledge of each replica's local knowledge
    of the global checkpoint. When the timer fires, we can compare
    the global checkpoint on the primary to its knowledge of the global
    checkpoint on the replicas and only send a sync if a shard is
    behind. During replication operations, the timer may fire and send a
    sync that would be covered by an in-flight operation. This is okay:
    the extra sync does not hurt and we do not need the complexity of
    optimizing away this duplicate sync.
    jasontedor committed Sep 11, 2017
    c0e7443
  2. Cleanup after test

    jasontedor committed Sep 11, 2017
    511f96e
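
The timer-driven check described in the first commit message (c0e7443) can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `PrimaryTracker`, `maybeSync`, and `schedule` are hypothetical names, and the sketch assumes every replica learns the new global checkpoint as soon as a sync is sent, whereas a real primary would only update its knowledge when a replica responds. Only the thirty-second interval and the "sync only if some replica lags" rule come from the commit message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class GlobalCheckpointSyncSketch {

    /** Hypothetical primary-side tracker; not an Elasticsearch class. */
    public static final class PrimaryTracker {
        private long globalCheckpoint;
        // The primary's knowledge of each replica's local knowledge of the global checkpoint.
        private final Map<String, Long> replicaGlobalCheckpoints = new HashMap<>();

        public PrimaryTracker(long globalCheckpoint) {
            this.globalCheckpoint = globalCheckpoint;
        }

        /** The global checkpoint only moves forward. */
        public synchronized void advanceGlobalCheckpoint(long value) {
            globalCheckpoint = Math.max(globalCheckpoint, value);
        }

        /** Records what a replica reported; stale reports never move the value backwards. */
        public synchronized void updateReplicaKnowledge(String replicaId, long replicaGlobalCheckpoint) {
            replicaGlobalCheckpoints.merge(replicaId, replicaGlobalCheckpoint, Long::max);
        }

        /** True if at least one tracked replica is behind the primary's global checkpoint. */
        public synchronized boolean syncNeeded() {
            for (long known : replicaGlobalCheckpoints.values()) {
                if (known < globalCheckpoint) {
                    return true;
                }
            }
            return false;
        }

        /** Simplification: assume every replica received the sync and caught up. */
        public synchronized void markAllSynced() {
            replicaGlobalCheckpoints.replaceAll((id, known) -> Math.max(known, globalCheckpoint));
        }
    }

    /** One firing of the background sync: sends a sync only if a replica lags. */
    public static boolean maybeSync(PrimaryTracker tracker) {
        if (!tracker.syncNeeded()) {
            return false; // nothing to do, all replicas are caught up
        }
        tracker.markAllSynced(); // stands in for sending the sync to the replicas
        return true;
    }

    /** Runs the check every thirty seconds, the fixed interval named in the commit message. */
    public static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, PrimaryTracker tracker) {
        return scheduler.scheduleWithFixedDelay(() -> maybeSync(tracker), 30, 30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        PrimaryTracker primary = new PrimaryTracker(4);
        primary.updateReplicaKnowledge("replica-1", 4);
        primary.updateReplicaKnowledge("replica-2", 4);
        System.out.println(maybeSync(primary)); // false: replicas are caught up

        // Replies to a replication operation let the primary advance; replicas now lag.
        primary.advanceGlobalCheckpoint(5);
        System.out.println(maybeSync(primary)); // true: a sync is sent
        System.out.println(maybeSync(primary)); // false: nothing left to sync
    }
}
```

A sync sent while a replication operation is in flight would be redundant in this model too, which matches the commit message's point that the duplicate is harmless and not worth optimizing away.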

Commits on Sep 12, 2017

  1. Recollect stats

    jasontedor committed Sep 12, 2017
    e1dc814

Commits on Sep 19, 2017

  1. Merge branch 'master' into global-checkpoint-sync

    * master: (67 commits)
      Restoring from snapshot should force generation of a new history uuid (elastic#26694)
      test: Use a single primary shard so that the exception can be caught in the same way
      Move pre-6.0 node checkpoint to SequenceNumbers
      Invalid JSON request body caused endless loop (elastic#26680)
      added comment
      fix line length violation
      Moved the check to fetch phase. This basically means that we throw a better error message instead of an AOBE and not adding more restrictions.
      inner hits: Do not allow inner hits that use _source and have a non nested object field as parent
      Separate Painless Whitelist Loading from the Painless Definition (elastic#26540)
      convert more admin requests to writeable (elastic#26566)
      Handle release of 5.6.1
      Allow `InputStreamStreamInput` array size validation where applicable (elastic#26692)
      Update global checkpoint with permit after recovery
      Filter pre-6.0 nodes for checkpoint invariants
      Skip bad request REST test on pre-6.0
      Reenable BWC tests after disabling for backport
      Add global checkpoint tracking on the primary
      [Test] Fix reference/cat/allocation/line_8 test failure
      [Docs] improved description for fs.total.available_in_bytes (elastic#26657)
      Fix discovery-file plugin to use custom config path
      ...
    jasontedor committed Sep 19, 2017
    86ddf79

Commits on Sep 20, 2017

  1. Add after-op sync

    jasontedor committed Sep 20, 2017
    e0657a7
  2. Merge branch 'master' into global-checkpoint-sync

    * master:
      Remove assertion from checkpoint tracker invariants
      Upgrade API: fix excessive logging and unnecessary template updates (elastic#26698)
    jasontedor committed Sep 20, 2017
    b79b13f
  3. Iteration

    jasontedor committed Sep 20, 2017
    f7a76cd
  4. Iteration

    jasontedor committed Sep 20, 2017
    3881d76
  5. Remove comments

    jasontedor committed Sep 20, 2017
    89bbf84
  6. Close it

    jasontedor committed Sep 20, 2017
    1082a31
  7. Merge branch 'master' into global-checkpoint-sync

    * master:
      [DOCS] Added index-shared4 and index-shared5.asciidoc
      BulkProcessor flush runnable preserves the thread context from creation time (elastic#26718)
      Catch exceptions and inform handler in RemoteClusterConnection#collectNodes (elastic#26725)
      [Docs] Fix name of character filter in example. (elastic#26724)
      Remove parse field deprecations in query builders (elastic#26711)
      elastic#26720: Set the correct bwc version after backport to 6.0
      Remove deprecated type and slop field in MatchQueryBuilder (elastic#26720)
      Refactoring of Gateway*** classes (elastic#26706)
      Make RestHighLevelClient's Request class public (elastic#26627)
      Deguice ActionFilter (elastic#26691)
      aggs: Allow aggregation sorting via nested aggregation.
      Build: Set bwc builds to always set snapshot (elastic#26704)
      File Discovery: Remove fallback with zen discovery (elastic#26667)
    jasontedor committed Sep 20, 2017
    15453b4
  8. refresh needed

    jasontedor committed Sep 20, 2017
    aa0c62c
  9. remove ensure green

    jasontedor committed Sep 20, 2017
    721f725
  10. remove background sync test setting

    9bc5155

Commits on Sep 21, 2017

  1. Revert "remove background sync test setting"

    This reverts commit 9bc5155.
    jasontedor committed Sep 21, 2017
    63e9d80
  2. test iteration

    jasontedor committed Sep 21, 2017
    6ade720
  3. disable bwc tests

    They cannot work until this is backported; a primary running
    6.x code without this patch will not be sending the global checkpoint
    sync yet.
    jasontedor committed Sep 21, 2017
    354df1c
  4. imports

    jasontedor committed Sep 21, 2017
    79742ea
  5. Formatting of method

    jasontedor committed Sep 21, 2017
    d9fc19d
  6. Remove leftover code

    jasontedor committed Sep 21, 2017
    afb082f
  7. State handling

    jasontedor committed Sep 21, 2017
    9bf875e
  8. Logging on failed sync

    jasontedor committed Sep 21, 2017
    007d5c4
  9. no fallthrough

    jasontedor committed Sep 21, 2017
    cb9373b
  10. Setting

    jasontedor committed Sep 21, 2017
    7e6d1bf
  11. Critical fix

    jasontedor committed Sep 21, 2017
    f7295ac
  12. fix comment

    jasontedor committed Sep 21, 2017
    f237d88
  13. Revert move, add javadocs

    jasontedor committed Sep 21, 2017
    e88b92c
  14. Handle execute future

    jasontedor committed Sep 21, 2017
    030156d
  15. Only check in-sync shards

    jasontedor committed Sep 21, 2017
    0343372
  16. Test iteration

    jasontedor committed Sep 21, 2017
    cf4e67b
  17. More testing

    jasontedor committed Sep 21, 2017
    f82df30
  18. refactor shard creation

    jasontedor committed Sep 21, 2017
    507806b
  19. handle closed

    jasontedor committed Sep 21, 2017
    530addd
  20. assertSeqNos

    jasontedor committed Sep 21, 2017
    f3b04dc
  21. Add disruption test

    jasontedor committed Sep 21, 2017
    5030a77
  22. revert formatting changes

    jasontedor committed Sep 21, 2017
    b640b10
  23. checkstyle

    jasontedor committed Sep 21, 2017
    cccdec6
  24. imports

    jasontedor committed Sep 21, 2017
    d43f794
  25. 26e4c76
  26. b80b728
  27. revert formatting changes

    jasontedor committed Sep 21, 2017
    b8adcce
  28. Merge branch 'master' into global-checkpoint-sync

    * master:
      Add permission checks before reading from HDFS stream (elastic#26716)
      muted test
      [Docs] Fixed typo of *configuration* (elastic#25058)
      Add azure storage endpoint suffix elastic#26432 (elastic#26568)
    jasontedor committed Sep 21, 2017
    c041ea2