Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce long polling for changes #33683

Merged
merged 11 commits into from
Sep 16, 2018

Conversation

jasontedor
Copy link
Member

@jasontedor jasontedor commented Sep 13, 2018

Rather than scheduling pings to the leader index when we are caught up to the leader, this commit introduces long polling for changes. We will fire off a request to the leader which if we are already caught up will enter a poll on the leader side to listen for global checkpoint changes. These polls will timeout after a default of one minute, but can also be specified when creating the following task. We use these time outs as a way to keep statistics up to date, to not exaggerate time since last fetches, and to avoid pipes being broken.

Relates #32651

Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
@jasontedor jasontedor added review :Distributed/CCR Issues around the Cross Cluster State Replication features labels Sep 13, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@jasontedor
Copy link
Member Author

@elasticmachine run gradle build tests

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I left two questions about ShardChangesAction.TransportAction#asyncShardOperation(...).

…polling

* elastic/master:
  SQL: Return correct catalog separator in JDBC (elastic#33670)
  [CCR] Add validation for max_retry_delay (elastic#33648)
  [CCR] Add monitoring mapping verification test (elastic#33662)
  CORE: Disable Setting Type Validation (elastic#33660) (elastic#33669)
  Revert "Use serializable exception in GCP listeners (elastic#33657)"
  Adding index refresh (elastic#33647)
  [DOCS] Moves securing-communications to docs (elastic#33640)
  [HLRC][ML] Add ML delete datafeed API to HLRC (elastic#33667)
  Mute testRecoveryWithConcurrentIndexing
  TEST: decrease logging level in the flush test
  DOC: Add SQL section on client applications
  Fix race in global checkpoint listeners test
  Use serializable exception in GCP listeners (elastic#33657)
@jasontedor
Copy link
Member Author

@martijnvg I pushed some commits.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know asyncShardOperation before. This is really nice.

final SeqNoStats seqNoStats = indexShard.seqNoStats();

if (request.getFromSeqNo() > seqNoStats.getGlobalCheckpoint()) {
assert request.getFromSeqNo() == 1 + seqNoStats.getGlobalCheckpoint();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this assertion always holds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnhatn Please see my latest pushes. I have integrated #33690 so that we only wake up if we advanced to the request from sequence number, or timeout.

@dnhatn
Copy link
Member

dnhatn commented Sep 13, 2018

@elasticmachine run gradle build tests

* master: (24 commits)
  Only notify ready global checkpoint listeners (elastic#33690)
  Don't count hits via the collector if the hit count can be computed from index stats. (elastic#33701)
  Expose retries for CCR fetch failures (elastic#33694)
  Test fix - Graph vertices could appear in different orders based on map insertion sequence (elastic#33709)
  Structured audit logging (elastic#31931)
  Core: Add DateFormatter interface for java time parsing (elastic#33467)
  [CCR] Check whether the rejected execution exception has the shutdown flag set (elastic#33703)
  Mute ClusterDisruptionIT#testSendingShardFailure
  Revert "Mute FullClusterRestartSettingsUpgradeIT"
  Adjust BWC version on settings upgrade test (elastic#33650)
  [ML] Allow overrides for some file structure detection decisions (elastic#33630)
  Adapt skip version for doc_values format deprecation
  [TEST] wait for no initializing shards
  [Docs] Minor fix in `has_child` javadoc comment (elastic#33674)
  Mute FullClusterRestartSettingsUpgradeIT
  [Kerberos] Add realm name & UPN to user metadata (elastic#33338)
  [TESTS] Disable specific locales for RestrictedTrustManagerTest (elastic#33299)
  SQL: Return functions in JDBC driver metadata (elastic#33672)
  SCRIPTING: Move terms_set Context to its Own Class (elastic#33602)
  AwaitsFix testRestoreMinmal
  ...
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@elasticdog
Copy link
Contributor

This job triggered CI during a migration of the master. Kicking off an additional build for you manually...

Jenkins, test this please.

@jasontedor
Copy link
Member Author

jasontedor commented Sep 14, 2018

@elasticmachine run gradle build tests

* master:
  Add script to cache dependencies (elastic#33726)
  [DOCS] Moves security reference to docs folder (elastic#33643)
  Cleanup assertions in global checkpoint listeners (elastic#33722)
  [CCR] Move ccr tests in core module back to ccr module (elastic#33711)
  HLRC: ML PUT Calendar (elastic#33362)
  [Tests] Fix randomization in StringTermsIT (elastic#33678)
  Pin TLS1.2 in SSLConfigurationReloaderTests
@jasontedor
Copy link
Member Author

To fix the build failures here, I want to integrate #33731 and then I want to add a check to the pending tasks check in the REST test to skip the shard changes action tasks that can be outstanding from polls. 😇

* master:
  Move CCR REST tests to a sub-project of ccr
  Move CCR REST tests to ccr sub-project (elastic#33731)
  Move CCR monitoring tests to ccr sub-project (elastic#33730)
@jasontedor
Copy link
Member Author

This pull request requires #33738 and then it can be merged.

* master:
  Do not count shard changes tasks against REST tests (elastic#33738)
  [HLRC][ML] Add ML get datafeed API to HLRC (elastic#33715)
@jasontedor jasontedor merged commit 770ad53 into elastic:master Sep 16, 2018
jasontedor added a commit that referenced this pull request Sep 16, 2018
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
@jasontedor jasontedor deleted the global-checkpoint-polling branch September 16, 2018 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/CCR Issues around the Cross Cluster State Replication features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants