
Snapshot: Migrate TransportRequestHandler to TransportMasterNodeAction #27165

Merged
merged 10 commits into from
Nov 17, 2017

Conversation

@dnhatn (Member) commented Oct 30, 2017

Currently, we are using a plain TransportRequestHandler to post snapshot status messages to the master. However, it doesn't have as robust a retry mechanism as TransportMasterNodeAction. This change migrates from TransportRequestHandler to TransportMasterNodeAction for new versions and keeps the current implementation for old versions (BWC).

Closes #27151

Currently, we are using a plain TransportRequestHandler to post snapshot
status messages to the master. However, it doesn't have as robust a retry
mechanism as TransportMasterNodeAction. This change migrates from
TransportRequestHandler to TransportMasterNodeAction. Most of the code in
TransportSnapshotUpdateStatusAction is copied from
SnapshotShardsService.

Serializing a MasterNodeRequest requires 8 more bytes than a
TransportRequest. In order to maintain BWC in a mixed cluster, we
have to serialize/deserialize a MasterNodeRequest as a TransportRequest
without a timeout.

Closes elastic#27151
@dnhatn dnhatn added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v6.1.0 v7.0.0 labels Oct 30, 2017
@imotov (Contributor) left a comment

Left a couple of comments/questions. In general, it looks good to me, but we need to figure out and outline our backport strategy more clearly. I feel like a lot of the code here will not actually be needed in 7.x once we backport this to 6.x.

* This method serializes a {@link MasterNodeRequest} as a {@link org.elasticsearch.transport.TransportRequest}
* without timeout. The master will have to use the default timeout setting.
*/
protected final void readFromAsTransportRequest(StreamInput in) throws IOException {
Contributor:

This is clever, but at the same time might be trap-y. This class is used in a lot of places, and I think I would rather have two different clean implementations in 6.x and one clean implementation in 7.x than this here.

Member Author:

I agree, having two handlers may be safer than this approach.

@Override
public void readFrom(StreamInput in) throws IOException {
// To keep BWC, we have to deserialize a MasterNodeRequest from a TransportRequest from older versions.
if (in.getVersion().before(Version.V_7_0_0_alpha1)) {
Contributor:

Is this a temporary solution until we backport this to the 6.x branch? If that is the case, could you mark all these places with //TODO or something similar so it would be obvious what we keep in 7.x and what goes only into 6.x?

// addLowPriorityApplier to make sure that Repository will be created before snapshot
clusterService.addLowPriorityApplier(this);
// this is only useful on the nodes that can hold data.
clusterService.addListener(this);
Contributor:

Could you add a comment here explaining why this needs to be a listener and not an applier, and why it's ok to have this as addListener and not addLast?

Member Author:

The test DedicatedClusterSnapshotRestoreIT#testSnapshotWithStuckNode failed with an Applier but passed with a Listener. However, I don't really know the root cause.

Member Author:

@imotov, I think I have figured out the root cause. A Listener is called after the new cluster state is set, while an Applier is called before that. In SnapshotShardsService, we call #syncShardStatsOnNewMaster -> #updateIndexShardSnapshotStatus, which in turn uses a TransportMasterNodeAction. The TransportMasterNodeAction accesses the applied cluster state directly, which may not be available yet if it's invoked from the Applier.

Caused by: java.lang.AssertionError: should not be called by a cluster state applier. reason [the applied cluster state is not yet available]

This explains why #testSnapshotWithStuckNode failed with an Applier but passed with a Listener.
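The applier-vs-listener ordering described here can be illustrated with a minimal, self-contained sketch. This is plain Java with made-up names (MiniClusterService, publish), not the real ClusterService API; it only models the fact that appliers run before the applied state is swapped in and listeners run after.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative stand-in for ClusterService; all names here are hypothetical.
class MiniClusterService {
    private String appliedState = "state-0";
    private final List<Consumer<String>> appliers = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void addApplier(Consumer<String> applier) { appliers.add(applier); }
    void addListener(Consumer<String> listener) { listeners.add(listener); }
    String appliedState() { return appliedState; }

    void publish(String newState) {
        // Appliers run BEFORE the new state becomes visible via appliedState().
        for (Consumer<String> applier : appliers) applier.accept(newState);
        appliedState = newState;
        // Listeners run AFTER, so appliedState() already returns the new state.
        for (Consumer<String> listener : listeners) listener.accept(newState);
    }
}

public class ApplierVsListenerDemo {
    public static void main(String[] args) {
        MiniClusterService cs = new MiniClusterService();
        cs.addApplier(s -> System.out.println("applier sees appliedState=" + cs.appliedState()));
        cs.addListener(s -> System.out.println("listener sees appliedState=" + cs.appliedState()));
        cs.publish("state-1");
        // prints:
        // applier sees appliedState=state-0
        // listener sees appliedState=state-1
    }
}
```

An action that reads the applied state (as TransportMasterNodeAction does) therefore observes stale state when invoked from an applier, which matches the assertion failure quoted above.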

@dnhatn (Member Author) commented Nov 6, 2017

@imotov, I have reverted the first commit and added a new commit with 2 separate handlers. Could you please have another look? Thank you.

@imotov (Contributor) left a comment

LGTM, but I would like @ywelsch to also take a look from ClusterState perspective.

@ywelsch (Contributor) left a comment

I think I found an issue with the new action being invoked incorrectly. Also, we should think about a sensible timeout. Finally, I would like to see a test which shows that the retry mechanism works. Please also update the PR description.

transportService.sendRequest(master, UPDATE_SNAPSHOT_ACTION_NAME, request, EmptyTransportResponseHandler.INSTANCE_SAME);
if (master.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
UpdateIndexShardSnapshotStatusRequest request = new UpdateIndexShardSnapshotStatusRequest(snapshot, shardId, status);
transportService.sendRequest(master, UPDATE_SNAPSHOT_STATUS_ACTION_NAME, request, EmptyTransportResponseHandler.INSTANCE_SAME);
Contributor:

this request has a default masterNodeTimeout of 30 seconds. Shouldn't we wait longer (possibly forever?).

Member Author:

I pushed cb73eea

transportService.sendRequest(master, UPDATE_SNAPSHOT_ACTION_NAME, request, EmptyTransportResponseHandler.INSTANCE_SAME);
if (master.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
UpdateIndexShardSnapshotStatusRequest request = new UpdateIndexShardSnapshotStatusRequest(snapshot, shardId, status);
transportService.sendRequest(master, UPDATE_SNAPSHOT_STATUS_ACTION_NAME, request, EmptyTransportResponseHandler.INSTANCE_SAME);
Contributor:

if you send the request like this, and the master is unavailable, there will be no retry (i.e. the retry code in TransportMasterNodeAction won't come into play). The action needs to be sent to the node itself.

Member Author:

I pushed ea7ec38

public void messageReceived(UpdateIndexShardSnapshotStatusRequest request, final TransportChannel channel) throws Exception {
innerUpdateSnapshotState(request);
public void messageReceived(UpdateSnapshotStatusRequestV6 requestV6, final TransportChannel channel) throws Exception {
final UpdateIndexShardSnapshotStatusRequest request = new UpdateIndexShardSnapshotStatusRequest(requestV6.snapshot(), requestV6.shardId(), requestV6.status());
Contributor:

again, masterNodeTimeout?

Member Author:

Done

@dnhatn (Member Author) commented Nov 14, 2017

@ywelsch, I have added a disruption test and verified that the retry mechanism works. Could you please take another look? Thank you.

@ywelsch (Contributor) left a comment

Does the newly added test actually fail without your changes?


logger.info("--> unblocking repository");
unblockNode("test-repo", blockedNode);
Thread.sleep(200);
Contributor:

we should try to avoid Thread.sleep calls in our tests

unblockNode("test-repo", blockedNode);
Thread.sleep(200);
logger.info("--> stop disrupting cluster");
internalCluster().clearDisruptionScheme(true);
Contributor:

this is usually done by calling networkDisruption.stopDisrupting();

@dnhatn (Member Author) commented Nov 15, 2017

@ywelsch, The newly added test failed with a plain transport handler, but passed with TransportMasterNodeAction. I pushed 85da45d to address your suggestions. Could you please take another quick look? Thank you.

@dnhatn (Member Author) commented Nov 16, 2017

@ywelsch, The result of the test is a false positive. As we enabled the delayed disruption, the MockTransportService clones an outgoing request using the request handler of that request on the sender node (e.g. the data node in the test). However, in SnapshotShardsService, we register the request handler on master-eligible nodes only.

if (DiscoveryNode.isMasterNode(settings)) {
// This needs to run only on nodes that can become masters
transportService.registerRequestHandler(UPDATE_SNAPSHOT_ACTION_NAME, UpdateIndexShardSnapshotStatusRequest::new, ThreadPool.Names.SAME, new UpdateSnapshotStateRequestHandler());
}

This causes the MockTransportService to throw an NPE when simulating network delay:
RequestHandlerRegistry reg = MockTransportService.this.getRequestHandler(action);
BytesStreamOutput bStream = new BytesStreamOutput();
request.writeTo(bStream);
final TransportRequest clonedRequest = reg.newRequest(bStream.bytes().streamInput());
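
The failure mode can be sketched with a minimal stand-in (MiniTransport and the action name here are hypothetical, not the real MockTransportService API): cloning an outgoing request looks up the handler registered on the sending node, and an unregistered action yields a null registry and hence the NPE.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the transport service's handler registry.
class MiniTransport {
    private final Map<String, Function<byte[], Object>> handlers = new HashMap<>();

    void registerRequestHandler(String action, Function<byte[], Object> reader) {
        handlers.put(action, reader);
    }

    // Mimics MockTransportService cloning an outgoing request: it needs the
    // handler registered on the SENDER to deserialize the cloned bytes.
    Object cloneRequest(String action, byte[] serialized) {
        Function<byte[], Object> reg = handlers.get(action);
        return reg.apply(serialized); // NPE if no handler was registered here
    }
}

public class HandlerLookupDemo {
    public static void main(String[] args) {
        MiniTransport dataNode = new MiniTransport(); // handler only on masters
        try {
            dataNode.cloneRequest("update_snapshot_status", new byte[0]);
        } catch (NullPointerException e) {
            System.out.println("NPE: no handler registered on the sending node");
        }
    }
}
```

Registering the handler on all nodes makes the sender-side lookup succeed, which is why the test then passed with both approaches.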

Once I registered that handler on all nodes, the test passed with both approaches.

I tried with other kinds of disruption but no luck.

  1. NetworkDisruption.NetworkDisconnect: both approaches either succeeded or failed in the same way.

  2. Stop the current master node. This actually showed a difference between a plain transport handler and a master node action if I commented out syncShardStatsOnNewMaster(event)

    public void applyClusterState(ClusterChangedEvent event) {
        try {
            SnapshotsInProgress prev = event.previousState().custom(SnapshotsInProgress.TYPE);
            SnapshotsInProgress curr = event.state().custom(SnapshotsInProgress.TYPE);
            if ((prev == null && curr != null) || (prev != null && prev.equals(curr) == false)) {
                processIndexShardSnapshots(event);
            }
            String masterNodeId = event.state().nodes().getMasterNodeId();
            if (masterNodeId != null && masterNodeId.equals(event.previousState().nodes().getMasterNodeId()) == false) {
                syncShardStatsOnNewMaster(event);
            }

syncShardStatsOnNewMaster(event) actually sends the snapshot status to a newly elected master (I think a master node action does more than this). In fact, all existing tests passed with the new approach (but not with the existing one) when I commented out syncShardStatsOnNewMaster(event).

What do you think? Thank you.

@ywelsch (Contributor) commented Nov 16, 2017

We probably can't get rid of syncShardStatsOnNewMaster as long as we run in mixed-version clusters with 6.0 nodes. If this is backported to v6.x, we could get rid of it in v7.0.0, however. @imotov WDYT?

@imotov (Contributor) commented Nov 16, 2017

@ywelsch, @dnhatn Yes, I think 6.x needs both because 6.x can talk to 6.0, and 6.0 is using the old way. However, 7.0 is only going to talk to 6.last, and 6.last is going to have both protocols, so we don't need support for both in 7.0. That should be the final state, I think.

However, to get there we need to figure out how we are going to merge these changes without breaking backward compatibility tests. One way to do it is to merge both protocols into 7.0, bake it there for a while, then backport it to 6.x and remove the old protocol from 7.0 at the same time.

@dnhatn (Member Author) commented Nov 16, 2017

@imotov Thank you for your suggestion. Just to confirm the backport steps.

  1. Merge this PR into 7.0 only.
  2. Backport this PR to 6.x with a new commit updating the version check.
  3. Add a clean-up commit to 7.0 only, which merely removes the BWC code.

@imotov (Contributor) commented Nov 17, 2017

Yes, except you can combine the 7.x portion of steps 2 and 3 into a single commit.

@ywelsch (Contributor) left a comment

LGTM

@dnhatn (Member Author) commented Nov 17, 2017

Thanks @imotov and @ywelsch for your helpful reviews.

@dnhatn dnhatn merged commit db688e1 into elastic:master Nov 17, 2017
@dnhatn dnhatn deleted the snapshot-migrate-transport branch November 17, 2017 16:55
dnhatn added a commit that referenced this pull request Nov 17, 2017
Currently, we are using a plain TransportRequestHandler to post snapshot
status messages to the master. However, it doesn't have a robust retry
mechanism as TransportMasterNodeAction. This change migrates from
TransportRequestHandler to TransportMasterNodeAction for the new
versions and keeps the current implementation for the old versions.

Closes #27151
jasontedor added a commit that referenced this pull request Nov 20, 2017
* master: (31 commits)
  [TEST] Fix `GeoShapeQueryTests#testPointsOnly` failure
  Transition transport apis to use void listeners (#27440)
  AwaitsFix GeoShapeQueryTests#testPointsOnly #27454
  Bump test version after backport
  Ensure nested documents have consistent version and seq_ids (#27455)
  Tests: Add Fedora-27 to packaging tests
  Delete some seemingly unused exceptions (#27439)
  #26800: Fix docs rendering
  Remove config prompting for secrets and text (#27216)
  Move the CLI into its own subproject (#27114)
  Correct usage of "an" to "a" in getting started docs
  Avoid NPE when getting build information
  Removes BWC snapshot status handler used in 6.x (#27443)
  Remove manual tracking of registered channels (#27445)
  Remove parameters on HandshakeResponseHandler (#27444)
  [GEO] fix pointsOnly bug for MULTIPOINT
  Standardize underscore requirements in parameters (#27414)
  peanut butter hamburgers
  Log primary-replica resync failures
  Uses TransportMasterNodeAction to update shard snapshot status (#27165)
  ...
jasontedor added a commit that referenced this pull request Nov 20, 2017
* 6.x: (41 commits)
  [TEST] Fix `GeoShapeQueryTests#testPointsOnly` failure
  Transition transport apis to use void listeners (#27440)
  AwaitsFix GeoShapeQueryTests#testPointsOnly #27454
  Ensure nested documents have consistent version and seq_ids (#27455)
  Tests: Add Fedora-27 to packaging tests
  #26800: Fix docs rendering
  Move the CLI into its own subproject (#27114)
  Correct usage of "an" to "a" in getting started docs
  Avoid NPE when getting build information
  Remove manual tracking of registered channels (#27445)
  Standardize underscore requirements in parameters (#27414)
  Remove parameters on HandshakeResponseHandler (#27444)
  [GEO] fix pointsOnly bug for MULTIPOINT
  peanut butter hamburgers
  Uses TransportMasterNodeAction to update shard snapshot status (#27165)
  Log primary-replica resync failures
  Add limits for ngram and shingle settings (#27411)
  Enforce a minimum task execution and service time of 1 nanosecond
  Fix place-holder in allocation decider messages (#27436)
  Remove newline from log message (#27425)
  ...
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v6.1.0 v7.0.0-beta1
4 participants