[Zen2] Implement Tombstone REST APIs #36007

Merged: original-brownbear merged 5 commits into elastic:zen2 from zen2-rest-apis on Nov 29, 2018

Conversation

original-brownbear
Member

  • Adds REST API for withdrawing votes and clearing vote withdrawals
  • Tests added to the Netty4 module since we need a real network implementation for HTTP endpoints
@original-brownbear added the >enhancement, v7.0.0, and :Distributed/Cluster Coordination labels Nov 28, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@DaveCTurner DaveCTurner left a comment

A few questions and nits.

protected Settings nodeSettings(int nodeOrdinal) {
return Settings.builder().put(super.nodeSettings(nodeOrdinal))
.put(TestZenDiscovery.USE_ZEN2.getKey(), true)
.put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), 2)
Contributor

Was this necessary?

Member Author

Yea, the internal test cluster throws if this isn't set and you turn off auto manage min master nodes (probably something that could be adjusted for Zen2?).

Contributor

Oh yes, so it does. Set it to something unreasonable (MAX_VALUE) to protect against this test passing without Zen2.

.put(TestZenDiscovery.USE_ZEN2.getKey(), true)
.put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), 2)
.put(ClusterBootstrapService.INITIAL_MASTER_NODE_COUNT_SETTING.getKey(), 2)
.put(DiscoverySettings.INITIAL_STATE_TIMEOUT_SETTING.getKey(), "5s")
Contributor

This shouldn't be necessary; the cluster should form straight away.

}

public void testAddAndClearVotingTombstones() throws Exception {
final int nodeCount = 2;
Contributor

Could you inline nodeCount?

return false; // enable http
}

public void testAddAndClearVotingTombstones() throws Exception {
Contributor

I think testRollingRestartOfTwoNodeCluster would be a good name.

.setWaitForNodes(Integer.toString(nodeCount - 1))
.setTimeout(TimeValue.timeValueSeconds(30L));

clusterHealthRequestBuilder.setWaitForYellowStatus();
Contributor

This can be in the chain of .set() methods above, and the temporary clusterHealthRequestBuilder can probably be inlined.

Response deleteResponse = restClient.performRequest(new Request("DELETE", "/_cluster/withdrawn_votes"));
assertThat(deleteResponse.getStatusLine().getStatusCode(), is(200));
assertThat(deleteResponse.getEntity().getContentLength(), is(0L));
Response response =
Contributor

Why the
newline? :)

public void testBasicRestApi() throws Exception {
List<String> nodes = internalCluster().startNodes(3);
RestClient restClient = getRestClient();
Response deleteResponse = restClient.performRequest(new Request("DELETE", "/_cluster/withdrawn_votes"));
Contributor

I think it'd make more sense to put this after the POST. At some point there will be an assertion that there are no voting tombstones in the cluster at the end of the test.

Member Author

I agree. I just tried making that change, but I'm running into this error as a result:

[2018-11-29T00:04:21,996][WARN ][r.suppressed             ] [node_t0] path: /_cluster/withdrawn_votes, params: {}
org.elasticsearch.transport.RemoteTransportException: [node_t1][127.0.0.1:33009][cluster:admin/voting/clear_tombstones]
Caused by: org.elasticsearch.ElasticsearchTimeoutException: timed out waiting for removal of nodes; if nodes should not be removed, set waitForRemoval to false. [{node_t2}{ps-Qi3TfSAOMEW6D-A5uAA}]
	at org.elasticsearch.action.admin.cluster.configuration.TransportClearVotingTombstonesAction$1.onTimeout(TransportClearVotingTombstonesAction.java:109) ~[main/:?]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322) ~[main/:?]
	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249) ~[main/:?]
	at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:561) ~[main/:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:627) ~[main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]

(no matter what node I add a Tombstone for, this happens)

"if nodes should not be removed, set waitForRemoval to false"

Should I do that in the Rest API?

Contributor

Yep :)

@ywelsch ywelsch mentioned this pull request Nov 28, 2018
61 tasks
@original-brownbear
Member Author

All but 2 comments addressed :) 2 Questions added.

@original-brownbear
Member Author

@DaveCTurner alright added the "don't wait" parameter and set the zen1 master count to Integer.MAX_VALUE => should be good for review again :)

Contributor

@DaveCTurner DaveCTurner left a comment

My review was unclear: I intended to expose the parameter, not hard-code its value.

@Override
protected RestChannelConsumer prepareRequest(final RestRequest request, final NodeClient client) throws IOException {
ClearVotingTombstonesRequest req = new ClearVotingTombstonesRequest();
req.setWaitForRemoval(false);
Contributor

Apologies, I meant this should be a parameter ?waitForRemoval=false - both options are useful in different circumstances. The default of true is the usual case, but in this particular test we should set it to false because the node is still present.

Now that I've written that, I think it'd be good to test both cases:

  1. create a 3-node cluster, add a tombstone, then clear them (?waitForRemoval=false) <- today's test
  2. create a 3-node cluster, add a tombstone, shut the corresponding node down, then clear the tombstones (?waitForRemoval=true)
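
For reference, the two cases differ only in the query parameter on the DELETE call. Below is a minimal sketch, assuming a node reachable at http://localhost:9200 (a hypothetical address) and using the JDK's java.net.http classes purely to build the requests without sending them. The class and method names are illustrative and not part of this PR, and the parameter spelling follows this comment.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ClearTombstonesRequests {
    // Assumption: a local node listening on this address; adjust as needed.
    static final String BASE = "http://localhost:9200";

    // Builds (but does not send) the DELETE request that clears voting tombstones.
    // waitForRemoval=false suits case 1 (the node is still in the cluster);
    // waitForRemoval=true suits case 2 (the node has been shut down).
    static HttpRequest clear(boolean waitForRemoval) {
        URI uri = URI.create(BASE + "/_cluster/withdrawn_votes?waitForRemoval=" + waitForRemoval);
        return HttpRequest.newBuilder(uri).DELETE().build();
    }

    public static void main(String[] args) {
        HttpRequest nodeStillPresent = clear(false);
        HttpRequest nodeShutDown = clear(true);
        System.out.println(nodeStillPresent.method() + " " + nodeStillPresent.uri());
        System.out.println(nodeShutDown.method() + " " + nodeShutDown.uri());
    }
}
```

In the real tests the same distinction would be made through the REST client already used in this PR; the sketch only shows which flag goes with which scenario.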

Contributor

Looking at other APIs, it should be wait_for_removal not waitForRemoval.
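
As a rough illustration of the intended semantics (snake_case name, default of true when the parameter is absent), here is a standalone sketch. It deliberately does not use Elasticsearch's own request classes; the class name WaitForRemovalParam and the strict rejection of values other than "true" and "false" are assumptions made for illustration only.

```java
import java.util.Map;

public class WaitForRemovalParam {
    // Parameter name as suggested in this review comment.
    static final String NAME = "wait_for_removal";

    // Returns the flag from the query parameters, defaulting to true
    // (the usual case: wait until the tombstoned nodes have left the cluster).
    static boolean parse(Map<String, String> params) {
        String raw = params.get(NAME);
        if (raw == null) {
            return true;
        }
        if (raw.equals("true")) {
            return true;
        }
        if (raw.equals("false")) {
            return false;
        }
        // Assumption: reject anything else rather than guessing.
        throw new IllegalArgumentException("invalid value for " + NAME + ": " + raw);
    }

    public static void main(String[] args) {
        System.out.println(parse(Map.of()));
        System.out.println(parse(Map.of(NAME, "false")));
    }
}
```

The REST handler would then pass the parsed value to setWaitForRemoval on the request instead of hard-coding false.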


import static org.hamcrest.core.Is.is;

// TODO: Move these tests to a more appropriate module
Contributor

Suggest:

// These tests are here today so they have access to a proper REST client. They cannot be in :server:integTest since the REST client needs a
// proper transport implementation, and they cannot be REST tests today since they need to restart nodes. When #35599 and friends land we
// should be able to move these tests to run against a proper cluster instead. TODO do this.

@original-brownbear
Member Author

@DaveCTurner all done :)

  • exposed wait param
    • Added tests for both cases
  • reworded todo

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

@original-brownbear
Member Author

Jenkins test this

@original-brownbear original-brownbear merged commit 48dc6c3 into elastic:zen2 Nov 29, 2018
@original-brownbear original-brownbear deleted the zen2-rest-apis branch November 29, 2018 13:34
Labels
:Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
4 participants