Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delegate toXContent logic from ClusterState to its member classes #48218 #48924

Closed
wants to merge 7 commits into from

Conversation

zacharymorn
Copy link
Contributor

@zacharymorn zacharymorn commented Nov 10, 2019

ClusterState.toXContent contains duplicate logic from its member classes such as IndexMetaData.toXContent and MetaData.toXContent etc. This PR is to remove the duplicate and delegate logic to those classes. For details, please refer to #48218

… serialization for IndexMetaData.

This test may sometimes fail due to JsonNode.equals may be order sensitive for deeply nested json structure
…a.toXContent, and note the difference both in code as comment and in serialization.

This test may sometimes fail due to JsonNode.equals may be order sensitive for deeply nested json structure
@cbuescher cbuescher added the :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. label Nov 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

… serialization for IndexTemplateMetaData

This test may sometimes fail due to JsonNode.equals may be order sensitive for deeply nested json structure
…eMetaData.toXContent, and note the difference in serialization.

This test may sometimes fail due to JsonNode.equals may be order sensitive for deeply nested json structure
…Content, and note the difference in serialization.

This test may sometimes fail due to JsonNode.equals may be order sensitive for deeply nested json structure
@gwbrown
Copy link
Contributor

gwbrown commented Dec 11, 2019

Hi @zacharymorn, first off I want to apologize for taking so long to review this. I've been waiting on some internal discussions that took a while to happen. I want to thank you especially for looking into the other places in ClusterState's toXContent that have this issue.

We discussed this recently in the Distributed team sync, and we're going to:

  1. Move the list of other duplicated toXContent logic to a meta-issue so that we can keep track of which ones we've addressed. This probably means we'll want to remove the TODO comments before merging (but leave them for the moment, until we get them transferred to a meta-issue).
  2. Review the changes to the Cluster State JSON response, as we may want to make changes to make it more easily human-readable (as this JSON is mostly used as a toubleshooting tool).
  3. Do some testing with other applications (for example, Kibana) to make sure we aren't going to accidentally break something if we backport this to 7.x. While we document that the format of this API may change in minor versions, we still don't want to do so casually and risk breakage.

We'll also review the code changes in this PR as usual soon.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some initial comments. I would prefer to start by parameterizing the code in MetaData.toXContent to, depending on the context (API vs GATEWAY), emit different XContent. We can then step-by-step resolve the differences between the two. Also we need more detailed examples to see how the output between the two differs. A good first test would be to take a complex cluster state, use the old serialization method, and check that the new parameterized one in MetaData.toXContent provides equivalent output.

@@ -82,8 +82,10 @@ public void testToXContent() throws IOException {
" \"attributes\" : { }\n" +
" }\n" +
" },\n" +
" \"metadata\" : {\n" +
" \"meta-data\" : {\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone was relying on the stability of the cluster state APIs, this is definitely the biggest thing that would break them. I wonder if we should change MetaData.toXContent to have metadata there, and be lenient in the MetaData.fromXContent method, allowing both. This means that the internal storage format can handle the change, and the user-facing cluster state will be kept the same as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to use meta-data / metadata in MetaData.toXContent depending on the context (API vs GATEWAY).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think this makes sense. Will try to parametrize the code.

" \"aliases\" : [ ],\n" +
" \"primary_terms\" : {\n" +
" \"0\" : 0\n" +
" \"index.max_script_fields\" : \"10\",\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetaData.toXContent could have flat_settings defaulting to false if the context is API. That means that the representation would stay the same here.

" \"index.number_of_replicas\" : \"0\",\n" +
" \"index.number_of_shards\" : \"1\",\n" +
" \"index.shard.check_on_startup\" : \"true\",\n" +
" \"index.version.created\" : \"8000099\"\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the version is hardcoded here, which means that this will break as soon as we do a version bump. Please use a variable as it used to be before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

coordinationMetaData().toXContent(builder, params);
builder.endObject();

builder.startObject("templates");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have an example that shows how / whether templates are rendered differently now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have it here as an intermediate commit bfc1103 . Please let me know if this look good.

builder.endObject();

builder.startObject("indices");
for (IndexMetaData indexMetaData : metaData()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, we are now also exposing index metadata customs. I'm not sure if these might contain sensitive information. Need to double-check. We could perhaps omit these for API context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing some knowledge about the potential sensitive information here in metadata custom. Let me see if I can find some documentation about it.

@@ -1205,6 +1211,23 @@ public static void toXContent(IndexMetaData indexMetaData, XContentBuilder build
}
builder.endArray();

/*
==== for the above, ClusterState.toXContent has the following ====
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but how is this rendered differently? Can we see an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's available here as an intermediate commit 4850acc . Please let me know if this looks good.

@zacharymorn
Copy link
Contributor Author

@gwbrown No problem and I had some delay on my side as well. The caution taken definitely makes sense as I too also see changes in serialization that could potentially break clients. Please let me know when you guys have detailed plans for #1 and #3.

@zacharymorn
Copy link
Contributor Author

@ywelsch Yes I agree with the approach of seeing the changes in serialization as well. I did start my changes by generating an existing serialization of the cluster state (df24188), and each time I updated a portion of code I committed both the changes to code and to the serialization. In my last commit though, I removed the checked for serialization content and replaced it with a verification of delegating call (to member classes).

I think the problem here is, as the existing serialization of a complex cluster state are new changes in my existing PR, it might be hard to use git to clearly shows the changes of serialization introduced by the PR globally, as they will always be viewed as new changes. Will it help if I create a separate PR to just check in the existing serialization, and track the updates to the serialization in this PR?

@ywelsch
Copy link
Contributor

ywelsch commented Jan 27, 2020

yes, this PR is trying to do too many things at once. Let's start by having a PR first that parameterizes the code in MetaData.toXContent to, depending on the context (API vs GATEWAY), emit different XContent, providing further tests that the serialization between new and old method is identical. In follow-ups, we can then start removing the old logic, and make adjustments to bring API and GATEWAY serialization closer.

@zacharymorn
Copy link
Contributor Author

@ywelsch As this current PR has too many commits to cleanly show the changes, I’ve made a new branch and PR #52743 to better track the changes in the new direction. I'll close this PR for now.

As MetaData.toXContent is already parameterized as it checks for API context for certain serializations, I’ve organized the 4 commits in that PR as follows:

  1. Add API and Gateway serialization tests for complex ClusterState and MetaData objects without modifying production code
  2. Copy over ClusterState.toXContent serialization test metadata section to MetaData.toXContent API serialization tests to highlight the difference and setup baseline
  3. Parameterize MetaData.toXContent further and have its serialization under API context to be as close as possible to ClusterState.
    1. Note that as the updates to existing serialization test cases show, there are inconsistencies of serializations under API context.
    2. We can pick and choose which serializaton scheme we actually want, and which one should no longer be supported under API context.
  4. Replace ClusterState.toXContent metadata section with parameterized MetaData.toXContent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants