Include unindexed field in FieldStats response #21821

jimczi · 2016-11-28T10:32:22Z

This change adds non-searchable fields to the FieldStats response. These fields do not have min/max information, but they can be aggregatable. Fields that are only stored in _source (store:no, index:no, doc_values:no) will still be missing, since they have no useful information to show. Indices and clients must be on at least V_5_2_0 to see this change.

Closes #21952
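
As a rough illustration of the rule in the description, the sketch below models which fields would show up in the response. `FieldFlagsSketch` and its members are hypothetical names made up for this example and are not Elasticsearch classes; the rule itself is only inferred from the description and from the test in this PR that requests four fields and expects three back.

```java
/** Hypothetical stand-in for a field's mapping flags; not an Elasticsearch class. */
final class FieldFlagsSketch {
    final boolean indexed;   // index: yes/no
    final boolean docValues; // doc_values: yes/no
    final boolean stored;    // store: yes/no

    FieldFlagsSketch(boolean indexed, boolean docValues, boolean stored) {
        this.indexed = indexed;
        this.docValues = docValues;
        this.stored = stored;
    }

    /** A field is reported unless it lives only in _source. */
    boolean appearsInFieldStats() {
        return indexed || docValues || stored;
    }

    public static void main(String[] args) {
        // The four cases discussed in this PR: indexed, doc values, stored, _source only.
        System.out.println(new FieldFlagsSketch(true, false, false).appearsInFieldStats());
        System.out.println(new FieldFlagsSketch(false, true, false).appearsInFieldStats());
        System.out.println(new FieldFlagsSketch(false, false, true).appearsInFieldStats());
        System.out.println(new FieldFlagsSketch(false, false, false).appearsInFieldStats());
    }
}
```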

nik9000 · 2016-11-28T14:33:37Z

core/src/main/java/org/elasticsearch/action/fieldstats/FieldStats.java

    protected T minValue;
    protected T maxValue;

+    FieldStats(byte type, long maxDoc, long docCount, long sumDocFreq, long sumTotalTermFreq,


Add Javadoc for these two ctors now?

nik9000 · 2016-11-28T14:38:54Z

core/src/main/java/org/elasticsearch/action/fieldstats/FieldStats.java

@@ -550,27 +625,50 @@ public static FieldStats readFrom(StreamInput in) throws IOException {
        long sumTotalTermFreq = in.readLong();
        boolean isSearchable = in.readBoolean();
        boolean isAggregatable = in.readBoolean();
-
+        boolean hasMinMax = true;


It is weird to see this pre-NamedWriteable stuff still around. It might make sense to convert these to NamedWriteable in master.

I am not sure about this. NamedWriteable is OK when you want to serialize objects that have nothing in common, but that's not the case here. We just want to serialize the min/max value differently for each specialization of the class. Rewriting this as a NamedWriteable would be a completely different design: we would need to create a class to encapsulate the min/max info and then use NamedWriteable to serialize these classes. I can do that, but I would prefer to do it in another PR since it's not related to the change we're trying to make here.

Yeah, another PR is totally fine. No need to hold this up. And I'm not 100% sure it makes sense to convert them, just that it makes sense to think about it.

We have lots of NamedWriteables that write things in common - queries and aggregations do it all the time. They defer to superclasses to load the common stuff.
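
To make the "defer to superclasses" point concrete, here is a minimal sketch of the pattern using only `java.io`. It is not the Elasticsearch `NamedWriteable` API: `StatsSketch`, `LongStatsSketch`, and `StatsRegistrySketch` are hypothetical names, and the registry is a plain map assumed for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base class: owns the fields every subclass shares. */
abstract class StatsSketch {
    final long maxDoc;
    final long docCount;

    StatsSketch(long maxDoc, long docCount) {
        this.maxDoc = maxDoc;
        this.docCount = docCount;
    }

    /** Reads the shared fields; subclasses call this before reading their own. */
    StatsSketch(DataInput in) throws IOException {
        this.maxDoc = in.readLong();
        this.docCount = in.readLong();
    }

    abstract String name();

    abstract void writeSpecific(DataOutput out) throws IOException;

    /** Writes the registered name, then the shared fields, then the subclass part. */
    final void writeTo(DataOutput out) throws IOException {
        out.writeUTF(name());
        out.writeLong(maxDoc);
        out.writeLong(docCount);
        writeSpecific(out);
    }
}

/** Hypothetical specialization that adds its own min/max encoding. */
final class LongStatsSketch extends StatsSketch {
    static final String NAME = "long";
    final long min;
    final long max;

    LongStatsSketch(long maxDoc, long docCount, long min, long max) {
        super(maxDoc, docCount);
        this.min = min;
        this.max = max;
    }

    LongStatsSketch(DataInput in) throws IOException {
        super(in); // shared fields first
        this.min = in.readLong();
        this.max = in.readLong();
    }

    @Override
    String name() {
        return NAME;
    }

    @Override
    void writeSpecific(DataOutput out) throws IOException {
        out.writeLong(min);
        out.writeLong(max);
    }
}

/** Hypothetical registry that maps a name to the reader for that subclass. */
final class StatsRegistrySketch {
    interface Reader {
        StatsSketch read(DataInput in) throws IOException;
    }

    private static final Map<String, Reader> READERS = new HashMap<>();
    static {
        READERS.put(LongStatsSketch.NAME, LongStatsSketch::new);
    }

    /** Serializes and deserializes in memory, resolving the subclass by name. */
    static StatsSketch roundTrip(StatsSketch stats) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            stats.writeTo(new DataOutputStream(bytes));
            DataInput in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Reader reader = READERS.get(in.readUTF());
            if (reader == null) {
                throw new IOException("unknown name");
            }
            return reader.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `StatsRegistrySketch.roundTrip(new LongStatsSketch(11L, 10L, 1L, 9L))` should hand back a `LongStatsSketch` with the same values. Whether this shape is worth adopting for FieldStats is exactly the open question in this thread.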

nik9000 · 2016-11-28T14:41:18Z

core/src/main/java/org/elasticsearch/action/fieldstats/FieldStats.java

@@ -398,6 +456,12 @@ public String getMaxValueAsString() {
        private FormatDateTimeFormatter formatter;

        public Date(long maxDoc, long docCount, long sumDocFreq, long sumTotalTermFreq,
+                    boolean isSearchable, boolean isAggregatable) {
+            super((byte) 2, maxDoc, docCount, sumDocFreq, sumTotalTermFreq, isSearchable, isAggregatable);
+            this.formatter = null;


writeMinMax isn't going to work for this one, I think.

Why? writeMinMax is only called when min/max info is present. In this case there is no min/max, so we don't call writeMinMax at all. Does that make sense?

Yeah, that makes sense. That is fine then.
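
The exchange above can be sketched as follows: a flag is written before the values, and the min/max writer is reached only when the flag is set. `DateStatsSketch` is a hypothetical class built on `java.io`, and the wire layout shown is an assumption for illustration rather than the format FieldStats actually uses (it ignores the formatter and any version checks).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical date-like stats object whose min/max may be absent. */
final class DateStatsSketch {
    final long docCount;
    final boolean hasMinMax;
    final long minValue; // meaningful only when hasMinMax is true
    final long maxValue; // meaningful only when hasMinMax is true

    /** For fields without min/max information. */
    DateStatsSketch(long docCount) {
        this(docCount, false, 0L, 0L);
    }

    /** For fields with min/max information. */
    DateStatsSketch(long docCount, long minValue, long maxValue) {
        this(docCount, true, minValue, maxValue);
    }

    private DateStatsSketch(long docCount, boolean hasMinMax, long minValue, long maxValue) {
        this.docCount = docCount;
        this.hasMinMax = hasMinMax;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    void writeTo(DataOutput out) throws IOException {
        out.writeLong(docCount);
        out.writeBoolean(hasMinMax);
        if (hasMinMax) {
            writeMinMax(out); // skipped entirely when there is nothing to write
        }
    }

    private void writeMinMax(DataOutput out) throws IOException {
        out.writeLong(minValue);
        out.writeLong(maxValue);
    }

    static DateStatsSketch readFrom(DataInput in) throws IOException {
        long docCount = in.readLong();
        if (in.readBoolean() == false) {
            return new DateStatsSketch(docCount);
        }
        long min = in.readLong();
        long max = in.readLong();
        return new DateStatsSketch(docCount, min, max);
    }

    /** Convenience wrapper so callers do not have to handle IOException. */
    static byte[] toBytes(DateStatsSketch stats) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            stats.writeTo(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper so callers do not have to handle IOException. */
    static DateStatsSketch fromBytes(byte[] bytes) {
        try {
            return readFrom(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch an instance built with the one-argument constructor serializes to the doc count plus the flag only, which is why the absent formatter or values never come into play on the write path.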

nik9000 · 2016-11-28T14:46:14Z

core/src/test/java/org/elasticsearch/fieldstats/FieldStatsTests.java

+            .setFields("field_index", "field_dv", "field_stored", "field_source").get();
+        assertThat(result.getAllFieldStats().size(), equalTo(3));
+        for (String field : new String[] {"field_index", "field_dv", "field_stored"}) {
+            assertThat(result.getAllFieldStats().get(field).getMaxDoc(), equalTo(11L));


I think this'll be more readable if you assign result.getAllFieldStats().get(field) to a variable.

nik9000 · 2016-11-28T14:47:54Z

core/src/test/java/org/elasticsearch/fieldstats/FieldStatsTests.java

+            assertThat(result.getAllFieldStats().get(field).getDisplayType(),
+                equalTo("string"));
+            if ("field_index".equals(field)) {
+                assertThat(result.getAllFieldStats().get(field).getMinValue(),


I'm not really a fan of assertThat(..., equalTo(...)). I feel like it is usually more clear to do assertEquals(...). Like, if you have lots of assertions and you think they should line up then fine, but in this case I think they are all equalTo so maybe switch? Not a big deal either way.

nik9000 · 2016-11-28T14:49:47Z

core/src/test/java/org/elasticsearch/fieldstats/FieldStatsTests.java

@@ -519,49 +600,84 @@ public void testMetaFieldsNotIndexed() {
    }

    public void testSerialization() throws IOException {
+        Version version = randomBoolean() ? Version.CURRENT : Version.V_5_1_0_UNRELEASED;


I'd test both in the same test run because it is fast.

nik9000 · 2016-11-28T14:52:47Z

core/src/test/java/org/elasticsearch/fieldstats/FieldStatsTests.java

        }
    }

    /**
     * creates a random field stats which does not guarantee that {@link FieldStats#maxValue} is greater than {@link FieldStats#minValue}
     **/
-    private FieldStats randomFieldStats() throws UnknownHostException {
+    private FieldStats randomFieldStats(boolean withNullMinMax) throws UnknownHostException {
        int type = randomInt(5);


I know it isn't a part of this change, but I'm not a fan of the int type = randomInt(5); switch (type) {...} idiom because I think it is easy to make a mistake that excludes a branch. I'd prefer randomFrom(() -> {...}, () -> {...}).get() because it is more obvious that you haven't forgotten a branch. Is that crazy?

I think it's fine to use the switch statement as long as the default throws an exception?

It is fine but it makes me worried because I have to double check that randomInt is inclusive and the last element of the switch statement lines up.
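
For readers who have not seen the supplier-based idiom, a minimal standalone sketch follows. The `randomFrom` helper here is a local stand-in written for this example with `java.util.Random`, not the test framework's method, and the three branch labels are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

final class RandomBranchSketch {
    /** Local stand-in for a randomFrom-style helper: runs one branch chosen uniformly. */
    static <T> T randomFrom(Random random, List<Supplier<T>> branches) {
        return branches.get(random.nextInt(branches.size())).get();
    }

    /** Every branch is an element of the list, so none can fall outside the random range. */
    static String randomTypeName(Random random) {
        List<Supplier<String>> branches = Arrays.asList(
            () -> "long",
            () -> "double",
            () -> "date");
        return randomFrom(random, branches);
    }

    public static void main(String[] args) {
        System.out.println(randomTypeName(new Random()));
    }
}
```

The bound passed to the random generator is derived from the list size, so adding or removing a branch cannot leave the range and the branches out of sync, which is the concern raised about the switch form.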

jimczi · 2016-11-28T15:59:20Z

Thanks @nik9000
I pushed a commit to address your comments.
If it's OK with you, I'd like to push this change (modulo the changes that you requested) and do the NamedWriteable stuff in another PR.

jpountz

I left a question.

jpountz · 2016-12-05T22:34:11Z

core/src/main/java/org/elasticsearch/action/fieldstats/FieldStats.java

+        if (hasMinMax == false) {
+            hasMinMax = true;
+            minValue = (T) other.minValue;
+            maxValue = (T) other.maxValue;


It feels wrong to say that we know the min/max unless the information is available for all indices?

I pushed be82024

Though I think we should distinguish between null values caused by an empty shard and null values caused by the field not being indexed. I could not do it in this commit because SeqNoFieldType#stats computes min/max values on a non-searchable field.
@bleskes @jasontedor what is the plan for SeqNoFieldType#stats? Are we going to index this field and remove the min/max retrieval?
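
A minimal sketch of the merge rule described for be82024, under the assumption that "unknown" is modelled as a null min/max: the merged result keeps min/max only when both sides have them. `MergedStatsSketch` is a hypothetical class, and like the commit discussed here it does not distinguish an empty shard from a non-indexed field.

```java
/** Hypothetical accumulator for per-index stats of one field. */
final class MergedStatsSketch {
    long docCount;
    Long minValue; // null when unknown
    Long maxValue; // null when unknown

    MergedStatsSketch(long docCount, Long minValue, Long maxValue) {
        this.docCount = docCount;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    boolean hasMinMax() {
        return minValue != null && maxValue != null;
    }

    /** Folds another index's stats into this one. */
    void accumulate(MergedStatsSketch other) {
        docCount += other.docCount;
        if (hasMinMax() && other.hasMinMax()) {
            minValue = Math.min(minValue, other.minValue);
            maxValue = Math.max(maxValue, other.maxValue);
        } else {
            // At least one side lacks the information, so the merged result cannot claim it.
            minValue = null;
            maxValue = null;
        }
    }

    public static void main(String[] args) {
        MergedStatsSketch merged = new MergedStatsSketch(10L, 1L, 5L);
        merged.accumulate(new MergedStatsSketch(3L, null, null));
        System.out.println(merged.docCount + " docs, hasMinMax=" + merged.hasMinMax());
    }
}
```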

jpountz · 2016-12-05T22:39:25Z

core/src/main/java/org/elasticsearch/action/fieldstats/FieldStats.java

@@ -272,6 +335,9 @@ public final void writeTo(StreamOutput out) throws IOException {
     * otherwise <code>false</code> is returned
     */
    public boolean match(IndexConstraint constraint) {
+        if (hasMinMax == false) {
+            return false;
+        }


Should we throw an exception if min/max info is not available?

This is the current behavior for fields that are not indexed, so it may be better to just ignore these fields for now. The index constraint works on indexed fields and always returns false on non-indexed fields. I can add a sentence like this to the docs if needed?

That works for me. Thanks for the explanation.
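
The agreed behavior can be illustrated with a small sketch: when min/max is missing, the constraint check returns false instead of throwing. `MatchSketch` and `matchesMaxAtLeast` are hypothetical names for this example and do not mirror the real `IndexConstraint` API.

```java
/** Hypothetical per-field stats with optional min/max. */
final class MatchSketch {
    final Long minValue; // null when the field has no min/max information
    final Long maxValue; // null when the field has no min/max information

    MatchSketch(Long minValue, Long maxValue) {
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    /** Hypothetical constraint: the field's max value must be at least the threshold. */
    boolean matchesMaxAtLeast(long threshold) {
        if (minValue == null || maxValue == null) {
            return false; // nothing to compare against, so the index never matches
        }
        return maxValue >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(new MatchSketch(null, null).matchesMaxAtLeast(0L));
        System.out.println(new MatchSketch(1L, 10L).matchesMaxAtLeast(5L));
    }
}
```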

jpountz

LGTM

jimczi · 2016-12-06T12:33:17Z

Thanks @jpountz and @nik9000!
I'll merge this to 5.2.0 now.

jimczi added the :Data Management/Stats, >enhancement, review, and v5.2.0 labels Nov 28, 2016

nik9000 reviewed Nov 28, 2016

nik9000 approved these changes Nov 28, 2016

jimczi added the v6.0.0-alpha1 label Nov 28, 2016

jimczi mentioned this pull request Dec 5, 2016

Unable to get field stats for unindexed fields #21952

Closed

jpountz reviewed Dec 5, 2016

jimczi added 4 commits December 6, 2016 10:02

After review: add javadocs and rewrite tests with cleaner asserts

ac546c9

fix compilation

727add1

Set min/max values to null if any of the requested shard does not have this information for the field.

be82024

jpountz approved these changes Dec 6, 2016

Clarify index constraint behavior on non-indexed fields in the documentation.

d401c16

jimczi merged commit b42ca6b into elastic:master Dec 6, 2016

jimczi deleted the field_stats_field_infos branch December 6, 2016 12:32

clintongormley mentioned this pull request Jan 10, 2017

field stats api should return results for fields in the mapping, but without indexed documents #22438

Closed

Bargs mentioned this pull request Mar 20, 2017

Murmur3 hash fields missing from Kibana Visualization elastic/kibana#10782

Closed
