-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include unindexed field in FieldStats response #21821
Conversation
protected T minValue; | ||
protected T maxValue; | ||
|
||
FieldStats(byte type, long maxDoc, long docCount, long sumDocFreq, long sumTotalTermFreq, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add Javadoc for these two ctors now?
@@ -550,27 +625,50 @@ public static FieldStats readFrom(StreamInput in) throws IOException { | |||
long sumTotalTermFreq = in.readLong(); | |||
boolean isSearchable = in.readBoolean(); | |||
boolean isAggregatable = in.readBoolean(); | |||
|
|||
boolean hasMinMax = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is weird to see this pre-NamedWriteable stuff still around. It might make sense to convert these to NamedWriteable in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure of this, NamedWriteable is ok when you want to serialize objects that have nothing in common but that's not the case here. We just want to serialize the min/max value differently for each specialization of the class. Rewriting this as a NamedWriteable would be completely different. We would need to create a class to encapsulate the min/max info and then use NamedWriteable to serialize these classes. I can do that but I prefer to do it in another PR since it's not related to the change that we're trying to make here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, another PR is totally fine. No need to hold this up. And I'm not 100% sure it makes sense to convert them, just that it makes sense to think about it.
We have lots of NamedWriteables that write things in common - queries and aggregations do it all the time. They defer to superclasses to load the common stuff.
@@ -398,6 +456,12 @@ public String getMaxValueAsString() { | |||
private FormatDateTimeFormatter formatter; | |||
|
|||
public Date(long maxDoc, long docCount, long sumDocFreq, long sumTotalTermFreq, | |||
boolean isSearchable, boolean isAggregatable) { | |||
super((byte) 2, maxDoc, docCount, sumDocFreq, sumTotalTermFreq, isSearchable, isAggregatable); | |||
this.formatter = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writeMinMax
isn't going to work for this one, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ? writeMinMax
is only called when min/max infos are present. For this case there is no min/max so we don't call writeMinMax at all. Does it make sense ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense. That is fine then.
.setFields("field_index", "field_dv", "field_stored", "field_source").get(); | ||
assertThat(result.getAllFieldStats().size(), equalTo(3)); | ||
for (String field : new String[] {"field_index", "field_dv", "field_stored"}) { | ||
assertThat(result.getAllFieldStats().get(field).getMaxDoc(), equalTo(11L)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this'll be more readable if you assign result.getAllFieldStats().get(field)
to a variable.
assertThat(result.getAllFieldStats().get(field).getDisplayType(), | ||
equalTo("string")); | ||
if ("field_index".equals(field)) { | ||
assertThat(result.getAllFieldStats().get(field).getMinValue(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a fan of assertThat(..., equalTo(...))
. I feel like it is usually more clear to do assertEquals(...)
. Like, if you have lots of assertions and you think they should line up then fine, but in this case I think they are all equalTo
so maybe switch? Not a big deal either way.
@@ -519,49 +600,84 @@ public void testMetaFieldsNotIndexed() { | |||
} | |||
|
|||
public void testSerialization() throws IOException { | |||
Version version = randomBoolean() ? Version.CURRENT : Version.V_5_1_0_UNRELEASED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd test both in the same test run because it is fast.
} | ||
} | ||
|
||
/** | ||
* creates a random field stats which does not guarantee that {@link FieldStats#maxValue} is greater than {@link FieldStats#minValue} | ||
**/ | ||
private FieldStats randomFieldStats() throws UnknownHostException { | ||
private FieldStats randomFieldStats(boolean withNullMinMax) throws UnknownHostException { | ||
int type = randomInt(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it isn't a part of this change but I'm not a fan of the int type = randomInt(5); switch (type) {...}
idiom because I think it is easy to make a mistake that excludes a branch. I'd prefer randomFrom(() -> {...}, () -> {...}).get()
because it is more obvious that you haven't forgot a branch. Is that crazy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to use the switch statement as long as the default
throws an exception ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine but it makes me worried because I have to double check that randomInt
is inclusive and the last element of the switch statement lines up.
Thanks @nik9000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question.
if (hasMinMax == false) { | ||
hasMinMax = true; | ||
minValue = (T) other.minValue; | ||
maxValue = (T) other.maxValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it feels wrong to say that we know the min/max unless the information is available for all indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed be82024
Though I think we should distinguish null values because the shard is empty and null values because the field is not indexed. I could not do it in this commit because SeqNoFieldType#stats computes min/max values on a non-searchable field.
@bleskes @jasontedor what is the plan for this SeqNoFieldType#stats ? Are we going to index this field and remove the min/max retrieval ?
@@ -272,6 +335,9 @@ public final void writeTo(StreamOutput out) throws IOException { | |||
* otherwise <code>false</code> is returned | |||
*/ | |||
public boolean match(IndexConstraint constraint) { | |||
if (hasMinMax == false) { | |||
return false; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we throw an exception if min/max info is not available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the current behavior for fields that are not indexed so maybe better to just ignore these fields for now ? The index constraint works on indexed fields and always returns false on non-indexed fields, I can add a sentence like this in the docs if needed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me. Thanks for the explanation.
This change adds non-searchable fields to the FieldStats response. These fields do not have min/max informations but they can be aggregatable. Fields that are only stored in _source (store:no, index:no, doc_values:no) will still be missing since they do not have any useful information to show. Indices and clients must be at least on V_5_2_0 to see this change.
…e this information for the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Include unindexed field in FieldStats response This change adds non-searchable fields to the FieldStats response. These fields do not have min/max informations but they can be aggregatable. Fields that are only stored in _source (store:no, index:no, doc_values:no) will still be missing since they do not have any useful information to show. Indices and clients must be at least on V_5_2_0 to see this change.
This change adds non-searchable fields to the FieldStats response. These fields do not have min/max informations but they can be aggregatable. Fields that are only stored in _source (store:no, index:no, doc_values:no) will still be missing since they do not have any useful information to show. Indices and clients must be at least on V_5_2_0 to see this change.
Closes #21952