Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bolster field value retrieval in Top Hits metric agg #11141

Closed
Tracked by #179200
spalger opened this issue Apr 10, 2017 · 7 comments
Closed
Tracked by #179200

Bolster field value retrieval in Top Hits metric agg #11141

spalger opened this issue Apr 10, 2017 · 7 comments
Labels
blocked discuss Feature:Visualizations Generic visualization features (in case no more specific feature label is available) Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@spalger
Copy link
Contributor

spalger commented Apr 10, 2017

The top_hits metric is currently checking field.doc_values to see if it needs to ask for the field's value via docvalue_fields. This strategy covers the majority of use cases, but with the move to the _field_caps API field.doc_values will no longer be available. Additionally, there are other field value storage schemes beyond _source and doc_values that this doesn't account for.

For now the plan is to change the condition to aggregatable && type !== "text" and store the value at field.readFromDocValues (or something similar). This condition isn't terrible, it also works in the majority of scenarios, but it holds a lot of assumptions about what aggregatable means. Should the definition of aggregatable shift, or the default behavior for certain types change, this condition could very well become a problem. We probably can't depend on it long term.

We should probably see how elastic/elasticsearch#24036 plays out before moving forward, but if that ticket doesn't lead anywhere we need to consider other options.

@spalger spalger added :Discovery Feature:Visualizations Generic visualization features (in case no more specific feature label is available) blocked discuss labels Apr 10, 2017
@jpountz
Copy link

jpountz commented Apr 11, 2017

I think it is safe to assume that aggregatable also means sortable and usable in docvalue_fields. If we were to give aggregatable a different meaning, that would not only be a big change for Kibana, but also for Elasticsearch.

@Bargs
Copy link
Contributor

Bargs commented Apr 11, 2017

We discussed this a lot during the original top hits PR, it's complicated...

Here's another suggestion: the _source and stored_fields params in the search request body will simply ignore fields that don't exist. docvalue_fields is inconsistent with those other options and will return an error if the field does not have doc_values enabled. If docvalue_fields simply ignored irrelevant fields we could request all three in top hits and use whichever comes back. The only other hitch to this is that docvalue_fields and stored_fields both return values under the fields key in the response, so there's no way for the client to differentiate between a multi-valued field and one where both doc_values and stored are enabled (which seems like an issue in its own right that I'd really like to have a solution to).

@Bargs
Copy link
Contributor

Bargs commented Apr 11, 2017

For now the plan is to change the condition to aggregatable && type === "text"

Why type === text? For text fields I think we want to use _source, the fielddata will be unintelligible to users.

@jpountz
Copy link

jpountz commented Apr 14, 2017

I'd like to understand more about the background of this issue. For instance why don't you just request the field via _source? Is it to support cases when _source is disabled?

Also I'm wondering what would be wrong with doing something like this:

if (field.aggregatable) {
  output.params.docvalue_fields = [ field.name ];
} else {
  output.params.stored_fields = [ field.name ];
}

@Bargs
Copy link
Contributor

Bargs commented Apr 17, 2017

For instance why don't you just request the field via _source? Is it to support cases when _source is disabled?

Yeah, _source might be disabled. Or we might prefer the normalized version of the field value rather than the original _source value, like with dates for instance. I also assume it is faster. It might be negligible in most cases since we're only pulling back one or a few documents, but some users have pretty massive docs where it might make a difference.

Also I'm wondering what would be wrong with doing something like this

I think that would work fine at the moment, but I do agree with @spalger that field access is really complicated right now. For one, things keep changing. stored_fields used to be fields. docvalue_fields used to be fielddata_fields. docvalue_fields now retrieves both docvalues and fielddata fields. The ES API refers to the storage mechanism being used, so every time something changes with the storage mechanism, Kibana has to change too. Generally we don't care how the field is retrieved though. We just want to get the field the user requested in the most efficient manner, regardless of how it's stored, which I think is partly what @spalger was getting at in elastic/elasticsearch#24036.

Extracting the field value from the response is also complicated by the number of options available. docvalue_fields, stored_fields, and script_fields are all returned under the fields key which can lead to duplicate values if a field is stored and has docvalues enabled. docvalue_fields and stored_fields might have different representations of the same value. They handle missing fields differently. _source is a completely different beast. Inner objects are returned in a totally different format. We maintain complicated flattening logic to get values out of _source in a way that's sort of similar to what ES does internally. Values represent the original document, not what ES sees.

So there are a lot of variables that hinge on the choice between docvalue_fields, stored_fields, _source, and even script_fields. Generally I think Kibana only cares about two things when accessing field values: how is the value formatted and are we accessing it efficiently?

@spalger
Copy link
Contributor Author

spalger commented May 23, 2017

@Bargs sorry, missed these responses, re: #11141 (comment)

It was supposed to be aggregatable && type !== "text", fixed in the description

@timductive
Copy link
Member

Closing this because it's not planned to be resolved in the foreseeable future. It will be tracked in our Icebox and will be re-opened if our priorities change. Feel free to re-open if you think it should be melted sooner.

@timductive timductive closed this as not planned Won't fix, can't repro, duplicate, stale Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked discuss Feature:Visualizations Generic visualization features (in case no more specific feature label is available) Team:Visualizations Visualization editors, elastic-charts and infrastructure
Projects
None yet
Development

No branches or pull requests

5 participants