terms stats facet not accurate when dealing with high cardinality fields #921

Closed
lokivog opened this issue Feb 7, 2014 · 5 comments

Comments

@lokivog

lokivog commented Feb 7, 2014

I've experienced a scenario where the terms stats facet was not calculating totals accurately. After investigation, I found that this facet has an additional property named shard_size, which determines how many term entries are requested from each shard. The terms stats facet documentation states:
"When dealing with field with high cardinality (at least higher than the requested size) The greater shard_size is - the more accurate the result will be (and the more expensive the overall facet computation will be). shard_size is there to enable you to increase accuracy yet still avoid returning too many terms_stats entries back to the client."

I would like to add the shard_size property as an option to the terms facet panel to account for data that requires accurate calculations.
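For illustration, a minimal sketch of what a terms stats facet request body with shard_size might look like, built as a Python dict. The helper name `build_terms_stats_facet` and the facet name `totals_by_name` are hypothetical; the field names `name` and `value` are borrowed from the example data set in this thread, and this is not the code from the fork.

```python
import json


def build_terms_stats_facet(key_field, value_field, size, shard_size=None):
    """Build a request body for the legacy terms_stats facet.

    Hypothetical helper for illustration only. shard_size is left out
    when it is not given, so the server-side default applies.
    """
    facet = {
        "key_field": key_field,      # field whose terms form the buckets
        "value_field": value_field,  # numeric field the stats are computed on
        "size": size,                # number of entries returned to the client
        "order": "total",            # sort entries by the sum of value_field
    }
    if shard_size is not None:
        # number of entries requested from each shard
        facet["shard_size"] = shard_size
    return {"facets": {"totals_by_name": {"terms_stats": facet}}}


request_body = build_terms_stats_facet("name", "value", size=3, shard_size=10)
print(json.dumps(request_body, indent=2))
```

Leaving shard_size unset keeps the request identical to what the panel sends today, so the new option could stay opt-in.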

@lokivog
Author

lokivog commented Feb 7, 2014

Here is a trivial example that demonstrates the issue, given the following data set:

{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Ten", "value": 1}
{ "name": "Nine", "value": 1}
{ "name": "Nine", "value": 1}
{ "name": "Nine", "value": 1}
{ "name": "Nine", "value": 2}
{ "name": "Nine", "value": 2}
{ "name": "Nine", "value": 2}
{ "name": "Eight", "value": 1}
{ "name": "Eight", "value": 1}
{ "name": "Eight", "value": 1}
{ "name": "Eight", "value": 1}
{ "name": "Eight", "value": 2}
{ "name": "Eight", "value": 2}
{ "name": "Seven", "value": 1}
{ "name": "Seven", "value": 1}
{ "name": "Seven", "value": 1}
{ "name": "Seven", "value": 1}
{ "name": "Seven", "value": 1}
{ "name": "Seven", "value": 1}
{ "name": "Seven", "value": 1}
{ "name": "Six", "value": 6}
{ "name": "Five", "value": 3}
{ "name": "Five", "value": 2}
{ "name": "Four", "value": 1}
{ "name": "Four", "value": 1}
{ "name": "Four", "value": 1}
{ "name": "Four", "value": 1}
{ "name": "Three", "value": 3}
{ "name": "Two", "value": 1}
{ "name": "Two", "value": 1}
{ "name": "One", "value": 1}

I've made the changes in a fork, and here is a screenshot depicting the data set with the following settings:
size = 10, shard_size not set
size = 3, shard_size not set
size = 3, shard_size = 10

terms_stats_shard_size_graphs
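The effect can be reproduced outside Elasticsearch with a rough simulation of the data set above. This is only a sketch under stated assumptions, not how Elasticsearch is implemented: documents are spread round-robin over 5 shards (real routing hashes the document id), entries are ordered by total with an alphabetical tie-break, and an unset shard_size falls back to size. The function names are made up for this example.

```python
from collections import defaultdict

# Data set from this issue as (name, value) pairs, in the order listed above.
DOCS = (
    [("Ten", 1)] * 10
    + [("Nine", 1)] * 3 + [("Nine", 2)] * 3
    + [("Eight", 1)] * 4 + [("Eight", 2)] * 2
    + [("Seven", 1)] * 7
    + [("Six", 6)]
    + [("Five", 3), ("Five", 2)]
    + [("Four", 1)] * 4
    + [("Three", 3)]
    + [("Two", 1)] * 2
    + [("One", 1)]
)

NUM_SHARDS = 5  # assumption: five shards, filled round-robin


def totals(docs):
    """Sum the values per name for one collection of documents."""
    sums = defaultdict(int)
    for name, value in docs:
        sums[name] += value
    return dict(sums)


def top(sums, n):
    """Return the n largest (name, total) pairs; ties broken by name (assumption)."""
    return sorted(sums.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def simulated_terms_stats(docs, size, shard_size=None):
    """Each shard reports only its top shard_size entries; the rest are lost."""
    shard_size = size if shard_size is None else shard_size
    shards = [docs[i::NUM_SHARDS] for i in range(NUM_SHARDS)]
    merged = defaultdict(int)
    for shard in shards:
        for name, total in top(totals(shard), shard_size):
            merged[name] += total
    return top(merged, size)


print(top(totals(DOCS), 3))                              # exact top 3
print(simulated_terms_stats(DOCS, size=3))               # shard_size not set
print(simulated_terms_stats(DOCS, size=3, shard_size=10))
```

Under these assumptions, the run with shard_size unset reports "Nine" with a total of 7 instead of 9 and drops "Ten" from the top 3 entirely, while shard_size = 10 matches the exact totals because every shard holds fewer than 10 distinct names.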

@lokivog
Author

lokivog commented Feb 7, 2014

Here is the edit panel. I've added tooltips to the size and accuracy settings to help explain their meaning to the user.

terms_stats_shard_size_edit_mode
terms_stats_shard_size_edit_mode2

@lokivog
Author

lokivog commented Feb 17, 2014

Please let me know if you need any more information regarding this issue. If not, is it possible to have the pull request merged?

@rashidkpc
Contributor

This is a consequence of the way Elasticsearch does the distributed count. Unfortunately, there is no way for Kibana to fix this. The Elasticsearch issue is here: elastic/elasticsearch#1305

@EmilyZhangHui

When updating the length to 10, the terms panel can't show the correct mean value.

w33ble pushed a commit to w33ble/kibana that referenced this issue Sep 13, 2018
w33ble added a commit to w33ble/kibana that referenced this issue Sep 13, 2018
to match react version, which was updated in elastic#921