Inconsistent facet counts #1832
Comments
Hi, did you try to run your example on a single shard? I assume your example ran on the 3 shards that are there by default. Maybe you are hitting an issue similar to this one: #667
I've just run it against an index with 1 shard and the numbers are correct. However, is there a way to make this work with multiple shards? Is this the expected behaviour, or is it still a bug?
It is not a bug per se. It is a performance trade-off for the distributed calculation. You can try to increase the size to minimize its effect. AFAIK Shay has a plan to implement some improvements to allow more accurate results.
I've just tried changing the size on the query to various numbers between 1 and the size of the dataset, and it always returns 12. For an index with roughly 150 documents and 10 properties on each one, is there a reason NOT to use just 1 shard?
I was referring to the size parameter of the terms facet, not the size of the query. What happens is the following: when ES calculates top term facets, it computes the top 'size' terms per shard [or maybe (top 'size' x number_of_shards), I do not remember off the top of my head, but it does not matter much]. All these 'top' sets are then collected by a single node (the one that started the query) and aggregated into the final top 'size' result. This strategy does not always lead to correct global results; the outcome also depends on the nature of your data and its distribution among shards. If you have only 150 documents, then using one shard will help. Another option would be to use a terms facet 'size' of 150.
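To make the per-shard truncation concrete, here is a minimal simulation in plain Python. It does not call Elasticsearch; the shard contents, the terms other than 'fabric', and the function names are all hypothetical, chosen only so that the true count for 'fabric' is 13 while the merged result reports 12.

```python
from collections import Counter

def shard_top_terms(docs, size):
    """Top `size` (term, count) pairs for one shard (hypothetical helper)."""
    return Counter(docs).most_common(size)

def merged_facet(shards, size):
    """Merge the per-shard top lists on the coordinating node, keep the top `size`."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, size):
            merged[term] += count
    return merged.most_common(size)

# Made-up data: one term value per document, three shards.
shards = [
    ["fabric"] * 6 + ["colour"] * 5 + ["size"] * 1,
    ["fabric"] * 6 + ["colour"] * 4 + ["size"] * 2,
    ["size"] * 5 + ["colour"] * 4 + ["fabric"] * 1,
]

true_counts = Counter(term for docs in shards for term in docs)

approx = dict(merged_facet(shards, size=2))  # third shard drops its single 'fabric' hit
exact = dict(merged_facet(shards, size=3))   # size covers every distinct term

print(true_counts["fabric"], approx["fabric"], exact["fabric"])  # 13 12 13
```

With size=2 the third shard never reports its one 'fabric' document, so the merged count is one short. Raising size until it covers all distinct terms removes the error, which is why a facet size of 150 is safe for a 150-document index.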
Solr provides accuracy here with a second call to the shards to calculate counts for terms found on shard A but so far missing from B, C, ... Z, and vice versa.
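The two-pass idea can be sketched roughly as below. This is an illustration of the refinement step as described in the comment above, not Solr's actual implementation; the data and function name are hypothetical, and the "second call" is simulated by looking up the shard's full counts.

```python
from collections import Counter

def refined_facet(shards, size):
    """Two-pass merge: ask each shard for exact counts of candidates it omitted."""
    per_shard = [Counter(docs) for docs in shards]
    # Pass 1: every shard returns only its own top `size` terms.
    first_pass = [dict(counts.most_common(size)) for counts in per_shard]
    # Candidates are all terms that any shard reported.
    candidates = set().union(*first_pass)
    totals = Counter()
    for full, top in zip(per_shard, first_pass):
        for term in candidates:
            # Pass 2 (simulated): fetch the count the shard left out of its top list.
            totals[term] += top[term] if term in top else full[term]
    return totals.most_common(size)

# Made-up data: one term value per document, three shards.
shards = [
    ["fabric"] * 6 + ["colour"] * 5 + ["size"] * 1,
    ["fabric"] * 6 + ["colour"] * 4 + ["size"] * 2,
    ["size"] * 5 + ["colour"] * 4 + ["fabric"] * 1,
]

refined = dict(refined_facet(shards, size=2))
print(refined["fabric"])  # 13, the true count, even though size is only 2
```

Refinement makes the counts of the candidate terms exact, at the cost of a second round trip. A term that never reaches any shard's top list is still missed, so this narrows the error rather than eliminating it in every case.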
Looks like a duplicate of #1305
I've got an issue with facet counts that I've managed to simplify into a re-creatable example (attached).
When searching an index I get a facet count of 12 for 'fabric'. However, when I then filter on that attribute, the count increases to 13. I'm not sure how this is possible, since adding a must query should only ever decrease the facet counts.
Attached is the script, which should recreate the issue in a test_bug index. It will insert 118 documents and run a query where the "fabric" facet comes out at 12.
Query looks like:
Facet comes out as:
When adding a filter on the fabric like this:
the fabric facet then increases to 13:
Hopefully I've explained everything and you can also recreate this. I'm using elasticsearch-0.19.1 on Mac OS X Lion.
Script for recreating data is at https://gist.github.com/2283964