no buckets returned by terms aggregation on raw field #14161

Closed
cwithers opened this issue Oct 16, 2015 · 12 comments

Comments

@cwithers

Hi,

I'm getting strange behaviour with a terms aggregation. In the example below, the terms aggregation based on the "fruit" field returns 3 buckets but the aggregation on the "fruit.raw" field returns 0 buckets. I've also tried "fruit.fruit.raw", which returns 3 buckets, as expected.

Am I doing something subtly wrong?

Many Thanks,
Chris

Here is my mapping:

{
    "my__fruit__1445001899868" : {
        "mappings" : {
            "fruit" : {
                "dynamic" : "strict",
                "properties" : {
                    "color" : {
                        "type" : "string",
                        "fields" : {
                            "raw" : {
                                "type" : "string",
                                "index" : "not_analyzed"
                            }
                        }
                    },
                    "fruit" : {
                        "type" : "string",
                        "fields" : {
                            "raw" : {
                                "type" : "string",
                                "index" : "not_analyzed"
                            }
                        }
                    }
                }
            }
        }
    }
}

Here is my data:

{"index": {"_index": "my__fruit__1445001899868", "_type" : "fruit", "_id" : "2GjU4Yf3R323B9enVSbFuQ"}}
{"fruit":"banana", "color":"yellow"}
{"index": {"_index" : "my__fruit__1445001899868", "_type" : "fruit", "_id" : "mnCDZBNbQ2GgX0JnTIaMxg"}}
{"fruit":"apple", "color":"green"}
{"index": {"_index" : "sand__fruit__1445001899868", "_type" : "fruit", "_id" : "kXlD__CvTk21cicCTncLCQ"}}
{"fruit":"mango", "color":"green"}

Here is the working request and response:

{
    "query" : {
        "match_all" : {}
    },
    "from" : 0,
    "size" : 0,
    "aggs" : {
        "fruit" : {
            "terms" : {
                "field" : "fruit"
            }
        },
        "color" : {
            "terms" : {
                "field" : "color"
            }
        }
    }
}
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "green",
        "doc_count" : 2
      }, {
        "key" : "yellow",
        "doc_count" : 1
      } ]
    },
    "fruit" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "apple",
        "doc_count" : 1
      }, {
        "key" : "banana",
        "doc_count" : 1
      }, {
        "key" : "mango",
        "doc_count" : 1
      } ]
    }
  }
}

Here is the failing request and response:

{
    "query" : {
        "match_all" : {}
    },
    "from" : 0,
    "size" : 0,
    "aggs" : {
        "fruit" : {
            "terms" : {
                "field" : "fruit.raw"
            }
        },
        "color" : {
            "terms" : {
                "field" : "color.raw"
            }
        }
    }
}
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "green",
        "doc_count" : 2
      }, {
        "key" : "yellow",
        "doc_count" : 1
      } ]
    },
    "fruit" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}
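
The symptom can also be checked programmatically. The sketch below is an illustration only: the helper names `terms_aggs_body` and `empty_aggregations` are hypothetical, the request body mirrors the one posted above, and the response is an abridged copy of the failing response rather than the output of a live call.

```python
def terms_aggs_body(fields):
    """Build a size-0 search body with one terms aggregation per name/field pair."""
    return {
        "query": {"match_all": {}},
        "from": 0,
        "size": 0,
        "aggs": {name: {"terms": {"field": field}} for name, field in fields.items()},
    }


def empty_aggregations(response):
    """Return the names of aggregations whose bucket list is empty."""
    aggs = response.get("aggregations", {})
    return sorted(name for name, agg in aggs.items() if not agg.get("buckets"))


# Same shape as the failing request above.
body = terms_aggs_body({"fruit": "fruit.raw", "color": "color.raw"})

# Abridged copy of the failing response above.
failing_response = {
    "hits": {"total": 3},
    "aggregations": {
        "color": {
            "buckets": [
                {"key": "green", "doc_count": 2},
                {"key": "yellow", "doc_count": 1},
            ]
        },
        "fruit": {"buckets": []},
    },
}

print(empty_aggregations(failing_response))
```

Passing `"fruit.fruit.raw"` instead of `"fruit.raw"` to `terms_aggs_body` produces the variant that, according to the report above, returns the expected three buckets.
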
@dadoonet
Member

Please ask user questions on discuss.elastic.co. We can better help you there.
Maybe you did not reindex your data after applying your mapping changes?

@clintongormley

@cwithers The type name in your mappings is "incident" but the type name in your docs is "fruit", so you don't have any .raw fields for your data.

@cwithers
Author

re: dadoonet, I definitely created the index before populating it so that's not it, unfortunately.

re: clintongormley, I changed the example here to remove customer-specific data and missed one of the uses of "incident".

@cwithers
Author

Can this be re-opened? It is a real issue that I can reproduce on a clean Elasticsearch 1.7.2 install.

It looks like having a field name (fruit) that matches the name of the type (fruit) can confuse Elasticsearch in certain cases.

@dadoonet
Member

Can you share your full script which helps to reproduce it?

@cwithers
Author

I create the mapping using curl, then load the data using curl, then make the request using curl:

curl -XPOST http://elasticsearch:9200/fruit -d @C:\src\Curl\fruit-type.json
curl -XPOST http://elasticsearch:9200/fruit/_bulk --data-binary @C:\src\Curl\fruit-data.json
curl -XPOST http://elasticsearch:9200/fruit/_search?pretty -d @C:\src\Curl\ESsearchRequest.json > C:\src\Curl\ESsearchResponse.json

@dadoonet
Member

As Clint said, your example is incorrect. Can you send a full correct script that you run on your end and which reproduces the issue?

@cwithers
Author

The full script is embedded in a lot of Java code, which uses the Elasticsearch REST API to create templates containing document type metadata and properties, and then creates timestamped indices from those templates. The code also uses the REST API to bulk index the data and to perform searches with lots of aggregations.

I can successfully reproduce the strange behaviour using the curl command above, which runs the search with only two aggregations directly against Elasticsearch.

The above mapping was exported using Elasticsearch's REST API. I only removed the metadata and additional field definitions, and replaced "incident" with "fruit".

I originally found the issue with Elasticsearch 1.5.1 and to be sure it wasn't already fixed in 1.7.2, I installed 1.7.2 and pointed my Java code at it. Unfortunately, I got the same strange behaviour with one of the terms aggregations.

I didn't know about the discussion forum so thanks for pointing that out.

@clintongormley

> It looks like having a field name (fruit) that matches the name of the type (fruit) can confuse Elasticsearch in certain cases.

Yes, it can. This is a known issue, which is fixed in 2.0.
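
One way to picture the ambiguity is the toy model below. It is a sketch under stated assumptions, not Elasticsearch's lookup code: it assumes that a leading path segment naming a type is treated as a type prefix with no fallback to the full path, and that 2.0 resolves full paths only. The function names and the `MAPPINGS` structure are hypothetical; the model is merely consistent with the three observations in this thread.

```python
# Toy model only: this is NOT Elasticsearch's actual field resolution code.
# Maps each type name to the full field paths defined in the mapping above.
MAPPINGS = {
    "fruit": {"color", "color.raw", "fruit", "fruit.raw"},
}


def resolve_1x_toy(path, mappings):
    """Resolve a path, treating a leading segment that names a type as a type prefix."""
    head, _, rest = path.partition(".")
    if rest and head in mappings:
        # Assumed behaviour: the type prefix wins and there is no fallback.
        return (head, rest) if rest in mappings[head] else None
    for type_name, fields in mappings.items():
        if path in fields:
            return (type_name, path)
    return None


def resolve_full_path_only(path, mappings):
    """Resolve a path as a full field path only (the assumed 2.0-style rule)."""
    for type_name, fields in mappings.items():
        if path in fields:
            return (type_name, path)
    return None


for candidate in ("fruit", "fruit.raw", "fruit.fruit.raw", "color.raw"):
    print(candidate, resolve_1x_toy(candidate, MAPPINGS))
```

In this model "fruit.raw" is read as field "raw" of type "fruit", which does not exist, while "fruit.fruit.raw" and "color.raw" both resolve. Under the full-path-only rule, "fruit.raw" resolves directly.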

@clintongormley

See #8870

@cwithers
Author

I was aware of that 2.0 change after it was mentioned at Elastic{on}, but didn't realise the issue could manifest in this way when there is only one type in an index.

Thanks for all your help 😊

@clintongormley

np :)
