Ensure conflicted fields can be searchable and/or aggregatable #13070

chrisronline · 2017-07-24T18:48:21Z

Resolves #12728

Summary

This PR ensures that fields with type conflicts are no longer assumed to be non-aggregatable or non-searchable within Kibana.

With this change, shard failures in ES become possible. Due to a recent change in ES, users with more than 128 shards will not see a shard failure error because of the type conflict. However, users with fewer than 128 shards will need to update their mappings and reindex their data to prevent related shard failures.

Testing

Create two related indices that share a field name but map it to different, related types (such as text and keyword, or long and integer).
Add data to each index.
Create a Kibana index pattern that includes both indices.
Notice that the mappings do not report a conflict and the fields are searchable and aggregatable.
Navigate to Discover and verify you can filter on the fields.
Navigate to Visualize and build a visualization using the fields.

Note: If you already have an environment set up with conflicted fields, simply refresh the field list for the Kibana index pattern instead.

Testing Screenshots

My environment has fewer than 128 shards, so I will see a shard failure:

Sample Data

PUT foo
{
  "mappings": {
    "myObj": {
      "properties": {
        "name": {
          "type": "text"
        },
        "time": {
          "type": "date"
        },
        "count": {
          "type": "integer"
        },
        "type_conflict": {
          "type": "integer"
        }
      }
    }
  }
}

PUT foo/myObj/1
{
  "id": 1,
  "name": "Jim",
  "time": "2016-07-18T14:00:48.365Z",
  "count": 1,
  "type_conflict": 1
}

PUT foo/myObj/2
{
  "id": 2,
  "name": "Halpert",
  "time": "2016-07-18T14:00:48.365Z",
  "count": 3,
  "type_conflict": 2
}

PUT foo2
{
  "mappings": {
    "myObj": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "time": {
          "type": "date"
        },
        "count": {
          "type": "integer",
          "doc_values": false
        },
        "type_conflict": {
          "type": "text"
        }
      }
    }
  }
}

PUT foo2/myObj/1
{
  "id": 1,
  "name": "Michael",
  "time": "2017-07-18T14:00:48.365Z",
  "count": 1,
  "type_conflict": "1"
}

PUT foo2/myObj/2
{
  "id": 2,
  "name": "Scott",
  "time": "2017-07-18T14:00:48.365Z",
  "count": 3,
  "type_conflict": "2"
}

alexfrancoeur · 2017-07-24T18:53:04Z

Nice @chrisronline! Is this something we can backport to 5.5 as well? I'm worried that we'll see more issues like this as 5.5 gains greater adoption

chrisronline · 2017-07-25T13:41:10Z

@alexfrancoeur I don't see a reason why not, but I'd defer to @epixa's judgement

epixa · 2017-07-25T14:27:57Z

Yeah, we'll want to fix this in 5.5 as well

spalger · 2017-07-25T16:21:05Z

src/server/index_patterns/service/lib/field_capabilities/field_caps_response.js

@@ -67,8 +67,8 @@ export function readFieldCapsResponse(fieldCapsResponse) {
      return {
        name: fieldName,
        type: 'conflict',
-        searchable: false,
-        aggregatable: false,
+        searchable: types.reduce((acc, esType) => (acc || capsByType[esType].searchable), false),


I think it'd be cleaner if this used Array#some() rather than reduce.

types.some(type => !!capsByType[type].searchable)
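A quick side-by-side of the two forms, as a sketch only. The `capsByType` object below is made-up sample data shaped like one field's entry in a field_caps response; it is not taken from the PR.

```javascript
// Hypothetical sample data shaped like one field's entry in a field_caps response.
const capsByType = {
  text: { searchable: true, aggregatable: false },
  keyword: { searchable: true, aggregatable: true },
};
const types = Object.keys(capsByType);

// Original form from the diff: fold the flags together with reduce.
const viaReduce = types.reduce((acc, esType) => acc || capsByType[esType].searchable, false);

// Suggested form: some() states the intent directly and stops at the first match.
const viaSome = types.some(type => !!capsByType[type].searchable);

console.log(viaReduce, viaSome);
```

Both forms agree on this input; `some()` also coerces the result to a boolean, which the `reduce` form only does when every flag is already a boolean.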

chrisronline · 2017-07-25T16:33:47Z

@spalger Great suggestion. Updated

spalger · 2017-07-25T16:51:52Z

Do we know which aggregations supported conflict fields in 5.x? I'm trying to run an aggregation on a field that is long in one index and integer in another, but since the field has conflicting types I'm not able to select it in the histogram field chooser.

chrisronline · 2017-07-25T18:36:12Z

Per a discussion with @spalger, we've decided to make some changes here. Instead of treating multiple types as a conflict in the field caps response, we are now resolving each of the multiple types into a Kibana type, and if they all resolve to the same Kibana type, we no longer consider that a conflict.

For example, text and keyword both resolve to string within Kibana so there is no longer a conflict. integer and long both resolve to number within Kibana so there is no longer a conflict.

Because this will solve the immediate use case, we can revert all the UI changes since those fields will not be reported as conflicted. In the case where there are still conflicted fields (where the types simply are different), the behavior will still exist where users are unable to select that field for filtering or aggregating.
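The type resolution described above can be sketched roughly as follows. The mapping table is an assumption that covers only the pairs named in this comment plus `date`, and `resolveKibanaType` is a hypothetical name, not the function in the PR.

```javascript
// Assumed, partial mapping from ES types to Kibana types. Only the pairs named
// in this thread (text/keyword -> string, integer/long -> number) plus date.
const ES_TO_KIBANA_TYPE = {
  text: 'string',
  keyword: 'string',
  integer: 'number',
  long: 'number',
  date: 'date',
};

// Hypothetical helper: returns the single Kibana type for a non-empty list of
// ES types, or 'conflict' when they resolve to more than one Kibana type.
// ES types missing from the table fall back to their own name.
function resolveKibanaType(esTypes) {
  const kibanaTypes = new Set(esTypes.map(t => ES_TO_KIBANA_TYPE[t] || t));
  return kibanaTypes.size === 1 ? [...kibanaTypes][0] : 'conflict';
}

console.log(resolveKibanaType(['text', 'keyword'])); // string
console.log(resolveKibanaType(['integer', 'long'])); // number
console.log(resolveKibanaType(['integer', 'text'])); // conflict
```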

++ to @spalger for the suggestion!

epixa · 2017-07-25T18:39:28Z

@chrisronline Can you elaborate on why this is a better approach than simply leaning on field_caps for these values?

spalger · 2017-07-25T18:46:23Z

src/server/index_patterns/service/lib/field_capabilities/field_caps_response.js

@@ -63,27 +64,45 @@ export function readFieldCapsResponse(fieldCapsResponse) {
    const capsByType = capsByNameThenType[fieldName];
    const types = Object.keys(capsByType);

+    // If there are multiple types but they all resolve to the same kibana type
+    // ignore the conflict and carry on (my wayward son)
    if (types.length > 1) {


No reason to check the length of types here, just get the uniqueKibanaTypes and check that.

spalger · 2017-07-25T18:47:10Z

@epixa I wouldn't say this is better, but it does a better job of replicating the behavior of 5.x

spalger · 2017-07-25T18:48:51Z

src/server/index_patterns/service/lib/field_capabilities/field_caps_response.js

+      if (uniqueKibanaTypes.length > 1) {
+        // If a single type is marked as searchable or aggregatable, all the types are searcuable or aggregatable
+        const conflictIsSearchable = types.some(type => {
+          return !!capsByType[type].searchable ||


This logic would probably make a nice little helper function that could be reused below too. Maybe something like hasCapability(capsByType[type], 'searchable')?
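One possible shape for such a helper, as a sketch. It reproduces only the visible half of the quoted condition; whatever followed the `||` in the diff is cut off above and is not guessed at here. `anyTypeHasCapability` and the sample data are hypothetical.

```javascript
// Hypothetical helper following the suggested call shape. Only the visible
// part of the quoted condition (the `!!caps[capability]` check) is reproduced.
function hasCapability(caps, capability) {
  return !!caps[capability];
}

// Hypothetical wrapper: true when at least one ES type reports the capability.
function anyTypeHasCapability(capsByType, capability) {
  return Object.keys(capsByType).some(type => hasCapability(capsByType[type], capability));
}

// Made-up sample data for a field that is integer in one index and text in another.
const sampleCapsByType = {
  integer: { searchable: true, aggregatable: false },
  text: { searchable: true, aggregatable: false },
};

console.log(anyTypeHasCapability(sampleCapsByType, 'searchable'));
console.log(anyTypeHasCapability(sampleCapsByType, 'aggregatable'));
```

With one helper, the searchable and aggregatable checks differ only in the capability name passed in.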

chrisronline · 2017-07-25T18:52:48Z

@epixa

Based on the test data listed in the description, consider the responses from field_stats and field_caps:

GET foo*/_field_stats/?fields=name
...
"fields": {
  "name": {
    "type": "string",
    "max_doc": 4,
    "doc_count": 4,
    "density": 100,
    "sum_doc_freq": 4,
    "sum_total_term_freq": -1,
    "searchable": true,
    "aggregatable": true,
    "min_value": "Michael",
    "max_value": "jim"
  }
}
...

The name field has two different types in ES: text and keyword. But the response doesn't reflect that. It resolves them both to type string.

GET foo*/_field_caps/?fields=name

{
  "fields": {
    "name": {
      "text": {
        "type": "text",
        "searchable": true,
        "aggregatable": false,
        "indices": [
          "foo"
        ]
      },
      "keyword": {
        "type": "keyword",
        "searchable": true,
        "aggregatable": true,
        "indices": [
          "foo2"
        ]
      }
    }
  }
}

With field_caps, it doesn't resolve them to a single type but returns both types. However, if we converted both text and keyword to a kibana type, we'd get string.
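To make that concrete, here is a rough sketch that walks the field_caps response above and collapses `name` into a single field entry. `readFields` and the two-entry type table are assumptions for illustration, not the code in this PR; the flags are merged with `some()`, so a field counts as aggregatable if any of its ES types is.

```javascript
// Assumed mapping for just the two ES types that appear in the response above.
const KIBANA_TYPE_BY_ES_TYPE = { text: 'string', keyword: 'string' };

// The field_caps response for `name`, copied from the example above.
const fieldCapsResponse = {
  fields: {
    name: {
      text: { type: 'text', searchable: true, aggregatable: false, indices: ['foo'] },
      keyword: { type: 'keyword', searchable: true, aggregatable: true, indices: ['foo2'] },
    },
  },
};

// Hypothetical reader: one output entry per field name.
function readFields(response) {
  return Object.keys(response.fields).map(name => {
    const capsByType = response.fields[name];
    const esTypes = Object.keys(capsByType);
    const kibanaTypes = [...new Set(esTypes.map(t => KIBANA_TYPE_BY_ES_TYPE[t] || t))];
    return {
      name,
      // A single Kibana type means no conflict, even with several ES types.
      type: kibanaTypes.length === 1 ? kibanaTypes[0] : 'conflict',
      searchable: esTypes.some(t => !!capsByType[t].searchable),
      aggregatable: esTypes.some(t => !!capsByType[t].aggregatable),
    };
  });
}

const fields = readFields(fieldCapsResponse);
console.log(fields);
```

For this input the sketch yields type `string` with both flags true, which lines up with what field_stats reported for `name`.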

Per @spalger, we want to achieve as much of 5.x behavior as possible and this smaller code change will still get us there.

epixa · 2017-07-25T18:59:37Z

Thanks for the summary

spalger

LGTM

epixa · 2017-07-26T18:59:56Z

Can you merge in the latest master to get CI running on a working commit?

chrisronline · 2017-07-26T19:01:50Z

@epixa Sure! Just rebased and CI is building now

epixa · 2017-07-26T19:05:20Z

Can we get a test for these changes? This seems like a pretty critical behavior for how we handle fields in Kibana, so a regression here will break a lot of stuff (as evidenced by our regression here breaking a lot of stuff).

chrisronline · 2017-07-26T19:49:27Z

@epixa Definitely. I somehow missed the existing test file when looking the first time, but I updated it to include tests for the newly added logic.

chrisronline · 2017-07-27T19:36:18Z

@spalger Added the functional test. Lemme know if that looks right to you.

spalger

Couple minor things, once they're fixed I say merge this sucker!

spalger · 2017-07-28T00:29:10Z

test/api_integration/apis/index_patterns/fields_for_wildcard_route/conflicts.js

-            ]
+          const dateField = resp.body.fields.find(f => f.type === 'date');
+          const successField = resp.body.fields.find(f => f.type === 'conflict');
+          expect(dateField).to.eql({


These tests intentionally test the entire response body. If it changes at all these tests should fail. Mind not filtering but asserting that the entire body matches?

Sure. Updated

spalger · 2017-07-28T00:29:21Z

test/api_integration/fixtures/es_archiver/index_patterns/conflicts/mappings.json

    "settings": {
      "index": {
-        "number_of_shards": "1",
+        "number_of_shards": "5",


No reason for 5 shards, right?

No reason, no. I didn't manually change that so I'm not sure what happened, but reverted back to 1

chrisronline · 2017-07-28T13:48:45Z

@spalger Updated. Ready for another pass!

spalger · 2017-07-28T14:39:57Z

LGTM

* Ensure that conflict fields can be searchable and/or aggregatable in the UI
* Use `some` instead of `reduce`
* Revert UI changes
* Attempt to convert multiple ES types to kibana types, and if they all resolve to the same kibana type, there is no conflict
* Add comma back
* Cleaner code
* Add tests
* Update failing test to handle searchable and aggregatable properly
* Add functional test to ensure similar ES types are properly merged
* Update tests
* Revert shard size

chrisronline · 2017-07-28T14:49:36Z

Backport:

6.x: b8edbc2
6.0: d96927e
5.5: b0f1383
5.6: fd560bc

excalq · 2017-08-08T06:05:40Z

This regression breaks a lot of our visualizations as well. Spending [more] hours reindexing is looking like a lost cause, as it's a very slow process. Very happy to hear that the string->text/keyword problem is addressed in newer versions of Kibana. Any indication of when 5.5.2 might be released as a package? The snapshot release link is 404, sadly.

epixa · 2017-08-09T14:35:52Z

@excalq We're working on shoring up/QAing the 5.5.2 release now, so hopefully it'll go out in the next week or two.

KeithTt · 2018-05-23T06:15:16Z

I have run into this issue; my Elastic version is 5.5.1.

What version fixed this issue?

epixa · 2018-05-24T18:23:44Z

@KeithTt 5.5.2

stacey-gammon · 2018-12-03T20:56:28Z

@epixa, @spalger, @chrisronline - @AlonaNadler and I are chatting about the impending ECS changes and there is concern that this will cause a lot more of these index mapping conflicts. It appears that this PR didn't actually fix the problem, since we have the open issues referenced above. Is this correct? The issue still exists? Disclaimer - I did not read through all the comments, but judging by the last two, seemed like it should be fixed. Thanks for any extra insight!

ppf2 · 2018-12-03T21:05:10Z

Looking at the testing description, it looks like it is focused on concrete data type differences (i.e., text vs. keyword, keyword vs. integer, integer vs. long, etc.). What about the scenario where one index has "host" as a keyword field and another index has "host" as an object (i.e., host.X, etc.)?

epixa · 2018-12-03T21:28:00Z

@stacey-gammon The underlying issue here is that Elasticsearch (by design) does not support searching across types without causing shard errors. There might be exceptions to that that I'm not familiar with, but on the whole that's how it is. This particular PR was removing an artificial limitation in Kibana that disallowed even attempting to search/aggregate on a field that we identified as having conflicts, so the end result would be that you can attempt to aggregate on the field anyway and if ES returns shard failures, we'll show them as errors.

For large data sets, the shard filter phase in Elasticsearch will kick in and should eliminate the shard errors in the most common cases because it will limit the number of shards that actually get hit to serve the request, so it probably won't touch older indices that have different mappings. The downside to this approach is that when you do search across mapping boundaries, you get shard errors.

Small data sets might be under the 128 shard default threshold for the shard filter phase, so they might need to be reindexed. At this scale, reindexing is realistic, so I'm personally pretty comfortable with the ES team's recommendation here.

So the big issue is on larger data sets when searching across mapping boundaries, and I'm not sure how we could fix that reliably. So far every time we've attempted to add conveniences in Kibana to diverge from Elasticsearch's behaviors in order to improve the search experience, we've shot ourselves in the foot and made things worse. The behavior this PR undoes is such an example where we tried to make the UX nicer for folks and ended up hurting people.

ppf2 · 2018-12-03T22:02:04Z

This particular PR was removing an artificial limitation in Kibana that disallowed even attempting to search/aggregate on a field that we identified as having conflicts, so the end result would be that you can attempt to aggregate on the field anyway and if ES returns shard failures, we'll show them as errors.

@epixa

It seems like the PR is not complete; specifically, it doesn't appear to handle the concrete vs. object type scenario. Take this Elasticsearch example:

Say there are 2 instances:
In test-2018.12.02, "host" is defined as a keyword field.
In test-2018.12.03, "host" is an object field with a keyword subfield, name.

By the way, the above example is not arbitrary; it is one possible result of Beats' breaking schema change in 6.3+:

PUT test-2018.12.02
{
  "mappings": {
    "type": {
      "properties": {
        "host": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT test-2018.12.03
{
  "mappings": {
    "type": {
      "properties": {
        "host": {
          "properties":{
            "name":{
              "type":"keyword"
            }
          }
        }
      }
    }
  }
}

If an aggregation on the "host" field is requested against test-*, ES will still return aggregation results without errors:

GET test-*/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "host",
        "size": 10
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-2018.12.02",
        "_type" : "type",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2018-12-02T01:00:00Z",
          "host" : "firestorm"
        }
      },
      {
        "_index" : "test-2018.12.03",
        "_type" : "type",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2018-12-03T01:00:00Z",
          "host" : {
            "name" : "firestorm2"
          }
        }
      }
    ]
  },
  "aggregations" : {
    "NAME" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "firestorm",
          "doc_count" : 1
        }
      ]
    }
  }
}

Except that it will only include the aggregation results obtained from test-2018.12.02 because it is the one that has the concrete "host" field.

In Kibana, when it detects a difference in the field type across the 2 indices, it will simply block "host" from being used in any aggregations.

Even though field caps indicate that the "host" field for test-2018.12.02 is actually aggregatable (as well as host.name for test-2018.12.03):

{
  "fields" : {
    "host" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true,
        "indices" : [
          "test-2018.12.02"
        ]
      },
      "object" : {
        "type" : "object",
        "searchable" : false,
        "aggregatable" : false,
        "indices" : [
          "test-2018.12.03"
        ]
      }
    },
    "host.name" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    }
  }
}

For consistency, it seems Kibana could relax the conflict check here and allow the "host" field to be aggregatable as well. Kibana would then let ES decide which aggregation results are actually returned when the user's date range covers both forms of the indices, whereas today it completely prevents users from using the host field that is present in older indices.
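As a rough illustration of what field caps already expose for this case, the sketch below lists the indices in which a field is reported as aggregatable. `aggregatableIndices` is a hypothetical helper, and only the "host" entry from the response above is included; entries without an `indices` list would contribute nothing here.

```javascript
// The "host" entry from the field caps response above.
const fieldCaps = {
  fields: {
    host: {
      keyword: { type: 'keyword', searchable: true, aggregatable: true, indices: ['test-2018.12.02'] },
      object: { type: 'object', searchable: false, aggregatable: false, indices: ['test-2018.12.03'] },
    },
  },
};

// Hypothetical helper: collect the indices whose mapping for `fieldName`
// is reported as aggregatable. Unknown fields yield an empty list.
function aggregatableIndices(caps, fieldName) {
  const capsByType = caps.fields[fieldName] || {};
  return Object.values(capsByType)
    .filter(typeCaps => typeCaps.aggregatable)
    .flatMap(typeCaps => typeCaps.indices || []);
}

console.log(aggregatableIndices(fieldCaps, 'host'));
```

For "host" this returns only test-2018.12.02, which matches the single bucket in the aggregation response shown earlier.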

epixa · 2018-12-03T22:30:14Z

@ppf2 Can you open a new issue so the apps team can review? This issue is ancient history.

ppf2 · 2018-12-03T23:23:32Z

Thanks @epixa . Filed: #26583

chrisronline added :Management blocker review v5.6.0 v6.0.0 labels Jul 24, 2017

chrisronline self-assigned this Jul 24, 2017

chrisronline requested review from epixa and spalger July 24, 2017 18:48

chrisronline added v5.5.1 v5.5.2 labels Jul 25, 2017

epixa removed the v5.5.1 label Jul 25, 2017

spalger reviewed Jul 25, 2017

spalger approved these changes Jul 25, 2017

chrisronline force-pushed the fix/searchable_aggregatable_conflicts branch from a4eb414 to c11457e Compare July 26, 2017 19:01

Add functional test to ensure similar ES types are properly merged

7048fce

chrisronline force-pushed the fix/searchable_aggregatable_conflicts branch from c0063f0 to 7048fce Compare July 27, 2017 19:36

spalger approved these changes Jul 28, 2017

chrisronline added 2 commits July 28, 2017 09:46

Update tests

c94c60d

Revert shard size

9288e50

chrisronline merged commit 389115c into elastic:master Jul 28, 2017

chrisronline deleted the fix/searchable_aggregatable_conflicts branch August 10, 2017 17:24

chrisronline mentioned this pull request Nov 8, 2017

Fail to create aggregate on a field with a mixed numeric data type #12285

Closed

alexfrancoeur mentioned this pull request Nov 14, 2017

Advanced setting to show fields regardless of conflict type #14937

Closed

Bargs mentioned this pull request Nov 27, 2018

No warning when index-pattern field contains multiple types. #26296

Closed

ppf2 mentioned this pull request Dec 3, 2018

Ensure conflicted fields (concrete vs. object fields) can be searchable and/or aggregatable #26583

Closed
