Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure conflicted fields can be searchable and/or aggregatable #13070

Merged

Conversation

chrisronline
Copy link
Contributor

@chrisronline chrisronline commented Jul 24, 2017

Resolves #12728

Summary

This PR ensures that fields that have type conflicts will no longer be assumed as non aggregatable or non searchable within Kibana.

With this change, there is a possibility of shard failures in ES. Due to a recent change in ES, users with more than 128 shards will not see a shard failure error because of the type conflict. However, users with less than 128 shards will need to update their mappings and reindex their data to prevent related shard failures.

Testing

  1. Create two related indices with the same field name but use different related types (such as text and keyword or long and integer)
  2. Add data to each index.
  3. Create a kibana index pattern that includes both indices.
  4. Notice that the mappings do not report a conflict and the fields are searchable and aggregetable
  5. Navigate to Discover and verify you can filter off the fields
  6. Navigate to Visualize and build a visualization using the fields

Note: If you already have an environment setup with conflicted fields, simply refresh the field list for the kibana index pattern instead.

Testing Screenshots

screen shot 2017-07-25 at 2 32 35 pm

My environment is less than 128 shards so I will see a shard failure:
screen shot 2017-07-24 at 2 38 40 pm

Sample Data

PUT foo
{
  "mappings": {
    "myObj": {
      "properties": {
        "name": {
          "type": "text"
        },
        "time": {
          "type": "date"
        },
        "count": {
          "type": "integer"
        },
        "type_conflict": {
          "type": "integer"
        }
      }
    }
  }
}

PUT foo/myObj/1
{
  "id": 1,
  "name": "Jim",
  "time": "2016-07-18T14:00:48.365Z",
  "count": 1,
  "type_conflict": 1
}

PUT foo/myObj/2
{
  "id": 2,
  "name": "Halpert",
  "time": "2016-07-18T14:00:48.365Z",
  "count": 3,
  "type_conflict": 2
}

PUT foo2
{
  "mappings": {
    "myObj": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "time": {
          "type": "date"
        },
        "count": {
          "type": "integer",
          "doc_values": false
        },
        "type_conflict": {
          "type": "text"
        }
      }
    }
  }
}

PUT foo2/myObj/1
{
  "id": 1,
  "name": "Michael",
  "time": "2017-07-18T14:00:48.365Z",
  "count": 1,
  "type_conflict": "1"
}

PUT foo2/myObj/2
{
  "id": 2,
  "name": "Scott",
  "time": "2017-07-18T14:00:48.365Z",
  "count": 3,
  "type_conflict": "2"
}

@alexfrancoeur
Copy link

Nice @chrisronline! Is this something we can backport to 5.5 as well? I'm worried that we'll see more issues like this as 5.5 gains greater adoption

@chrisronline
Copy link
Contributor Author

@alexfrancoeur I don't see a reason why not, but I'd defer to @epixa's judgement

@epixa
Copy link
Contributor

epixa commented Jul 25, 2017

Yeah, we'll want to fix this in 5.5 as well

@@ -67,8 +67,8 @@ export function readFieldCapsResponse(fieldCapsResponse) {
return {
name: fieldName,
type: 'conflict',
searchable: false,
aggregatable: false,
searchable: types.reduce((acc, esType) => (acc || capsByType[esType].searchable), false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be cleaner if this used Array#some() rather than reduce.

types.some(type => !!capsByType[type].searchable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

@chrisronline
Copy link
Contributor Author

@spalger Great suggestion. Updated

@spalger
Copy link
Contributor

spalger commented Jul 25, 2017

Do we know which aggregations supported conflict fields in 5.x? I'm trying to do an aggregations on a field that is long in one index and integer in another, since the field is conflict types I'm not able to select it in the histogram field chooser.

@chrisronline
Copy link
Contributor Author

Per a discussion with @spalger, we've decided to make some changes here. Instead of trying multiple types as a conflict in the field caps response, we are now resolving all multiple types into kibana types and if they all resolve into the same kibana type, we're no longer considering that a conflict.

For example, text and keyword both resolve to string within Kibana so there is no longer a conflict. integer and long both resolve to number within Kibana so there is no longer a conflict.

Because this will solve the immediate use case, we can revert all the UI changes since those fields will not be reported as conflicted. In the case where there are still conflicted fields (where the types simply are different), the behavior will still exist where users are unable to select that field for filtering or aggregating.

++ to @spalger for the suggestion!

@epixa
Copy link
Contributor

epixa commented Jul 25, 2017

@chrisronline Can you elaborate on why this is the better approach to simply leaning on field_caps for these values?

@@ -63,27 +64,45 @@ export function readFieldCapsResponse(fieldCapsResponse) {
const capsByType = capsByNameThenType[fieldName];
const types = Object.keys(capsByType);

// If there are multiple types but they all resolve to the same kibana type
// ignore the conflict and carry on (my wayward son)
if (types.length > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to check the length of types here, just get the uniqueKibanaTypes and check that.

@spalger
Copy link
Contributor

spalger commented Jul 25, 2017

@epixa I wouldn't say this is better, but it is better replicating the behavior of 5.x

if (uniqueKibanaTypes.length > 1) {
// If a single type is marked as searchable or aggregatable, all the types are searcuable or aggregatable
const conflictIsSearchable = types.some(type => {
return !!capsByType[type].searchable ||
Copy link
Contributor

@spalger spalger Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic would probably make a nice little helper function that could be reused below too. Maybe something like hasCapability(capsByType[type], 'searchable')?

@chrisronline
Copy link
Contributor Author

@epixa

Based on the test data listed in the description, consider the responses from field_stats and field_caps:

GET foo*/_field_stats/?fields=name'
...
"fields": {
  "name": {
    "type": "string",
    "max_doc": 4,
    "doc_count": 4,
    "density": 100,
    "sum_doc_freq": 4,
    "sum_total_term_freq": -1,
    "searchable": true,
    "aggregatable": true,
    "min_value": "Michael",
    "max_value": "jim"
  }
}
...

The name field has two different types in ES: text and keyword. But the response doesn't reflect that. It resolves them both to type string.

GET foo*/_field_caps/?fields=name

{
  "fields": {
    "name": {
      "text": {
        "type": "text",
        "searchable": true,
        "aggregatable": false,
        "indices": [
          "foo"
        ]
      },
      "keyword": {
        "type": "keyword",
        "searchable": true,
        "aggregatable": true,
        "indices": [
          "foo2"
        ]
      }
    }
  }
}

With field_caps, it doesn't resolve them to a single type but returns both types. However, if we converted both text and keyword to a kibana type, we'd get string.

Per @spalger, we want to achieve as much of 5.x behavior as possible and this smaller code change will still get us there.

@epixa
Copy link
Contributor

epixa commented Jul 25, 2017

Thanks for the summary

Copy link
Contributor

@spalger spalger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@epixa
Copy link
Contributor

epixa commented Jul 26, 2017

Can you merge in the latest master to get CI running on a working commit?

@chrisronline chrisronline force-pushed the fix/searchable_aggregatable_conflicts branch from a4eb414 to c11457e Compare July 26, 2017 19:01
@chrisronline
Copy link
Contributor Author

@epixa Sure! Just rebased and CI is building now

@epixa
Copy link
Contributor

epixa commented Jul 26, 2017

Can we get a test for these changes? This seems like a pretty critical behavior for how we handle fields in Kibana, so a regression here will break a lot of stuff (as evidence by our regression here breaking a lot of stuff).

@chrisronline
Copy link
Contributor Author

@epixa Definitely. I somehow missed the existing test file when looking the first time, but I updated it to include tests for the newly added logic.

@chrisronline chrisronline force-pushed the fix/searchable_aggregatable_conflicts branch from c0063f0 to 7048fce Compare July 27, 2017 19:36
@chrisronline
Copy link
Contributor Author

@spalger Added the functional test. Lemme know if that looks right to you.

Copy link
Contributor

@spalger spalger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple minor things, once they're fixed I say merge this sucker!

]
const dateField = resp.body.fields.find(f => f.type === 'date');
const successField = resp.body.fields.find(f => f.type === 'conflict');
expect(dateField).to.eql({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests intentionally test the entire response body. If it changes at all these tests should fail. Mind not filtering but asserting that the entire body matches?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Updated

"settings": {
"index": {
"number_of_shards": "1",
"number_of_shards": "5",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason for 5 shards, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, no. I didn't manually change that so I'm not sure what happened, but reverted back to 1

@chrisronline
Copy link
Contributor Author

@spalger Updated. Ready for another pass!

@spalger
Copy link
Contributor

spalger commented Jul 28, 2017

LGTM

@chrisronline chrisronline merged commit 389115c into elastic:master Jul 28, 2017
chrisronline added a commit that referenced this pull request Jul 28, 2017
* Ensure that conflict fields can be searchable and/or aggregatable in the UI

* Use `some` instead of `reduce`

* Revert UI changes

* Attempt to convert multiple ES types to kibana types, and if they all resolve to the same kibana type, there is no conflict

* Add comma back

* Cleaner code

* Add tests

* Update failing test to handle searchable and aggregatable properly

* Add functional test to ensure similar ES types are properly merged

* Update tests

* Revert shard size
chrisronline added a commit that referenced this pull request Jul 28, 2017
* Ensure that conflict fields can be searchable and/or aggregatable in the UI

* Use `some` instead of `reduce`

* Revert UI changes

* Attempt to convert multiple ES types to kibana types, and if they all resolve to the same kibana type, there is no conflict

* Add comma back

* Cleaner code

* Add tests

* Update failing test to handle searchable and aggregatable properly

* Add functional test to ensure similar ES types are properly merged

* Update tests

* Revert shard size
chrisronline added a commit that referenced this pull request Jul 28, 2017
* Ensure that conflict fields can be searchable and/or aggregatable in the UI

* Use `some` instead of `reduce`

* Revert UI changes

* Attempt to convert multiple ES types to kibana types, and if they all resolve to the same kibana type, there is no conflict

* Add comma back

* Cleaner code

* Add tests

* Update failing test to handle searchable and aggregatable properly

* Add functional test to ensure similar ES types are properly merged

* Update tests

* Revert shard size
chrisronline added a commit that referenced this pull request Jul 28, 2017
* Ensure that conflict fields can be searchable and/or aggregatable in the UI

* Use `some` instead of `reduce`

* Revert UI changes

* Attempt to convert multiple ES types to kibana types, and if they all resolve to the same kibana type, there is no conflict

* Add comma back

* Cleaner code

* Add tests

* Update failing test to handle searchable and aggregatable properly

* Add functional test to ensure similar ES types are properly merged

* Update tests

* Revert shard size
@chrisronline
Copy link
Contributor Author

chrisronline commented Jul 28, 2017

Backport:

6.x: b8edbc2
6.0: d96927e
5.5: b0f1383
5.6: fd560bc

chrisronline added a commit that referenced this pull request Jul 28, 2017
* Ensure that conflict fields can be searchable and/or aggregatable in the UI

* Use `some` instead of `reduce`

* Revert UI changes

* Attempt to convert multiple ES types to kibana types, and if they all resolve to the same kibana type, there is no conflict

* Add comma back

* Cleaner code

* Add tests

* Update failing test to handle searchable and aggregatable properly

* Add functional test to ensure similar ES types are properly merged

* Update tests

* Revert shard size
@excalq
Copy link

excalq commented Aug 8, 2017

This regression breaks a lot of our visualizations as well. Spending [more] hours to reindex is looking to be a lost cause, as it's a very slow process. Very happy to hear that the string->text/keyword problem is worked in newer versions of Kibana. Any indication when 5.5.2 might be released as a package? The snapshot release link is 404, sadly.

@epixa
Copy link
Contributor

epixa commented Aug 9, 2017

@excalq We're working on shoring up/QAing the 5.5.2 release now, so hopefully it'll go out in the next week or two.

@KeithTt
Copy link

KeithTt commented May 23, 2018

I have met this issue, my elastic version is 5.5.1.

What version fixed this issue?

@epixa
Copy link
Contributor

epixa commented May 24, 2018

@KeithTt 5.5.2

@stacey-gammon
Copy link
Contributor

@epixa, @spalger, @chrisronline - @AlonaNadler and I are chatting about the impending ECS changes and there is concern that this will cause a lot more of these index mapping conflicts. It appears that this PR didn't actually fix the problem, since we have the open issues referenced above. Is this correct? The issue still exists? Disclaimer - I did not read through all the comments, but judging by the last two, seemed like it should be fixed. Thanks for any extra insight!

@ppf2
Copy link
Member

ppf2 commented Dec 3, 2018

Looking at the testing description, it looks like it is focused on concrete data type differences (i.e., text vs. keyword, keyword vs. integer, integer vs. long, etc..)? What about the scenario if one index has "host" as a keyword field vs. another index where "host" represents as an object (i.e. host.X, etc..)?

image

@epixa
Copy link
Contributor

epixa commented Dec 3, 2018

@stacey-gammon The underlying issue here is that Elasticsearch (by design) does not support searching across types without causing shard errors. There might be exceptions to that that I'm not familiar with, but on the whole that's how it is. This particular PR was removing an artificial limitation in Kibana that disallowed even attempting to search/aggregate on a field that we identified as having conflicts, so the end result would be that you can attempt to aggregate on the field anyway and if ES returns shard failures, we'll show them as errors.

For large data sets, the shard filter phase in Elasticsearch will kick in and should eliminate the shard errors in the most common cases because it will limit the amount of shards that actually get hit to serve the request, so it probably won't touch older indices that have different mappings. The downside to this approach is that when you do search across mapping boundaries, you get shard errors.

Small data sets might be under the 128 shard default threshold for the shard filter phase, they might need to reindex. At this scale, reindexing is realistic, so I'm personally pretty comfortable with the ES team's recommendation here.

So the big issue is on larger data sets when searching across mapping boundaries, and I'm not sure how we could fix that reliably. So far every time we've attempted to add conveniences in Kibana to diverge from Elasticsearch's behaviors in order to improve the search experience, we've shot ourselves in the foot and made things worse. The behavior this PR undoes is such an example where we tried to make the UX nicer for folks and ended up hurting people.

@ppf2
Copy link
Member

ppf2 commented Dec 3, 2018

This particular PR was removing an artificial limitation in Kibana that disallowed even attempting to search/aggregate on a field that we identified as having conflicts, so the end result would be that you can attempt to aggregate on the field anyway and if ES returns shard failures, we'll show them as errors.

@epixa

It seems like the PR is not complete; specifically, it doesn't appear to handle the concrete vs. object type scenario. Take this Elasticsearch example:

Say there are 2 instances:
In test-2018.12.02, "host" is defined as a keyword field.
In test-2018.12.03, "host" is an object field with name as a string field as a subfield.

By the way, the above example is not arbitrary, this is one possibility of Beat's breaking schema change in 6.3+:

PUT test-2018.12.02
{
  "mappings": {
    "type": {
      "properties": {
        "host": {
          "type": "keyword"
        }
      }
    }
  }
}
PUT test-2018.12.03
{
  "mappings": {
    "type": {
      "properties": {
        "host": {
          "properties":{
            "name":{
              "type":"keyword"
            }
          }
        }
      }
    }
  }
}

If an aggregation is requested against test-* and "host" field, ES will still return aggregation results without errors:

GET test-*/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "host",
        "size": 10
      }
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-2018.12.02",
        "_type" : "type",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2018-12-02T01:00:00Z",
          "host" : "firestorm"
        }
      },
      {
        "_index" : "test-2018.12.03",
        "_type" : "type",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2018-12-03T01:00:00Z",
          "host" : {
            "name" : "firestorm2"
          }
        }
      }
    ]
  },
  "aggregations" : {
    "NAME" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "firestorm",
          "doc_count" : 1
        }
      ]
    }
  }
}

Except that it will only include the aggregation results obtained from test-2018.12.02 because it is the one that has the concrete "host" field.

In Kibana, when it detects a difference in the field type across the 2 indices, it will simply block "host" from being used in any aggregations.

image

Even though field caps indicate that the "host" field for test-2018.12.02 is actually aggregatable (as well as host.name for test-2018.12.03):

{
  "fields" : {
    "host" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true,
        "indices" : [
          "test-2018.12.02"
        ]
      },
      "object" : {
        "type" : "object",
        "searchable" : false,
        "aggregatable" : false,
        "indices" : [
          "test-2018.12.03"
        ]
      }
    },
    "host.name" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    }
  }
}

It seems like for consistency purposes, Kibana can relax the conflict checking here and also allow "host" field to be aggregatable. This way, Kibana can just let ES decide what aggregation results will actually be returned if the users' date range cover both forms of the indices v.s., today where it completely prevents users from being able to use the host field that is present in older indices.

@epixa
Copy link
Contributor

epixa commented Dec 3, 2018

@ppf2 Can you open a new issue so the apps team can review? This issue is ancient history.

@ppf2
Copy link
Member

ppf2 commented Dec 3, 2018

Thanks @epixa . Filed: #26583

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants