Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve highlighting by using highlight_query with all_fields enabled #9671

Merged
merged 10 commits into from
Feb 3, 2017

Conversation

lukasolson
Copy link
Member

@lukasolson lukasolson commented Dec 29, 2016

Fixes #2358.
Replaces #9091.

After some discussion, we decided that the way to have the most reliable and predictable highlighting experience would be to utilize the highlight_query option, and use the same query as the request body query, with the exception that all_fields is enabled for any query_string queries.

This also adds another advanced setting, doc_table:highlight:all_fields, which defaults to true. If set to false, then the highlighting won't use a separate highlight_query.

This should resolve all of the cases that we discussed in #9091.

Copy link
Contributor

@Bargs Bargs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting seems to work really well now! I tried out all the scenarios from the previous ticket and everything was great. I left a few thoughts about implementation inline.

Could you also update the docs to include the new advanced setting in this PR?

const boolQuery = { bool: { must: [
{ query_string: { query: 'foo' } },
{ range: { '@timestamp': { gte: 0, lte: 0 } } }
] } };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you format these so they're more readable?

if (_.has(clone, 'query_string')) {
clone.query_string.all_fields = true;
} else if (_.has(clone, 'bool.must')) {
clone.bool.must = clone.bool.must.map(getHighlightQuery);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about this being too brittle. I know we talked about only worrying about the query_string queries kibana creates, but maybe that's not such a great idea. If we change the structure of the query discover is generating (or ES changes the API) we might break highlighting without knowing about it. I also noticed another bug related to this assumption:

screen shot 2016-12-29 at 12 27 47 pm

The query entered into the query bar is valid, but the code here assumes bool.must is always an array.

What if we just recursively find any query_strings in the body and add all_fields to them, instead of making assumptions about the structure of the query body?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's kinda tricky, because what if the user has a field named query_string and is using that field inside a query? Then we might be adding all_fields in a place where it's not even allowed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only we had some kind of intermediate representation to avoid such issues... :)

It's a good point, I'll try to think on it over the weekend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a way to reliably do this without transforming the query into a normal form first either (which would be equivalent to an "intermediate representation"). ☹️

fields: {
'*': {}
},
fragment_size: FRAGMENT_SIZE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If doc_table:highlight:all_fields is false, should we set require_field_match to false here to maintain the old behavior? Or do you think having a default_field is the only reason why someone might turn all_fields highlighting off, in which case they probably do want to require field matches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, I never saw this comment for some reason. You hit the nail on the head... I wouldn't expect someone to turn this off unless they had a default field set. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I also can't think of a reason to turn it off unless someone wants to search a specific field by default and in that case they probably only want highlighting on that specific field. I was probably worrying too much about preserving the old behavior :)

@@ -389,6 +391,8 @@ export default function SourceAbstractFactory(Private, Promise, PromiseEmitter)
recurse(agg.aggs || agg.aggregations);
});
}(flatState.body.aggs || flatState.body.aggregations));

highlightBody(flatState.body);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that moving highlighting to _flatten ensures that it works across discover and dashboard. But it also means that even Vis requests ask for highlighting. I wonder if we should try to avoid this? Since the highlighting logic is nicely modularized now maybe we could just reuse it in both discover and dashboard?

screen shot 2016-12-29 at 12 56 24 pm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also just realized this would overwrite any highlighting defined via the SearchSource.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely vote for not adding more magic to the data source. This piece of code has cost me so much time when writing the log context view before giving up on it. The more we add the more we restrict its reuse.

@tbragin
Copy link
Contributor

tbragin commented Jan 4, 2017

Awesome progress! Functionally, highlighting works as I expect now, based on examples here #9091 (comment)

The only weirdness I noticed was a refresh issue with whether the expanded document view for one of the documents in the preview was open during a search or not. If it was, there doesn't seem to be a refresh in terms of highlighting.

highlighting_sticky

@lukasolson
Copy link
Member Author

@tbragin Does that happen in master as well, or is it limited to this PR?

@tbragin
Copy link
Contributor

tbragin commented Jan 4, 2017

@lukasolson This issue seems to exists in master. Should I open a separate issue for it?

@tbragin
Copy link
Contributor

tbragin commented Jan 4, 2017

Went ahead and filed a separate issue #9729

@tbragin tbragin added the Feature:Highlight Highlighting of content via query label Jan 4, 2017
@lukasolson
Copy link
Member Author

Okay, this is ready for review again. A couple of notes:

  • I'm not completely satisfied with the current implementation in that highlightRequest makes a call to _flatten. This means that any time we want to use this, _flatten will be called multiple times (once for highlighting and once to flatten for the actual request that gets sent). I'm open to suggestions here.
  • I changed highlightTags from an Angular constant to a simple object export, and changed highlightFilter from an Angular filter to just a plain function (getHighlightHtml). It wasn't being used anywhere in HTML anyway. I also added an additional test because it seems this was a condition that wasn't being tested.
  • I updated the getHighlightQuery logic to fix the bug that @Bargs mentioned above (when must is not an array).

Could you take another look, @Bargs and @weltenwort?

Copy link
Contributor

@Bargs Bargs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely satisfied with the current implementation in that highlightRequest makes a call to _flatten. This means that any time we want to use this, _flatten will be called multiple times (once for highlighting and once to flatten for the actual request that gets sent). I'm open to suggestions here.

I tend to agree with you here. It's also a bit fragile since a change to the query without a subsequent call to searchSource.highlightRequest will leave the highlighting in a stale state. What if instead we had a flag called highlightAll or something, which a consumer of searchSource could toggle on, indicating that they want to opt in to all fields highlighting? Then at _flatten time we check that flag and if it's set, generate the highlighting config, similar to how you had it before?

Also, have you looked into updating Dashboard to use this functionality in addition to Discover? At the moment highlighting doesn't work on terms entered into the Dashboard's query bar:

screen shot 2017-01-12 at 5 27 37 pm

And one last thing: we'll still need to update the documentation with the new advanced setting option.

I haven't gone through the highlight module changes with a fine-tooth comb yet, but I wanted to leave these few high level comments before digging too deep.


export default function getHighlightHtml(fieldValue, highlights) {
let highlightHtml = (typeof fieldValue === 'object')
? angular.toJson(fieldValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be JSON.stringify? If we remove this one dependency on angular, the tests can be run standalone without Karma.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember exactly why this was put in here, but if I remember correctly, fieldValue could be an Angular object (with $$-prefixed values) and that's why we used angular.toJson.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, not a big deal

@lukasolson
Copy link
Member Author

@Bargs This is ready for another review. I didn't explicitly handle the dashboard case, but it looks to me like it's working there.

Copy link
Contributor

@Bargs Bargs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw one minor doc typo, otherwise LGTM!

@@ -35,6 +35,8 @@ document.
`discover:sampleSize`:: The number of rows to show in the Discover table.
`doc_table:highlight`:: Highlight results in Discover and Saved Searches Dashboard. Highlighting makes request slow when
working on big documents. Set this property to `false` to disable highlighting.
`doc_table:highlight:all_fields`:: Improves highlighting by using a separate `highlight_query` that uses `all_fields` mode on
`query_string` queries. Set to `false` if you are using a `default_field" in your index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the double quote after default_field needs to be a backtick?

@lukasolson
Copy link
Member Author

Should this only apply to 5.x, seeing as how in 6.x the default will be to disable _all?

@Bargs
Copy link
Contributor

Bargs commented Jan 30, 2017

I think old indices will still have _all though right? I don't think _all will be completely removed until 7.0.

@@ -178,6 +181,11 @@ export default function SearchSourceFactory(Promise, Private, config) {
});
};

SearchSource.prototype.highlightRequest = function highlightRequest() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something here, but where is this ever called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good point, that should have been removed.

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the left-overs it LGTM as far as the code goes. But I don't have full confidence in my understanding of the way this changes highlighting for the user. I would defer judgement to someone else in that regard.


export default function SearchSourceFactory(Promise, Private, config) {
const SourceAbstract = Private(AbstractDataSourceProvider);
const SearchRequest = Private(SearchRequestProvider);
const SegmentedRequest = Private(SegmentedRequestProvider);
const searchStrategy = Private(SearchStrategyProvider);
const normalizeSortRequest = Private(NormalizeSortRequestProvider);
const getHighlightRequest = getHighlightRequestProvider(config);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might also be a left-over that can be removed.

@@ -58,13 +58,15 @@ import AbstractDataSourceProvider from './_abstract';
import SearchRequestProvider from '../fetch/request/search';
import SegmentedRequestProvider from '../fetch/request/segmented';
import SearchStrategyProvider from '../fetch/strategy/search';
import { getHighlightRequestProvider } from '../../highlight';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might also be a left-over that can be removed.

@lukasolson lukasolson merged commit 909b8c7 into elastic:master Feb 3, 2017
@Bargs
Copy link
Contributor

Bargs commented Feb 3, 2017

🎆 🎉 🎈 !

lukasolson added a commit to lukasolson/kibana that referenced this pull request Feb 3, 2017
…elastic#9671)

* Add all_fields to highlight query to improve highlighting

* Refactor highlighting and move out of _flatten

* Make changes as per @Bargs' requests

* Add documentation about highlightAll setting

* Fix docs typo

* Remove unused function

* Remove unused code
@lukasolson
Copy link
Member Author

lukasolson commented Feb 3, 2017

Backported to 5.x (5.3) in 18451e5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Highlight Highlighting of content via query review v5.3.0 v6.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Over-highlighting in Discover
4 participants