Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Logs UI] [Alerting] "Group by" functionality #68250

Merged
merged 35 commits into from
Jun 30, 2020

Conversation

Kerry350
Copy link
Contributor

@Kerry350 Kerry350 commented Jun 4, 2020

Summary

This PR implements #67465. (UI and server side executor logic).

The design differs from the orignal ticket. It was agreed after a discussion that we should make the group by a part of the expression like everything else.

This expression piece has been implemented so it can be used within Metrics too. Currently this is in components/alerting/shared/group_by_expression, but I'm completely open to moving this if we're unhappy.

Note on tests: I've amended the current executor tests so that a green build is possible. However, now that there's an ungrouped logic fork and a group by logic fork, I'd like to A) quite heavily amend the test logic and B) add group by coverage. I'd like to do this in a followup PR directly after this work. < #69261

Functionality overview

Overall view of the group by expression:

Screenshot 2020-06-04 14 23 03

Server logs for grouped and ungrouped alerts:

Screenshot 2020-06-04 14 36 21

Ungrouped (closed):

Screenshot 2020-06-04 14 36 33

Ungrouped (open):

Screenshot 2020-06-04 14 36 44

Grouped (open):

Screenshot 2020-06-04 14 37 09

Action message with new context.group:

Screenshot 2020-06-04 14 37 14

group_by

Known issues

The combo box list is currently rendering under the popover due to a z-index issue, however this isn't easy to fix. This particular combination of flyout, popover, and combobox causes problems. I'm talking with the EUI team about this, but it shouldn't block review. < Fixed in elastic/eui#3551

Testing

SSL needs to be enabled: https://github.com/elastic/observability-dev/issues/849#issuecomment-626770782

  • Does creating an alert through the logs app work with a group by?
  • Does creating an alert through the logs app work without a group by?
  • Does creating an alert through the alerts management page work with a group by?
  • Does creating an alert through the alerts management page work without a group by?
  • Does editing an alert through the alerts management page work with a group by?
  • Does editing an alert through the alerts management page work without a group by?
  • General usage

@Kerry350 Kerry350 requested a review from a team as a code owner June 4, 2020 14:01
@Kerry350 Kerry350 marked this pull request as draft June 4, 2020 14:01
@Kerry350 Kerry350 self-assigned this Jun 4, 2020
@Kerry350 Kerry350 added Feature:Alerting Feature:Logs UI Logs UI feature release_note:enhancement Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.9.0 v8.0.0 labels Jun 4, 2020
@Kerry350 Kerry350 added this to the Logs UI 7.9 milestone Jun 4, 2020
@katrin-freihofner
Copy link
Contributor

🙌this looks great! Special thanks for the great documentation in the PR itself.

@Kerry350 Kerry350 marked this pull request as ready for review June 5, 2020 15:29
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀 Reviewing on behalf of @elastic/logs-metrics-ui...

@weltenwort
Copy link
Member

@elasticmachine merge upstream

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first batch of comments - I'm almost done but will have to defer the rest until tomorrow. It's pretty readable so far 👏

x-pack/plugins/infra/common/alerting/logs/types.ts Outdated Show resolved Hide resolved
x-pack/plugins/infra/common/alerting/logs/types.ts Outdated Show resolved Hide resolved
size: 20,
sources: groupBy.map((field, index) => ({
[`group-${index}-${field}`]: {
terms: { field },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently documents that lack any of the grouping fields are not alerted on, i.e. there's no "everything else" bucket. Is that intentionally left out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. It was never explicitly discussed, however, so it is just a decision I've made. In my opinion if I've requested to group by host.name as an example, due to a big fleet of infrastructure, I don't think knowing something matched on "everything else" is helpful. However, I appreciate there's many fields, and many uses for grouping. Setting the group bys is an explicit action though, it would seem odd to me to get an alert on other things. Willing to throw this out to a wider audience for opinons though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it really depends on the use case. Maybe @mukeshelastic can weigh in on that?

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work in most cases, but I left some questions about edge cases below. They might just be due to a lack of context on my side, though. 😇

@weltenwort
Copy link
Member

Another question that came to mind: Do you know why we use all of must, must_not and filter in the bool clause of the alert query? Since we're not interested in the scores just using must_not and filter should improve caching and thereby performance for these regularly-repeated queries.

@Kerry350
Copy link
Contributor Author

Another question that came to mind: Do you know why we use all of must, must_not and filter in the bool clause of the alert query? Since we're not interested in the scores just using must_not and filter should improve caching and thereby performance for these regularly-repeated queries.

Just poor querying on my part. I didn't know a performance hit would be incurred for using must as well.

@Kerry350
Copy link
Contributor Author

Kerry350 commented Jun 16, 2020

Yes, one way in which we could further soften the expectation without completely changing the way we query the data would be to use different (larger) time range in which we evaluate the existence of a group.

👍

Alternatively the groups could be captured statically at alert creation time so there's no risk of losing them, but that almost sounds like a completely different definition of "grouping".

Personally I'd strongly argue against this. With alerting I feel there's a good expectation that alerts are a reflection of execution time, and not "stale" data. Also (and I may well be missing something) I don't see how this would help — even if we captured the groups at creation time, later when the executor runs if there were "no documents" we'd still be in the same situation trying to ascertain which groups produced the "no documents", i.e. we wouldn't be able to link the statically captured groups to the runtime documents, as the "group by" fields would still be missing.

⚠️ I think this is good to go now, in terms of ES query amendments this is a summary of the changes:

  • I've refactored the code a little so that grouped and ungrouped ES queries are now constructued in separate functions.

  • For grouped scenarios the composite aggregation happens top-level, with the filtering as a nested sub-aggregation.

  • For grouped scenarios the range receives a 20% (no science here, just tried to use something that wouldn't be so large it produced vastly different behaviour, but enough to potentially capture groups that slip into a no document state) padding (of the overall interval) on each side (gte and lte). This allows us to cast a slightly wider net for the groups themselves, incase the inner filtering "loses" the groups.

  • There is now much more aggressive runtime type validation via io-ts. The alert params are validated. And so are ES search responses.

Edit: Forgot to push one change, now should be good.

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring to make it even more readable - I think it's almost good to go 👏 And the type guards seem like a good thing too 👍

Your solution to the "disappearing groups" situation makes sense to me. Now it'll likely alert once when a group disappears. That's probably good enough for most cases. And as you said, the general "no data" scenario should probably be handled separately anyway.

One of my comments from last time about the fields still being limited to string seems to have fallen through the cracks. 😉

const groupByFields = useMemo(() => {
if (sourceStatus?.logIndexFields) {
return sourceStatus.logIndexFields.filter((field) => {
return field.type === 'string' && field.searchable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea about this? 😇

Kerry350 and others added 3 commits June 29, 2020 10:47
…eshold_executor.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…eshold_executor.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
@Kerry350
Copy link
Contributor Author

Kerry350 commented Jun 29, 2020

@weltenwort

Any idea about this? 😇

It won't let me reply directly to this inline for some reason, so I'll do it here.

Sorry, I wasn't ignoring this, I just didn't have a good answer at the time and needed to go away and actually find out why 😬

So, right back at the beginning this filtering criteria was copied from metrics. And I didn't investigate it.

I wasn't easily able to find out precisely what dictated a field being searchable (maybe there' some docs I just missed), but I did find a Slack conversation that explained it.

A field is deemed as searchable when the index field mapping is set to true, and is therefore "queryable". Discover only reveals fields that aren't searchable when an option is toggled for "hidden" fields.

So I guess the question is, do we want people to be able to select fields that aren't set to be indexed and queryable? (Naively I'd say no?).

(I also changed the group by fields to allow only aggregatable fields).

@weltenwort
Copy link
Member

So I guess the question is, do we want people to be able to select fields that aren't set to be indexed and queryable? (Naively I'd say no?).

(I also changed the group by fields to allow only aggregatable fields).

My understanding is that searchable indicates that an inverted index exists, so a lookup from term to documents that contains it is possible. aggregatable indicates that "doc values" exists such that an efficient lookup of terms that a field in a document contains is possible.

For the purposes of grouping we only require it to be aggregatable, I think.

@Kerry350
Copy link
Contributor Author

My understanding is that searchable indicates that an inverted index exists, so a lookup from term to documents that contains it is possible. aggregatable indicates that "doc values" exists such that an efficient lookup of terms that a field in a document contains is possible.

Ah, okay, thanks for the explanation 👍 I'm struggling to find where this information exists (I trust you, you have way more experience with this, I just don't know where I'd point someone to find out about this).

I've tweaked the filtering now, so everything should be aligned with what's necessary.

This is all ready for another look over, thanks for the review patience 🙏

@weltenwort
Copy link
Member

weltenwort commented Jun 29, 2020

I'm struggling to find where this information exists

I was unable to find direct documentation of it either. The best I could find are the comments in elastic/elasticsearch#23007 and the ES source:

    /**
     * Returns true if the field is searchable.
     */
    public boolean isSearchable() {
        return isIndexed;
    }

    /** Returns true if the field is aggregatable. [sic]
     *
     */
    public boolean isAggregatable() {
        try {
            fielddataBuilder("");
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't come across any unexpected behavior anymore and the code looks better to me than before this PR. Good job 👏

(one final attempt to rectify my confusion in the comments)

…n_editor/editor.tsx

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

@kbn/optimizer bundle module count

id value diff baseline
infra 607 +1 606

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@Kerry350 Kerry350 merged commit ceb8595 into elastic:master Jun 30, 2020
Kerry350 added a commit to Kerry350/kibana that referenced this pull request Jun 30, 2020
- Add "group by" functionality to logs alerts
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jun 30, 2020
…ata-streams

* 'master' of github.com:elastic/kibana: (50 commits)
  [Logs UI] [Alerting] "Group by" functionality (elastic#68250)
  [Discover] Deangularize Skip to bottom button (elastic#69811)
  Implement recursive plugin discovery (elastic#68811)
  Use ts-expect-error in platform code (elastic#69883)
  [SIEM][Detection Engine][Lists] Moves getQueryFilter to common folder for use by both front and backend
  [Ingest Manager][SECURITY SOLUTION] adjust config reassign link and add roundtrip to Reassignment flow (elastic#70208)
  [Security][Lists] Add API functions and react hooks for value list APIs (elastic#69603)
  [ILM] Fix bug when clearing priority field (elastic#70154)
  [Platform][Security] Updates cluster_manager ignorePaths to include security scripts (elastic#70139)
  [IngestManager] Allow to filter agent by packages (elastic#69731)
  [code coverage] exclude folders: test_helpers, tests_bundle (elastic#70199)
  [Metrics UI] UX improvements for saved views (elastic#69910)
  [APM] docs: unique transaction troubleshooting (elastic#69831)
  Cross cluster search functional test with minimun privileges assigned to the test_user (elastic#70007)
  [Maps] choropleth layer wizard (elastic#69699)
  Make custom errors by extending Error (elastic#69966)
  [Ingest Manager] Support updated package output structure (elastic#69864)
  Resolver test coverage (elastic#70246)
  Async Discover search test (elastic#64388)
  [ui-shared-deps] include styled-components (elastic#69322)
  ...

# Conflicts:
#	x-pack/plugins/snapshot_restore/server/types.ts
Kerry350 added a commit that referenced this pull request Jun 30, 2020
- Add "group by" functionality to logs alerts
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jul 1, 2020
* master:
  [ML] Modifies page title to Create job (elastic#70191)
  [APM] Add API test for service maps (elastic#70185)
  [DOCS] Adds glossary to documentation (elastic#69721)
  [Usage Collection] Report nodes feature usage (elastic#70108)
  chore: improve support for mjs file extension (elastic#70186)
  [ML] Anomaly Detection: ensure  'Category examples' tab in the expanded table row can be seen (elastic#70241)
  [Maps] Add maps telemetry saved object in with mappings disabled (elastic#69995)
  Fix typo in bootstrap command (elastic#69976)
  [code coverage] ingest correct coveredFilePath for mocha (elastic#70215)
  [Dashboard] Add visualization by value to dashboard (elastic#69898)
  updates wording in Cases connectors (elastic#70298)
  [ML] Fix license subscription race condition. (elastic#70074)
  [Logs UI] [Alerting] "Group by" functionality (elastic#68250)
Bamieh pushed a commit to Bamieh/kibana that referenced this pull request Jul 1, 2020
- Add "group by" functionality to logs alerts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Alerting Feature:Logs UI Logs UI feature release_note:enhancement Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.9.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants