Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lens] Leverage original http request error #79831

Merged
merged 8 commits into from
Oct 13, 2020

Conversation

flash1293
Copy link
Contributor

@flash1293 flash1293 commented Oct 7, 2020

Fixes #77360

This PR makes sure the underlying http request error is propagated to the expression error handler and uses the underlying ES information to show a more meaningful error message:
Screenshot 2020-10-07 at 12 05 20

Full list of changes:

  • src/plugins/expressions/common/util/create_error.ts If an error is thrown in a sub expression (as it is the case for Lens), the original error was not propagated correctly but discarded because an ExpressionValueError['error'] is not a real JS Error. This makes sure to always keep the underlying real error object even if it's wrapped multiple times.
  • src/plugins/expressions/public/react_expression_renderer.tsx: Only the error message was passed to the expression renderer react component callback - this PR adds a second parameter for the whole error
  • src/plugins/expressions/public/types/index.ts: Add original to the type of this error - not sure about the type structure here, but it definitely ends up being defined at that level as well
  • In Lens: Add a helper for pulling the interesting pieces out of the original error and use it in the workspace panel and in the embeddable.

Testing

  • Create a Lens dashboard based on an index
  • Create a user without read rights on that index
  • Access the dashboard logged in as that user
  • Error messages should be helpful

@flash1293 flash1293 marked this pull request as ready for review October 7, 2020 13:29
@flash1293 flash1293 requested a review from a team October 7, 2020 13:29
@flash1293 flash1293 requested a review from a team as a code owner October 7, 2020 13:29
@flash1293 flash1293 requested a review from lizozom October 7, 2020 13:29
original:
err instanceof Error
? err
: typeof err === 'object' && 'original' in err && err.original instanceof Error
Copy link
Contributor

@dej611 dej611 Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could leverage some knowledge from TS here and just do

...
: err.original instanceof Error ? err.original : undefined

Copy link
Contributor Author

@flash1293 flash1293 Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is necessary because err could be a string based on the type, so we have to do some generic checks first, otherwise typescript will complain about accessing original on a union type that might not have the property

@botelastic botelastic bot added the Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) label Oct 7, 2020
Copy link
Contributor

@lizozom lizozom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code looks good (haven't tested, but it makes sense).

I do wonder if we could find a uniform way to display errors on panels in both Dashboard and Lens (see EmbeddableErrorLabel)

Copy link
Contributor

@wylieconlon wylieconlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working well for the permissions use case, but the main comment I'm leaving is that I want to validate the error response handling a little bit more. It's very difficult to find documentation of all the error cases, but I have to assume that there are people at the company who have dealt with this before.

defaultMessage: 'Request error: {causedByType}, {causedByReason}',
values: {
causedByType: error.original.body?.attributes?.error?.caused_by.type,
causedByReason: error.original.body?.attributes?.error?.caused_by.reason,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic you're using here looks incomplete, so I tried to break it. There appear to be different error responses for different types of failures, and I think you've only tested permission failures. I don't feel comfortable with the sort of one-off error response handling that you're doing here unless we can validate it against some kind of reference error handling implementation. Maybe the ES Clients team can take a look at this?

For example, sometimes there is a caused_by.caused_by response from Elasticsearch when Elasticsearch fails to validate the request itself, such as when you use a terms agg on a text field. For example, try running this request in the dev tools:

GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "t": {
      "terms": {
        "field": "host"
      }
    }
  }
}

Another example of a response that you're missing a case for is when the script is failing. This can happen fairly often with an example like metricbeat:

GET metricbeat-*/_search
{
  "size": 0,
  "aggs": {
    "d": {
      "max": {
        "script": """doc['system.network.in.bytes'].value"""
      }
    }
  }
}

Copy link
Contributor Author

@flash1293 flash1293 Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the same logic as this piece of code from core:

I totally see your point, but I don’t think we have a “handleAllCommonEsErrors” helper (yet). What do you think about adding nice handling for the cases you mention above (great findings btw), and if none of these apply falling back to the default behavior of unknown errors (which is showing the message of the thrown exception and also what’s happening on master right now) - at least for 7.10. We should continue to investigate to find a holistic solution and ideally roll it out everywhere (Discover, Visualize etc - all of them produce really bad error messages depending on the case)

I’m also more than happy to loop in someone knowledgeable on the topic to improve the PR beyond what’s mentioned above (do you know who would be a good fit?), my only ask is to not increase scope too much for an improvement of the current state of 7.10 (that’s also why I didn’t touch core code itself to refactor out a generic helper, it felt like opening a can of worms too big for the scope of the intended fix).

@flash1293
Copy link
Contributor Author

@wylieconlon I changed the lookup to recursively check the caused_bys in the Elasticsearch error message. This is how some error messages look like with this fix:

(Missing permissions to access index and aggregating on a text field (with updated index pattern)
Screenshot 2020-10-12 at 14 27 45

(Error in painless script)
Screenshot 2020-10-12 at 14 28 02

(Aggregating on text field without the index pattern knowing about it)
Screenshot 2020-10-12 at 14 41 47

Those are far from perfect, but IMHO they get the point across and are at least "searchable" (Google will show up help for the individual problems)

Copy link
Contributor

@wylieconlon wylieconlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not test again, but the error screenshots you showed look much better than before

if (causedBy) {
return getNestedErrorClause(causedBy);
}
return { type, reason };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be skipping all the intermediate types and reasons? This recursion makes sense, but are there cases where it's losing important information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point - for the known cases the root exception is the relevant one and I made this experience very often in Java in general (lot's of re-thrown exceptions, but the actual cause is in the very last one). I guess when we solve this cleanly there should be an option to expand the error to show all the details. I will open an issue for this.

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

@kbn/optimizer bundle module count

id before after diff
lens 563 564 +1

async chunks size

id before after diff
lens 1.0MB 1.0MB +1.8KB

page load bundle size

id before after diff
expressions 198.4KB 198.5KB +96.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@flash1293 flash1293 merged commit cd84ace into elastic:master Oct 13, 2020
flash1293 added a commit to flash1293/kibana that referenced this pull request Oct 13, 2020
@flash1293 flash1293 removed the v7.10.0 label Oct 13, 2020
jloleysens added a commit to jloleysens/kibana that referenced this pull request Oct 13, 2020
…otphase-to-formlib

* 'master' of github.com:elastic/kibana: (59 commits)
  [Security Solution][Resolver] Replace copy-to-clipboard with native navigator.clipboard (elastic#80193)
  [Security Solution] Reduce initial bundle size (elastic#78992)
  [Security Solution][Resolver] Fix Resize node box-shadow bug (elastic#80223)
  Move observability content (elastic#79978)
  skip flaky suite (elastic#79389)
  removing kibana_datatable` in favor of `datatable` (elastic#75184)
  [ML] Fixes for anomaly swim lane  (elastic#80299)
  [Lens] Smokescreen lens test unskip (elastic#80190)
  Improved AlertsClient tests structure by splitting a huge alerts_client.tests.ts file into a specific files defined by its responsibility. (elastic#80088)
  [APM] React key warning when opening popover with external resources (elastic#80328)
  [Step 1] use Observables on server search API (elastic#79874)
  Apply back pressure in Task Manager whenever Elasticsearch responds with a 429 (elastic#75666)
  [Lens] Leverage original http request error (elastic#79831)
  [Security Solution][Case] Improve ServiceConnectorCaseParams type (elastic#80109)
  [SECURITY_SOLUTION] Fix query on alert histogram (elastic#80219)
  [DOCS] Update ingest node pipelines doc (elastic#79187)
  [Ingest Manager] Split up OpenAPI spec file  (elastic#80107)
  [SECURITY_SOLUTION][ENDPOINT] Fix label on Trusted App create name field (elastic#80001)
  [Ingest Manager] Fix agent policy bump revision to create only one POLICY_CHANGE action (elastic#80081)
  Grid layout fixes (elastic#80305)
  ...

# Conflicts:
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/components/phases/shared/data_tier_allocation_field.tsx
#	x-pack/plugins/index_lifecycle_management/public/shared_imports.ts
flash1293 added a commit to flash1293/kibana that referenced this pull request Oct 14, 2020
@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Oct 16, 2020
@kibanamachine
Copy link
Contributor

Looks like this PR has backport PRs but they still haven't been merged. Please merge them ASAP to keep the branches relatively in sync.

@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Oct 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Lens] Confusing behavior if Lens vis fails because of non OK http status code
5 participants