
Debugging 500 errors on Cloud #80857

Closed
mshustov opened this issue Oct 16, 2020 · 4 comments
Labels
discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc


@mshustov
Contributor

In today's Bugpool, the problem was raised that it's hard for support and customers to diagnose 500 errors returned by the Kibana server.
During the migration to the Kibana platform, we deliberately decided not to provide error details on uncaught exceptions, logging them instead, to prevent error details from leaking. There are at least two problems with the current approach:

  1. Our customers do not have access to Kibana logs on Cloud. If they receive a 500 from the Kibana server because Elasticsearch responded with a 503 due to high load, there is no clear indicator that Elasticsearch is the source of the problem. The only available option is to contact support, which creates an unnecessary burden on both customers and support. We might need to reconsider our decision not to provide uncaught exception details for Elasticsearch errors bubbled up to the request handler, as we already do for 401 errors (preserve 401 errors from legacy es client #71234).
  2. There is an inconsistency in the logging logic. The Cloud team is working on exposing Kibana logs to Cloud customers. Even once Kibana logs are available to customers, they won't be reliable enough until we fix "Add error logs for HTTP 500 error details" #65291. @joshdover, can we prioritize it for 7.11?

@kobelb @elastic/kibana-platform

@mshustov mshustov added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Oct 16, 2020
@kobelb
Contributor

kobelb commented Oct 16, 2020

We might need to reconsider our decision not to provide uncaught exception details for elasticsearch errors bubbled up to the request handler. As we already do for 401 errors #71234

Conceptually, I think it's fine for us to include the response body from Elasticsearch 503s in Kibana's response bodies. We do want to ensure that Kibana's responses do not contain sensitive information, like file paths or usernames/passwords, but if an Elasticsearch error response includes a body, and Elasticsearch has ensured that its response bodies don't contain sensitive information, those bodies are transitively safe to expose to end-users. It does feel like we should make it clear that it's an Elasticsearch 503 as opposed to a Kibana 503; just proxying the ES response through potentially loses this differentiation.
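A minimal TypeScript sketch of this idea, assuming a hypothetical error shape (`EsResponseError` with `statusCode` and `body`) and a hypothetical `toKibanaErrorResponse` mapper; neither is the actual Kibana core API. It forwards the Elasticsearch body only for ES 503s and tags the response with a `source` field, so a proxied Elasticsearch 503 stays distinguishable from an error Kibana produced itself. Everything else keeps the current behaviour: a generic 500 whose details go only to the logs.

```typescript
// Hypothetical shape of an error surfaced by an Elasticsearch client call.
// Assumption: the client exposes the ES status code and the parsed response body.
interface EsResponseError extends Error {
  statusCode: number;
  body?: unknown;
}

// Hypothetical response shape; `source` records where the failure originated.
interface KibanaErrorResponse {
  statusCode: number;
  body: {
    message: string;
    source: "elasticsearch" | "kibana";
    attributes?: { esBody?: unknown };
  };
}

function isEsResponseError(err: unknown): err is EsResponseError {
  return (
    err instanceof Error &&
    typeof (err as Partial<EsResponseError>).statusCode === "number"
  );
}

function toKibanaErrorResponse(err: unknown): KibanaErrorResponse {
  // Only ES 503s are passed through; whether to widen this is an open question.
  if (isEsResponseError(err) && err.statusCode === 503) {
    return {
      statusCode: 503,
      body: {
        message: "Elasticsearch is unavailable (503)",
        source: "elasticsearch",
        // ES is assumed to keep its error bodies free of sensitive data.
        attributes: { esBody: err.body },
      },
    };
  }
  // Any other uncaught error: generic 500, details stay in the server logs.
  return {
    statusCode: 500,
    body: { message: "An internal server error occurred.", source: "kibana" },
  };
}
```

Generalizing to all ES errors would only mean widening the `statusCode === 503` check, at the cost of exposing more of Kibana's implementation details.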

@pgayvallet
Contributor

Conceptually, I think it's fine for us to include the response body from Elasticsearch 503s in Kibana's response bodies

  • That would mean handling ES (re-)thrown errors differently from other errors in core's http error handling logic. Is that alright? (It probably is, as we already do that for 401s, but I'm still asking.)

  • Do we want that only for 503s, or should we generalize it to all ES errors caught by the http error handler?

@kobelb
Contributor

kobelb commented Oct 16, 2020

@pgayvallet I think this is generally alright. I do think it's sometimes awkward that we're propagating ES errors to the end-users because the specific calls that Kibana is making to Elasticsearch should generally be seen as an "implementation detail" of Kibana. For example, it'd be weird to return an error that refers to the .kibana index when a user performs some saved-object operation because the .kibana index is an implementation detail of saved-objects. However, in situations where Kibana is essentially a light-weight proxy for Elasticsearch, it makes sense. That's my only concern with generalizing those to more than 503s...

Ideally, more consumers of ES would "wrap" the calls and throw informative, yet non-ES specific, errors when they got an unexpected status-code, but 🤷
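A sketch of that wrapping pattern, assuming a hypothetical saved-objects lookup: `EsGet`, `getSavedObject`, `SavedObjectNotFoundError`, and `SavedObjectsUnavailableError` are illustrative names, not real Kibana APIs. The wrapper translates ES status codes into domain errors, so the `.kibana` index never shows up in what the user sees, and rethrows anything unexpected so the generic 500 handling still logs it.

```typescript
// Hypothetical minimal ES client surface; the real client API differs.
type EsGet = (index: string, id: string) => Promise<{ _source: unknown }>;

// Hypothetical domain errors that callers see instead of raw ES errors.
class SavedObjectNotFoundError extends Error {
  readonly type: string;
  readonly id: string;
  constructor(type: string, id: string) {
    super(`Saved object [${type}/${id}] not found`);
    this.name = "SavedObjectNotFoundError";
    this.type = type;
    this.id = id;
  }
}

class SavedObjectsUnavailableError extends Error {
  constructor() {
    super("Saved objects are temporarily unavailable, please retry later");
    this.name = "SavedObjectsUnavailableError";
  }
}

// Reads a numeric `statusCode` off an unknown thrown value, if present.
function esStatusCode(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "statusCode" in err) {
    const code = (err as { statusCode: unknown }).statusCode;
    return typeof code === "number" ? code : undefined;
  }
  return undefined;
}

async function getSavedObject(esGet: EsGet, type: string, id: string): Promise<unknown> {
  try {
    // The index name and document id format are internal details of this module.
    const doc = await esGet(".kibana", `${type}:${id}`);
    return doc._source;
  } catch (err) {
    const status = esStatusCode(err);
    if (status === 404) throw new SavedObjectNotFoundError(type, id);
    if (status === 503) throw new SavedObjectsUnavailableError();
    // Unexpected failure: rethrow so the generic handler logs it and returns a 500.
    throw err;
  }
}
```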

@pgayvallet
Contributor

I think we can close this, given that:


3 participants