Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] Use status_code field to calculate error rate #71109

Merged
merged 10 commits into from
Jul 14, 2020

Conversation

cauemarcondes
Copy link
Contributor

@cauemarcondes cauemarcondes commented Jul 8, 2020

closes #70223

Screenshot 2020-07-13 at 13 44 01

@cauemarcondes cauemarcondes added Team:APM All issues that need APM UI Team support release_note:skip Skip the PR/issue when compiling release notes v7.9.0 labels Jul 8, 2020
@cauemarcondes cauemarcondes requested a review from a team as a code owner July 8, 2020 15:25
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:apm)

@@ -17,6 +17,9 @@ import {
} from '../helpers/setup_request';
import { rangeFilter } from '../../../common/utils/range_filter';

// Regex for 5xx and 4xx
const errorStatusCodeRegex = /5\d{2}|4\d{2}/;

export async function getErrorRate({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an api test for GET /api/apm/services/{serviceName}/errors/rate to cover this?

@dgieselaar
Copy link
Member

dgieselaar commented Jul 8, 2020 via email

@cauemarcondes
Copy link
Contributor Author

http.response.status_code is a long, so you can use a range filter (gte 400) (you can also use it on strings btw).

So I could do:

 {
          "range": {
            "http.response.status_code": {
              "gte": 400,
              "lte": 599
            }
          }
        }

@dgieselaar
Copy link
Member

@cauemarcondes yes (I'm not sure if it's useful to provide the upper bounds though)

@cauemarcondes
Copy link
Contributor Author

@cauemarcondes yes (I'm not sure if it's useful to provide the upper bounds though)

Why? But I'm good to use only "gte": 400 too.

@dgieselaar
Copy link
Member

@cauemarcondes status codes from 600 and above are not valid, and if they do exist in the data, I'd expect that they should be treated as errors.

Copy link
Member

@sorenlouv sorenlouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

@cauemarcondes
Copy link
Contributor Author

lgtm 👍

thanks but I still have to add the error chart in the transaction details page. 😅

@sorenlouv
Copy link
Member

thanks but I still have to add the error chart in the transaction details page. 😅

Can you double check with @nehaduggal first?

@nehaduggal
Copy link

Lets call it Transaction error rate chart and add it to the transactions page.

@dgieselaar
Copy link
Member

Have we considered how this would work with pre-aggregated transactions?

@cauemarcondes
Copy link
Contributor Author

Have we considered how this would work with pre-aggregated transactions?

We will use the status_code until the agents develop the hasError flag in the transaction document. But I believe we might have the same problem with pre-aggregated transactions, won't we?

@dgieselaar
Copy link
Member

has_error is a boolean flag so its impact is limited. http.response.status_code has slightly higher cardinality but it should roughly align with transaction.result so maybe not a cause for concern either. Still, we need to flag this with the APM server team, it should be added there.

I do have some concerns about the things that are being added that are not compatible with pre-aggregated transactions today (the RUM UI is another example). APM Server's release cycle is not tied to Kibana, so it's a risk that the UI expects fields in the pre-aggregated transaction documents that are not there because the user is running an older version of APM Server. That being said, we haven't shipped pre-aggregated transactions yet, so maybe I'm thinking too far ahead.

@cauemarcondes
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@cauemarcondes cauemarcondes merged commit 90f233b into elastic:master Jul 14, 2020
@cauemarcondes cauemarcondes deleted the error-rate branch July 14, 2020 10:20
sorenlouv pushed a commit to sorenlouv/kibana that referenced this pull request Jul 14, 2020
* calculating error rate based on status code

* fixing unit test

* addressing pr comments

* adding erroneous transactions rate

* adding erroneous transactions rate

* adding error rate to detail page

* fixing i18n

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@sorenlouv
Copy link
Member

💚 Backport successful

The PR was backported to the following branches:

gmmorris added a commit to gmmorris/kibana that referenced this pull request Jul 14, 2020
* master: (314 commits)
  [APM] Use status_code field to calculate error rate (elastic#71109)
  [Observability] Change appLink passing the date range (elastic#71259)
  [Security] Add Timeline improvements (elastic#71506)
  adjust vislib bar opacity (elastic#71421)
  Fix ScopedHistory mock and adapt usages (elastic#71404)
  [Security Solution] Add hook for reading/writing resolver query params (elastic#70809)
  [APM] Bug fixes from ML integration testing (elastic#71564)
  [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404)
  [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275)
  [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321)
  Change signal.rule.risk score mapping from keyword to float (elastic#71126)
  Added help text where needed on connectors and alert actions UI (elastic#69601)
  [SIEM][Detections] Value Lists Management Modal (elastic#67068)
  [test] Skips test preventing promotion of ES snapshot elastic#71582
  [test] Skips test preventing promotion of ES snapshot elastic#71555
  [ILM] Fix alignment of the timing field (elastic#71273)
  [SIEM][Detection Engine][Lists] Adds the ability for exception lists to be multi-list queried. (elastic#71540)
  initial telemetry setup (elastic#69330)
  [Reporting] Formatting fixes for CSV export in Discover, CSV download from Dashboard panel (elastic#67027)
  Search across spaces (elastic#67644)
  ...
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jul 14, 2020
…t-apps-page-titles

* 'master' of github.com:elastic/kibana: (88 commits)
  [ML] Functional tests - disable DFA creation and cloning tests
  [APM] Use status_code field to calculate error rate (elastic#71109)
  [Observability] Change appLink passing the date range (elastic#71259)
  [Security] Add Timeline improvements (elastic#71506)
  adjust vislib bar opacity (elastic#71421)
  Fix ScopedHistory mock and adapt usages (elastic#71404)
  [Security Solution] Add hook for reading/writing resolver query params (elastic#70809)
  [APM] Bug fixes from ML integration testing (elastic#71564)
  [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404)
  [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275)
  [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321)
  Change signal.rule.risk score mapping from keyword to float (elastic#71126)
  Added help text where needed on connectors and alert actions UI (elastic#69601)
  [SIEM][Detections] Value Lists Management Modal (elastic#67068)
  [test] Skips test preventing promotion of ES snapshot elastic#71582
  [test] Skips test preventing promotion of ES snapshot elastic#71555
  [ILM] Fix alignment of the timing field (elastic#71273)
  [SIEM][Detection Engine][Lists] Adds the ability for exception lists to be multi-list queried. (elastic#71540)
  initial telemetry setup (elastic#69330)
  [Reporting] Formatting fixes for CSV export in Discover, CSV download from Dashboard panel (elastic#67027)
  ...

# Conflicts:
#	x-pack/plugins/index_management/public/application/index.tsx
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jul 14, 2020
* master: (72 commits)
  [test] Skips test preventing promotion of ES snapshot elastic#71612
  [Logs UI] Remove UUID from Alert Instances (elastic#71340)
  [Metrics UI] Remove UUID from Alert Instance IDs (elastic#71335)
  [ML] Functional tests - disable DFA creation and cloning tests
  [APM] Use status_code field to calculate error rate (elastic#71109)
  [Observability] Change appLink passing the date range (elastic#71259)
  [Security] Add Timeline improvements (elastic#71506)
  adjust vislib bar opacity (elastic#71421)
  Fix ScopedHistory mock and adapt usages (elastic#71404)
  [Security Solution] Add hook for reading/writing resolver query params (elastic#70809)
  [APM] Bug fixes from ML integration testing (elastic#71564)
  [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404)
  [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275)
  [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321)
  Change signal.rule.risk score mapping from keyword to float (elastic#71126)
  Added help text where needed on connectors and alert actions UI (elastic#69601)
  [SIEM][Detections] Value Lists Management Modal (elastic#67068)
  [test] Skips test preventing promotion of ES snapshot elastic#71582
  [test] Skips test preventing promotion of ES snapshot elastic#71555
  [ILM] Fix alignment of the timing field (elastic#71273)
  ...
cauemarcondes added a commit that referenced this pull request Jul 14, 2020
* calculating error rate based on status code

* fixing unit test

* addressing pr comments

* adding erroneous transactions rate

* adding erroneous transactions rate

* adding error rate to detail page

* fixing i18n

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Cauê Marcondes <55978943+cauemarcondes@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@cauemarcondes cauemarcondes added the apm:test-plan-7.9.0 Test plan for 7.9 release label Jul 20, 2020
@dgieselaar dgieselaar self-assigned this Aug 4, 2020
@dgieselaar
Copy link
Member

Some conclusions from testing:

@dgieselaar dgieselaar added the apm:test-plan-done Pull request that was successfully tested during the test plan label Aug 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apm:test-plan-7.9.0 Test plan for 7.9 release apm:test-plan-done Pull request that was successfully tested during the test plan release_note:skip Skip the PR/issue when compiling release notes Team:APM All issues that need APM UI Team support v7.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[APM] Use status_code field to calculate error rate
6 participants