[APM] Show warning in UI when cardinality of transaction.name exceeds threshold #67273

Closed
sorenlouv opened this issue May 25, 2020 · 12 comments
Labels
Team:APM All issues that need APM UI Team support v7.9.0

Comments

@sorenlouv
Member

sorenlouv commented May 25, 2020

Related: #26544 and elastic/apm-agent-rum-js#56

Transaction names are defined in the APM agent. Often they are picked up from frameworks, but sometimes users must define a naming pattern themselves. If they do this incorrectly, every URL is sent as a unique transaction group, which can cause an explosion in the number of transaction groups displayed by the UI.

Calculate number of transaction groups (cardinality of transaction.name)
The following terms aggregation returns the number of transaction groups per service. For some customers we've seen this exceed 10 million. This makes the UI very inaccurate, since we only show the top 200 transaction groups:

GET apm-*-transaction*/_search?terminate_after=10000
{
  "size": 0,
  "aggs": {
    "services": {
      "terms": {
        "field": "service.name",
        "size": 10
      },
      "aggs": {
        "distinct_names": {
          "cardinality": {
            "field": "transaction.name"
          }
        }
      }
    }
  }
}
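
A minimal sketch of how the response to this query could be post-processed, assuming the usual terms/cardinality response shape. The helper name `findHighCardinalityServices`, the threshold, and the sample numbers are hypothetical and not taken from Kibana. Note that `terminate_after` stops collection early on each shard and the cardinality aggregation is approximate, so the values should be read as rough estimates.

```typescript
// Relevant part of the response to the cardinality query above.
interface ServiceBucket {
  key: string; // service.name
  doc_count: number;
  distinct_names: { value: number }; // approximate number of distinct transaction.name values
}

interface CardinalityResponse {
  aggregations: { services: { buckets: ServiceBucket[] } };
}

// Hypothetical helper: returns the services whose number of
// transaction groups exceeds `threshold`.
function findHighCardinalityServices(
  response: CardinalityResponse,
  threshold: number
): string[] {
  return response.aggregations.services.buckets
    .filter((bucket) => bucket.distinct_names.value > threshold)
    .map((bucket) => bucket.key);
}

// Made-up sample data, for illustration only.
const sampleCardinalityResponse: CardinalityResponse = {
  aggregations: {
    services: {
      buckets: [
        { key: "frontend-rum", doc_count: 9000, distinct_names: { value: 8731 } },
        { key: "checkout-api", doc_count: 1000, distinct_names: { value: 12 } },
      ],
    },
  },
};

const noisyServices = findHighCardinalityServices(sampleCardinalityResponse, 200);
```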

Suggested solution
Show a warning in the UI if sum_other_doc_count or doc_count_error_upper_bound is above 0 (or perhaps above some other threshold).

Example:

Given the following terms agg:

GET apm-*-transaction*/_search
{
  "size": 0,
  "query": {
    "term": {
      "service.name": "some-service-with-many-transactions"
    }
  },
  "aggs": {
    "transactionsGroups": {
      "terms": {
        "field": "transaction.name",
        "size": 200
      }
    }
  }
}

The response from ES could look something like:

{
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "transactionsGroups" : {
      "doc_count_error_upper_bound" : 20433,
      "sum_other_doc_count" : 20816053,
      "buckets" : [
          // ...
      ]
    }
  }
}

In the above case, the number of unaccounted-for transactions is above 20 million.
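
The suggested check could be sketched roughly as follows. This is an illustration under assumptions, not the actual Kibana implementation: the function name `shouldShowCardinalityWarning` and the default threshold of 0 are hypothetical, while the two field names come from the terms aggregation response shown above.

```typescript
// Subset of a terms aggregation result that the check needs.
interface TermsAggResult {
  doc_count_error_upper_bound: number;
  sum_other_doc_count: number;
  buckets: Array<{ key: string; doc_count: number }>;
}

// Hypothetical helper: true when documents fell outside the returned
// buckets, or when the bucket counts may be inaccurate.
function shouldShowCardinalityWarning(
  agg: TermsAggResult,
  threshold: number = 0 // assumed default; "some other threshold" is left open in the issue
): boolean {
  return (
    agg.sum_other_doc_count > threshold ||
    agg.doc_count_error_upper_bound > threshold
  );
}

// Values from the example response above (buckets elided there as well).
const exampleAgg: TermsAggResult = {
  doc_count_error_upper_bound: 20433,
  sum_other_doc_count: 20816053,
  buckets: [],
};

const showWarning = shouldShowCardinalityWarning(exampleAgg);
```

With the example values both counters are far above 0, so the warning would be shown.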

@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:apm)

@formgeist
Contributor

I have a couple of questions:

  • Do we expect this to be an issue for all agents or only a few?
  • Is there already specific documentation on configuration options that will resolve this issue for the user and where can I find it (in case we want to link to it)?

@sorenlouv
Member Author

sorenlouv commented May 28, 2020

Do we expect this to be an issue for all agents or only a few?

This is something we've seen a lot with the RUM agent. It could also happen with other agents, but I think it's much rarer.

@sorenlouv
Member Author

Is there already specific documentation on configuration options that will resolve this issue for the user and where can I find it (in case we want to link to it)?

That's a good question. Not that I know of. @jahtalab @vigneshshanmugam or @bmorelli25 might know.

@vigneshshanmugam
Member

vigneshshanmugam commented May 28, 2020

The RUM agent has a pageLoadTransactionName option that lets the user configure the transaction name, but only for the page-load transaction.

However, there are also other ways to fix the transaction name for both soft and hard navigations which I have commented here - https://stackoverflow.com/a/60703633/3588136
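
To illustrate the general idea of grouping similar URLs under one name, here is a rough sketch of a path normalizer. `toTransactionName` is a hypothetical helper, not part of the RUM agent; the rules for what counts as a variable segment (numeric IDs and UUIDs) are assumptions that a real application would need to adapt. Its result is the kind of value one might pass to an option such as pageLoadTransactionName.

```typescript
// Assumed pattern for UUID-like path segments.
const UUID_SEGMENT =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: collapses variable path segments so that
// similar URLs share a single transaction name.
function toTransactionName(pathname: string): string {
  return pathname
    .split("/")
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ":id"; // purely numeric segment
      if (UUID_SEGMENT.test(segment)) return ":uuid"; // UUID-like segment
      return segment; // static segment, kept as-is
    })
    .join("/");
}

const reviewPage = toTransactionName("/products/42/reviews/7");
// reviewPage === "/products/:id/reviews/:id"
```

Without this kind of normalization, `/products/42/reviews/7` and `/products/43/reviews/9` would each become their own transaction group.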

@sorenlouv
Member Author

@vigneshshanmugam Thanks! Do you think it would be useful to have documentation that is more language agnostic and more focused on the purpose of transaction.name (grouping of similar URLs by patterns) and how it should be limited to 200 (default max number of transaction groups displayed by the UI)? And how this limit can be increased (configurable in Kibana)?

@formgeist
Contributor

formgeist commented May 28, 2020

Thanks for the feedback.

I've created a quick draft PR with a proposed design implementation for how to show the callout. #67610

I too think it's worth investigating whether we can create a single agent-agnostic documentation article that will help users debug their issues, which we can easily link to from the app.

@vigneshshanmugam
Member

Do you think it would be useful with documentation that is more language agnostic and more focused on the purpose of transaction.name

Totally, language-agnostic documentation would definitely be useful in this context and would help users understand the issue. We could probably in the UI detect the agent name and link to the relevant language docs and potential solutions to fix it.

And how this limit can be increased (configurable in kibana).

Huge +1. Do we currently have a limit, and would it cause any perf issues? Maybe we can have an upper bound here in the UI?

@formgeist
Contributor

We could probably in the UI detect the agent name and link to the relevant language docs and potential solutions to fix it.

This is typically where we've found that it's better to have a single documentation article to link to and then allow the user to find the solution that fits their application. Perhaps @bmorelli25 can weigh in here on the best approach?

@bmorelli25
Member

bmorelli25 commented May 28, 2020

This sounds good to me. I'll add a language-agnostic section to the Troubleshoot common problems documentation that we can link to from the APM app. From there, we can provide links to any relevant Agent docs.

I've opened #67691 to track the docs.

@bmorelli25
Member

bmorelli25 commented May 29, 2020

@sqren

This makes the UI very inaccurate since we only show the top 200 transactions groups:

and how it should be limited to 200 (default max number of transaction groups displayed by the ui).

In the docs, we list the default as 100. Just want to make sure 200 is correct before I update it.

EDIT: I think it's 100?

transactionGroupBucketSize: schema.number({ defaultValue: 100 }),

@sorenlouv
Member Author

sorenlouv commented May 29, 2020

Sorry, I wrote that from memory. You are right that it is 100... sort of. As usual, the reality is a bit more complex than one could hope: I went digging in the code and found that the configurable limit of 100 had been changed to a hardcoded limit of 10,000. This was a mistake, and it will be fixed together with the warning improvement.

The limit will be made configurable again but perhaps increased to 500. I'll let you know when that happens so we can fix the docs.
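
As a rough sketch of how a configurable limit with an upper bound (as floated earlier in the thread) might be resolved: the function `resolveBucketSize` and the `MAX_BUCKET_SIZE` value are hypothetical, and only the default of 100 comes from the config schema quoted above.

```typescript
// Default taken from the config schema quoted in this thread.
const DEFAULT_BUCKET_SIZE = 100;
// Assumed upper bound, chosen purely for illustration.
const MAX_BUCKET_SIZE = 1000;

// Hypothetical helper: falls back to the default for missing or invalid
// values and caps the configured value at the upper bound.
function resolveBucketSize(configured?: number): number {
  if (configured === undefined || !isFinite(configured) || configured < 1) {
    return DEFAULT_BUCKET_SIZE;
  }
  return Math.min(Math.floor(configured), MAX_BUCKET_SIZE);
}

const proposedSize = resolveBucketSize(500); // within bounds, used as-is
const cappedSize = resolveBucketSize(10000); // capped at MAX_BUCKET_SIZE
```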
