Handling huge cardinality of page load transaction names #56

Open
hmdhk opened this issue Sep 21, 2018 · 4 comments

@hmdhk
Contributor

hmdhk commented Sep 21, 2018

A little background on the issue: we used to include the page URL (without the query string) as the transaction name by default, and we had the same problem. Even without the query string, a lot of applications simply include IDs in their URLs.

Currently the default transaction name is "unknown" rather than the page URL, so by default all transactions are grouped under "unknown". However, there is an API that lets users set the initial page load transaction name, and setting it to the page URL is probably the easiest way for users to name the page. This can create a large number of transaction names.

Some solutions:

  • A URL pattern config option that we use when setting the transaction name to the page URL
  • Heuristics on the agent that try to detect IDs in the URL (I've made a POC)
  • A grouping algorithm on the Kibana side
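A minimal sketch of the ID-detection heuristic from the second bullet, assuming a path segment counts as an ID when it is purely numeric, a UUID, or a hex string of 16 or more characters. This is not the POC mentioned above; `looksLikeId`, `groupPathname`, and the `:id` placeholder are hypothetical names chosen for illustration.

```typescript
// Assumed ID shapes (hypothetical rules, not the agent's actual heuristic):
const NUMERIC = /^\d+$/;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const LONG_HEX = /^[0-9a-f]{16,}$/i; // e.g. object IDs or content hashes

function looksLikeId(segment: string): boolean {
  return NUMERIC.test(segment) || UUID.test(segment) || LONG_HEX.test(segment);
}

// Replace every path segment that looks like an ID with a placeholder so that
// URLs differing only by IDs collapse into one transaction name.
function groupPathname(pathname: string, placeholder = ":id"): string {
  return pathname
    .split("/")
    .map((segment) =>
      segment !== "" && looksLikeId(segment) ? placeholder : segment
    )
    .join("/");
}

console.log(groupPathname("/users/42/orders/1001"));
```

With these rules, `/users/42/orders/1001` and `/users/7/orders/3` both become `/users/:id/orders/:id`. Heuristics like this can misfire (a year such as `/archive/2018` would also be collapsed), which is one reason to pair them with user-supplied patterns.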
@alvarolobato

alvarolobato commented Sep 21, 2018

I think we could do a two step approach here:

  1. Improve the API and add a method, in addition to setInitialPageLoadName(), that would directly take the URL and strip the query string. We could instead provide a helper method or instruct users how to do it, but I would prefer having a specific function for it.

  2. Provide a way to initialize a list of pattern matches that the user could define in order to strip the parameters embedded in the URL. I would try to use simple matching patterns and avoid using regex for simplicity. This matching could be done either in the agent or in the server, but a single match per page load in the agent does not seem like a big deal, and it potentially saves resources on the server.
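The two steps can be sketched together as follows. This is a minimal illustration, not an existing agent API: `pathFromUrl`, `matchesPattern`, and `pageLoadName` are hypothetical, and the pattern syntax is an assumption (`*` matches exactly one path segment, with no regex involved). As a further assumption, step 1 also drops the `#fragment` and the `scheme://host` part so that only the path is used as the name.

```typescript
// Step 1 (hypothetical helper): strip the query string and fragment, then drop
// "scheme://host" if present so only the path remains.
function pathFromUrl(url: string): string {
  let end = url.length;
  for (const sep of ["?", "#"]) {
    const i = url.indexOf(sep);
    if (i !== -1 && i < end) end = i;
  }
  let rest = url.slice(0, end);
  const schemeEnd = rest.indexOf("://");
  if (schemeEnd !== -1) {
    const pathStart = rest.indexOf("/", schemeEnd + 3);
    rest = pathStart === -1 ? "/" : rest.slice(pathStart);
  }
  return rest === "" ? "/" : rest;
}

// Step 2 (assumed syntax): compare segment by segment; "*" matches any single
// segment, everything else must match literally.
function matchesPattern(path: string, pattern: string): boolean {
  const pathSegments = path.split("/");
  const patternSegments = pattern.split("/");
  return (
    pathSegments.length === patternSegments.length &&
    patternSegments.every((seg, i) => seg === "*" || seg === pathSegments[i])
  );
}

// Use the first matching pattern as the transaction name; otherwise fall back
// to the query-less path from step 1.
function pageLoadName(url: string, patterns: string[]): string {
  const path = pathFromUrl(url);
  for (const pattern of patterns) {
    if (matchesPattern(path, pattern)) return pattern;
  }
  return path;
}

console.log(pageLoadName("https://shop.example.com/users/42/orders/7?tab=1", ["/users/*/orders/*"]));
```

Here `https://shop.example.com/users/42/orders/7?tab=1` is named `/users/*/orders/*`, while a URL that matches no pattern, such as `https://shop.example.com/about?ref=ad`, falls back to `/about`. Because matching is a per-segment string comparison, it runs once per page load with negligible cost in the agent.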

Also, the default behaviour could probably be changed to step 1, based on the URL. In my opinion, that is better than the current "unknown".
cc. @roncohen

@sorenlouv
Member

> I would try to use simple matching patterns and avoid using regex for simplicity.

I like minimatch for these use cases:
https://github.com/isaacs/minimatch#minimatch

@alvarolobato

Related to elastic/kibana#26544

@hmdhk
Contributor Author

hmdhk commented Jun 10, 2020

We had a meeting around sampling and high cardinality:

  • We discussed storage and network traffic reduction
  • For storage we should look into aggregation and trimming data (e.g. removing spans for older transactions)
  • For network traffic
    • We still need sampling in some form or another
    • We discussed providing config options to let the user decide which transactions are important (these can be provided through central config)
    • Another idea is to crawl the website, discover the URLs, and let the user choose in the UI
  • High cardinality issue
    • We will provide a config option to let the user specify the URL pattern (this can be configured in central config or in apm-server) -> issue
    • We discussed a heuristic-based solution (POC)
    • We also discussed using machine learning to categorise URL sections (I will do a POC on this)

cc @axw, @drewpost, @vigneshshanmugam
