[UA] Tight worker loop can cause high CPU usage #60950

Merged

@jloleysens jloleysens commented Mar 23, 2020

Summary

In Upgrade Assistant, when multiple Kibana instances share an ES cluster, the reindex worker loop can consume a lot of CPU if an instance lacks the credentials needed to advance an in-progress reindex operation.

How to reproduce on master

  1. In the file x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_handler.ts, comment out the line that reads credentialStore.set(reindexOp, headers);. This simulates a Kibana instance that does not have the user credentials required to advance the reindex operation, which is the condition that triggers the performance bug.
  2. Submit a reindex operation (batched or otherwise). The node process CPU usage will remain high for the entire duration of PAUSE_WINDOW, as defined in the worker.ts file. Because this instance cannot push the in-progress operation forward, the worker is never able to "work through" its in-progress operations and keeps looping over them.

Screenshot 2020-03-23 at 18 12 45

Solution

The simplest solution was to add some padding to the worker loop in the form of a simulated sleep (a promise that resolves after a timeout).
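
As a rough illustration of the padding idea, here is a minimal TypeScript sketch. Only WORKER_PADDING_MS is a name taken from this PR; its value here, along with sleep, runWorkerLoop, processOnce and shouldContinue, are assumptions made for the example and are not the actual worker.ts code. The loop hands control back to a timer whenever a pass advances no in-progress operation, so it cannot spin at full speed.

```typescript
// Assumed value for illustration; the PR text does not state the real one.
const WORKER_PADDING_MS = 1000;

// Simulated sleep: resolves after `ms` milliseconds without blocking the event loop.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical loop. `processOnce` makes one pass over the in-progress
// operations and reports whether it managed to advance any of them.
async function runWorkerLoop(
  processOnce: () => Promise<boolean>,
  shouldContinue: () => boolean,
  paddingMs: number = WORKER_PADDING_MS
): Promise<number> {
  let idlePasses = 0;
  while (shouldContinue()) {
    const madeProgress = await processOnce();
    if (!madeProgress) {
      // Nothing could be advanced (for example, missing credentials):
      // pad the loop instead of immediately retrying.
      idlePasses++;
      await sleep(paddingMs);
    }
  }
  // Number of passes that ended in a padded sleep.
  return idlePasses;
}
```

Sleeping only on passes that make no progress keeps the worker responsive when there is real work to do, while still capping how often an instance without credentials re-checks the same operations.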

Additional

There was also a (small) potential issue with queued items that could still be seen as stale (see #60770). We now let workers that lack the credentials to update a reindex op double-check queued operations.
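
One way such a double check could look is sketched below in TypeScript. Everything in it is hypothetical: the QueuedOp shape, the QUEUE_STALE_AFTER_MS threshold, and the isQueuedOpStale and findStaleQueuedOps helpers are assumptions for illustration, and the util actually added in this PR may differ. The idea is that any worker, with or without credentials, can compare the queue timestamp against the current time.

```typescript
// Hypothetical shape; the real reindex operation object has more fields.
interface QueuedOp {
  indexName: string;
  queuedAt: number; // epoch milliseconds when the op entered the queue
  startedAt?: number; // set once some worker has started processing it
}

// Assumed threshold for illustration only.
const QUEUE_STALE_AFTER_MS = 30000;

// A queued op counts as stale if it has not started and has waited
// longer than the threshold.
function isQueuedOpStale(
  op: QueuedOp,
  now: number,
  staleAfterMs: number = QUEUE_STALE_AFTER_MS
): boolean {
  if (op.startedAt !== undefined) {
    return false;
  }
  return now - op.queuedAt > staleAfterMs;
}

// Returns the queued ops that a worker should re-examine.
function findStaleQueuedOps(ops: QueuedOp[], now: number): QueuedOp[] {
  return ops.filter((op) => isQueuedOpStale(op, now));
}
```

Passing now in as a parameter, rather than reading the clock inside the helper, keeps the check deterministic and easy to unit test.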

@jloleysens added the bug, v8.0.0, Team:Kibana Management, release_note:skip, Feature:Upgrade Assistant, and v7.7.0 labels on Mar 23, 2020
@elasticmachine
Pinging @elastic/es-ui (Team:Elasticsearch UI)

The worker scheduler should only sleep when it cannot process any
in-progress operations. Additionally, logic has been added
to handle queued operations that have been in the queue for
a long time but may still be viewed as within a small window of time
by workers that do not have the credentials to process those
reindex operations.
@jloleysens
@elasticmachine merge upstream

@sebelga sebelga left a comment

LGTM! Tested locally and works as expected.

) {
// TODO: This tight loop needs something to relax potentially high CPU demands so this padding is added.
// This scheduler should be revisited in future.
await new Promise(res => setTimeout(res, WORKER_PADDING_MS));

nit: Do you mind using resolve instead of res? I first read it as response (which I always shorten to res! 😄)

@kibanamachine
💚 Build Succeeded

@jloleysens jloleysens merged commit 12c8ff7 into elastic:master Mar 31, 2020
@jloleysens jloleysens deleted the ua/fix/tight-worker-loop-high-cpu branch March 31, 2020 15:26
jloleysens added a commit to jloleysens/kibana that referenced this pull request Mar 31, 2020
* Added worker padding to save some CPU

* Updated comments

* Update worker scheduler and add a new util

The worker scheduler should only sleep when it cannot process any
in-progress operations. Additionally, logic has been added
to handle queued operations that have been in the queue for
a long time but may still be viewed as within a small window of time
by workers that do not have the credentials to process those
reindex operations.

* res 👉🏻resolve

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Mar 31, 2020
* upstream/master: (69 commits)
  Adding PagerDuty icon to connectors cards (elastic#60805)
  Fix drag and drop flakiness (elastic#61993)
  Grok debugger migration (elastic#60658)
  Endpoint: Fix resolver SVG position issue (elastic#61886)
  [SIEM] version 7.7 rule import (elastic#61903)
  Added styles to make combobox list items wider for alerting flyout (elastic#61894)
  [UA] Tight worker loop can cause high CPU usage (elastic#60950)
  [ML] DF Analytics results table: use index pattern field format if one exists (elastic#61709)
  [ML] Catching unknown index pattern errors (elastic#61935)
  [Discover] Deangularize and euificate sidebar  (elastic#47559)
  Endpoint: Add ts-node dev dependency (elastic#61884)
  Add an onBlur handler for the kuery bar. Only resubmit when input changes. (elastic#61901)
  [ML] Handle Empty Partition Field Values in Single Metric Viewer (elastic#61649)
  Auto interval on date histogram is getting displayed as timestamp per… (elastic#59171)
  [Maps] Explicitly pass fetch function to ems-client (elastic#61846)
  [SIEM][CASE] Fix aria-labels and translations (elastic#61670)
  [ML] Settings: Increase number of items that can be paged in calendars and filters lists (elastic#61842)
  [EPM] update epm filepath route (elastic#61910)
  [APM] Set ignore_above to 1024 for telemetry saved object (elastic#61732)
  [Logs UI] Log stream row rendering (elastic#60773)
  ...
jloleysens added a commit that referenced this pull request Apr 1, 2020