
Kibana 7.10.0-SNAPSHOT memory usage leads to OOMKill on ECK #76783

Closed
sebgl opened this issue Sep 4, 2020 · 13 comments
Labels
Team:Core, Team:Operations, triage_needed

Comments


sebgl commented Sep 4, 2020

Kibana version: latest 7.10.0-SNAPSHOT, as of September 4, 2020

Elasticsearch version: latest 7.10.0-SNAPSHOT, as of September 4, 2020

Kibana 7.10.0-SNAPSHOT seems to OOM after a few seconds/minutes on ECK, before it is completely initialized.
ECK sets a default 1Gi memory limit, which does not seem large enough.

On successful runs (with a >1Gi memory limit), the memory usage reported by `kubectl top pod` is in the 950Mi-1200Mi range.
If I set a memory limit of 900Mi, the Pod gets OOMKilled most of the time.

When running the previous version (7.9.0), I see memory usage in the 700Mi-750Mi range.

Is that a potential bug, or is the higher memory usage expected (in which case we can raise ECK's defaults)?
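
For context, raising the limit on the ECK side looks roughly like this (a minimal sketch against ECK's `kibana.k8s.elastic.co/v1` CRD; the resource names and the 1536Mi figure are illustrative assumptions, not tested recommendations):

```yaml
# Hypothetical ECK manifest bumping Kibana's memory above the 1Gi default.
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 7.10.0
  count: 1
  elasticsearchRef:
    name: elasticsearch-sample   # assumed Elasticsearch resource name
  podTemplate:
    spec:
      containers:
        - name: kibana
          resources:
            requests:
              memory: 1536Mi     # illustrative value, not a validated default
            limits:
              memory: 1536Mi
```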

I cannot easily reproduce the OOMKill when using Docker, though: `docker run --rm --link elasticsearch:elasticsearch -p 5601:5601 -v /tmp/kibana.yml:/usr/share/kibana/config/kibana.yml --memory=700m docker.elastic.co/kibana/kibana:7.10.0-SNAPSHOT`. The container does not get OOM killed, but it is not running with the same configuration as ECK's default one.

ECK issue: elastic/cloud-on-k8s#3710
Maybe related: #72987

@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@mikecote added the Team:Operations label and removed Team:AppArch Sep 4, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@mikecote
Contributor

mikecote commented Sep 4, 2020

Sorry AppArch for the ping :( wrong issue.

@etwillbefine

I saw the same behaviour on a Kubernetes cluster using the 7.9.1 release. The root cause was a "file not found" error while Kibana was starting (it tried to read a nonexistent .crt file). Maybe you can find additional information in your Kibana logs as well. The status reported for the Pod was "OOMKilled". Once that error was fixed, Kibana started successfully without OOM.

@tylersmalley
Contributor

@joshdover this appears to have started with cbf2844

8221 runs with 700MB, OOMs at 650MB
8222 runs with 1100MB, OOMs at 1000MB

@tylersmalley added the Team:Core label Sep 22, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@jbudz
Member

jbudz commented Sep 23, 2020

We should make sure `max-old-space-size` is set and has padding. I don't think we can rely on (dynamic memory limit + Docker overhead + Chromium + APM, and so on) staying below the Docker OOM kill threshold. And now I'm wondering whether that will work with https://github.com/elastic/kibana/blob/master/x-pack/plugins/reporting/server/browsers/chromium/driver_factory/start_logs.ts#L77
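
As an illustration of the padding idea (a hypothetical podTemplate sketch, not Kibana's or ECK's actual config; the 800MB heap cap is an assumed value): the Node heap would be pinned well below the container limit, leaving headroom for off-heap allocations, Chromium, and the APM agent.

```yaml
# Hypothetical override: cap the V8 old-space heap at ~800MB under a 1Gi
# container limit, so the Node process fails with a heap error before the
# kernel OOM killer fires. The 800 figure is an assumption for illustration.
podTemplate:
  spec:
    containers:
      - name: kibana
        env:
          - name: NODE_OPTIONS
            value: "--max-old-space-size=800"
        resources:
          limits:
            memory: 1Gi
```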

Do you have a link to the production configs we use (or Slack me)? E.g. this isn't a Node OOM, it's triggered at the container level, and we're targeting 0 swap or something. Any logs available? I'll get an env set up at some point, so no problem if not.

And then for the memory changes - maybe cbf2844#diff-265d2762a1cbdb7bc87354b9cfc97dd6R132 is related? The ref plus the new platform observables could be causing more frequent updates. I haven't gone any further than vaguely remembering the issue and Ctrl+F, so I'm just speculating here.

@joshdover
Contributor

joshdover commented Sep 23, 2020

PR to resolve this problem with cbf2844: #78342

@joshdover
Contributor

#78342 was merged late yesterday. The next 7.10 snapshot (which is building right now) should include this change for testing. @sebgl would you be able to confirm if the issue is fixed for ECK?

@sebgl
Author

sebgl commented Sep 29, 2020

Thanks for the heads-up @joshdover. We run our E2E tests with the latest SNAPSHOT Docker image every night, so we can let you know if this happens again.
In the meantime, I think we can close this.

Thanks for the fix!

@sebgl sebgl closed this as completed Sep 29, 2020
@sebgl
Author

sebgl commented Oct 1, 2020

It seems to be happening again in ECK nightly E2E tests: elastic/cloud-on-k8s#3710 (comment).

@sebgl sebgl reopened this Oct 1, 2020
@spalger
Contributor

spalger commented Oct 1, 2020

I think #79176 will solve this.

@jbudz
Member

jbudz commented Dec 3, 2020

Closing this out, upstream is closed.

@jbudz jbudz closed this as completed Dec 3, 2020