
Suspected Memory Leak in linkerd-proxy #2345

Closed
christianhuening opened this issue Feb 21, 2019 · 5 comments

@christianhuening
Contributor

Bug Report

What is the issue?

The linkerd-proxy container is using more than 35 GB of memory.
We're using TLS and auto-inject, running the Linkerd2 2.2 stable release.

How can it be reproduced?

Run a pod with TLS enabled. The excessive memory use only happens occasionally.

Logs, error output, etc

linkerd-proxy container output:

WARN admin={bg=tls-config} linkerd2_fs_watch::inotify watch error: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }, polling the fs until next change

Metrics output (process_* only):

process_start_time_seconds 1550772831
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 7780
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 547
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 52501831680
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 39958102016
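
(Conversion for reference, my own arithmetic from the figures above: 39,958,102,016 bytes ≈ 37.2 GiB resident and 52,501,831,680 bytes ≈ 48.9 GiB virtual, consistent with the >35 GB observation above.)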


linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ can query the control plane API
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus

linkerd-service-profile
-----------------------
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 2.2.0 but the latest stable version is 2.2.1
    see https://linkerd.io/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.2.0 but the latest stable version is 2.2.1
    see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: 1.13.2
  • Cluster Environment: bare-metal
  • Host OS: Container Linux 1967.6.0
  • Linkerd version: 2.2.0

Possible solution

N/A


@admc added the priority/P0 Release Blocker label Feb 21, 2019
@hawkw
Member

hawkw commented Feb 21, 2019

Fairly sure this is caused by #2331. The polling-based fs watch implementation probably has a slow leak in it somewhere.
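
To give a sense of the kind of defect that could be (a purely hypothetical sketch; the type names, path, and retention bug below are invented and are not the actual linkerd2_fs_watch code): a polling fallback that snapshots the watched files on every tick will leak slowly if each snapshot is retained rather than replaced, which would match the gradual growth reported above.

```rust
// Hypothetical illustration of a slow leak in a polling-based fs watch.
// NOT the real linkerd2_fs_watch implementation; the path below is a placeholder.
use std::collections::BTreeMap;
use std::fs;
use std::path::PathBuf;
use std::thread;
use std::time::Duration;

struct PollingWatcher {
    paths: Vec<PathBuf>,
    // Bug pattern: every poll's snapshot is kept forever instead of only the latest,
    // so memory grows by one map per polling interval for the life of the process.
    history: Vec<BTreeMap<PathBuf, u64>>,
}

impl PollingWatcher {
    fn new(paths: Vec<PathBuf>) -> Self {
        Self { paths, history: Vec::new() }
    }

    /// Takes a snapshot of the watched paths and reports whether anything changed.
    fn poll_once(&mut self) -> bool {
        let mut snapshot = BTreeMap::new();
        for path in &self.paths {
            if let Ok(meta) = fs::metadata(path) {
                snapshot.insert(path.clone(), meta.len());
            }
        }
        let changed = self.history.last().map_or(true, |prev| prev != &snapshot);
        // The fix would be to keep only the most recent snapshot here.
        self.history.push(snapshot);
        changed
    }
}

fn main() {
    let mut watcher = PollingWatcher::new(vec![PathBuf::from("/var/run/linkerd/tls")]);
    loop {
        if watcher.poll_once() {
            println!("change detected; reloading TLS config");
        }
        thread::sleep(Duration::from_secs(1));
    }
}
```

If the real leak follows this shape, profiling the proxy while it is stuck in polling mode should show allocations growing at a steady rate proportional to the polling frequency.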

@ihcsim
Contributor

ihcsim commented Feb 28, 2019

@christianhuening Can you try out the latest edge-19.2.5 release to see if it resolves this issue? Let us know how it goes.

@christianhuening
Contributor Author

@ihcsim Updated; I'll report back should I see another spike in memory use.

@christianhuening
Contributor Author

The high memory usage hasn't reappeared. Closing for now.

@hawkw
Member

hawkw commented Mar 4, 2019

@christianhuening Great, I'm glad to hear that this was fixed!

We should probably open a new ticket to resolve the underlying issue with the polling fs watch implementation. However, since fixing #2331 has resolved the issue that was causing proxies to fall back to polling unnecessarily, it might be somewhat less urgent.

@github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021