[aws-for-fluent-bit] Connection Refused in Liveness Probe #983

jcarvalho · 2023-08-18T10:19:46Z

Describe the bug
After upgrading the aws-for-fluent-bit chart from version 0.1.27 to 0.1.28, our Fluent Bit pods enter a CrashLoopBackOff state because the newly introduced Liveness Probe fails.

Pod events show the following message:

Liveness probe failed: Get "http://[2600:1f18:REDACTED::2]:2000/api/v1/health": dial tcp [2600:1f18:REDACTED::2]:2000: connect: connection refused

I believe the cause is that HTTP_Listen defaults to 0.0.0.0, so the HTTP server binds only to IPv4 and does not answer IPv6 probes. I confirmed this by shelling into the container and running curl -6 localhost:2020, which failed.

Changing the Chart Values to set HTTP_Listen to [::] fixes the issue. With that setting the HTTP server appears to listen on both IPv4 and IPv6 addresses, but I don't have an IPv4 EKS Cluster to verify this.
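
For reference, a minimal sketch of the values override described above. It assumes HTTP_Listen is set through the same service.extraService block that carries Health_Check in this chart. The other keys (HTTP_Server, HTTP_Port, Health_Check) and port 2020 are assumptions based on the curl test above and the Fluent Bit monitoring settings, not a copy of the chart defaults, so check them against your chart version's values.yaml:

```yaml
service:
  # Assumption: overriding extraService replaces the whole block,
  # so the other monitoring keys are repeated here.
  extraService: |
    HTTP_Server  On
    # Listen on the IPv6 wildcard address instead of 0.0.0.0
    HTTP_Listen  [::]
    # Assumed port, matching the curl test on localhost:2020
    HTTP_Port    2020
    Health_Check On
```

Whether [::] also accepts IPv4 connections depends on the dual-stack behaviour of the container's network stack, which is untested here.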

Steps to reproduce
Spin up an IPv6 EKS Cluster and install version 0.1.28 of the aws-for-fluent-bit Chart. The pods will enter CrashLoopBackOff.

Expected outcome
The new Liveness Probe works correctly with the default Chart configuration.

Environment

Chart name: aws-for-fluent-bit
Chart version: 0.1.28
Kubernetes version: 1.26
Using EKS (yes/no), if so version? Yes, v1.26.7-eks-2d98532

Additional Context:
The EKS Cluster is configured for IPv6 addressing.

jatinmehrotra · 2023-08-31T01:40:30Z

I think this error also exists for chart version 0.1.29 with EKS version 1.25, even when the cluster is configured for IPv4 addressing.

jatinmehrotra · 2023-08-31T05:07:42Z

@jcarvalho
I was able to reproduce the same error message by setting Health_Check to Off in my Helm chart values (see below). You may want to check whether you have a similar setting. If you do, removing it altogether may instead give you the error described in issue #995.

service:
  ## Allow the service to be exposed for monitoring
  ## For liveness check to work, Health_Check must be set to On
  ## https://docs.fluentbit.io/manual/administration/monitoring
  extraService: |
    Health_Check Off

jcarvalho added the bug Something isn't working label Aug 18, 2023

jatinmehrotra mentioned this issue Aug 31, 2023

[aws-for-fluent-bit] no health checks to restart pod when not healthy #946

Closed
