Add metric_type mapping to stat datastream of HA Proxy #7183

Merged: 6 commits, Aug 9, 2023
5 changes: 5 additions & 0 deletions packages/haproxy/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "1.8.1"
changes:
- description: Add `metric_type` mapping for the 'stat' datastream.
type: enhancement
link: https://github.com/elastic/integrations/pull/7183
- version: "1.8.0"
changes:
- description: Add `metric_type` mapping for `info` datastream.
59 changes: 59 additions & 0 deletions packages/haproxy/data_stream/stat/fields/fields.yml
@@ -7,10 +7,12 @@
Status (UP, DOWN, NOLB, MAINT, or MAINT(via)...).
- name: weight
type: long
metric_type: gauge
Collaborator

@agithomas, please cross-check whether this is a counter or a gauge.

haproxy doc:

  1. weight [..BS]: total effective weight (backend), effective weight (server)

Contributor Author

The mapping is MaP, where the "a" indicates an averaged value, so it is a gauge type.
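
The tag quoted in this thread is HAProxy's three-letter field descriptor (origin, nature, scope), and the second letter is what decides counter versus gauge. A minimal Python sketch of that lookup follows. It covers only the natures that come up in this review plus the cumulative counter; the table and the function name `metric_type_for_tag` are illustrative assumptions, not part of the package or of HAProxy.

```python
from typing import Optional

# Second letter of the three-letter tag (origin, nature, scope).
# Only a few natures are mapped; the translation to metric_type is an
# assumption made for this sketch.
NATURE_TO_METRIC_TYPE = {
    "C": "counter",  # cumulative counter
    "G": "gauge",    # value measured at one instant
    "a": "gauge",    # averaged value
}


def metric_type_for_tag(tag: str) -> Optional[str]:
    """Return 'counter' or 'gauge' for a tag such as 'MaP', else None."""
    if len(tag) != 3:
        raise ValueError(f"expected a three-letter tag, got {tag!r}")
    return NATURE_TO_METRIC_TYPE.get(tag[1])


print(metric_type_for_tag("MaP"))  # gauge
print(metric_type_for_tag("MGP"))  # gauge
```

Natures outside the table return `None`, so they can be reviewed by hand rather than guessed.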

description: |
Total weight (for backends), or server weight (for servers).
- name: downtime
type: long
metric_type: counter
description: |
Total downtime (in seconds). For backends, this value is the downtime for the whole backend, not the sum of the downtime for the servers.
- name: component_type
@@ -24,24 +26,29 @@
- name: in.bytes
type: long
format: bytes
metric_type: counter
description: |
Bytes in.
- name: out.bytes
type: long
format: bytes
metric_type: counter
description: |
Bytes out.
- name: last_change
type: integer
metric_type: gauge
description: |
Number of seconds since the last UP->DOWN or DOWN->UP transition.
- name: throttle.pct
type: scaled_float
format: percent
metric_type: gauge
description: |
Current throttle percentage for the server when slowstart is active, or no value if slowstart is inactive.
- name: selected.total
type: long
metric_type: counter
description: |
Total number of times a server was selected, either for new sessions, or when re-dispatching. For servers, this field reports the number of times the server was selected.
- name: tracked.id
@@ -61,82 +68,99 @@
fields:
- name: total
type: long
metric_type: counter
description: |
Cumulative number of connections.
- name: retried
type: long
metric_type: counter
description: |
Number of times a connection to a server was retried.
- name: time.avg
type: long
metric_type: gauge
description: |
Average connect time in ms over the last 1024 requests.
- name: rate
type: long
metric_type: gauge
description: |
Number of connections over the last second.
- name: rate_max
type: long
metric_type: gauge
description: |
Highest value of connection.rate.
- name: attempt.total
type: long
metric_type: counter
description: |
Number of connection establishment attempts.
- name: reuse.total
type: long
metric_type: counter
description: |
Number of connection reuses.
- name: idle
type: group
fields:
- name: total
type: long
metric_type: gauge
Collaborator

Should idle.total be a counter instead?

Contributor Author

The schema definition is srv_icur.1:MGP, so it is a gauge type.
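
The fragment quoted above can be split mechanically. The sketch below assumes the shape `<field name>.<process number>:<three-letter tag>`, exactly as quoted in this comment, and does not handle full typed-output lines; `FieldTag` and `parse_field_tag` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class FieldTag(NamedTuple):
    name: str     # HAProxy field name, e.g. "srv_icur"
    process: int  # process number that follows the field name
    origin: str   # first tag letter
    nature: str   # second tag letter (G = gauge in this thread)
    scope: str    # third tag letter


def parse_field_tag(fragment: str) -> FieldTag:
    """Split a fragment such as 'srv_icur.1:MGP' into its parts."""
    ident, _, tag = fragment.partition(":")
    name, _, process = ident.rpartition(".")
    if len(tag) != 3 or not name or not process.isdigit():
        raise ValueError(f"unexpected fragment: {fragment!r}")
    return FieldTag(name, int(process), tag[0], tag[1], tag[2])


parsed = parse_field_tag("srv_icur.1:MGP")
print(parsed.name, parsed.nature)  # srv_icur G
```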

Collaborator

Should it be named idle.current instead?

Contributor Author

Yes, that would have been better. The description reads: "Current number of idle connections available for reuse on this server."

Contributor Author

If it is to be changed, I would prefer that it be taken up as a separate enhancement, with an assessment of the impact (for example, dashboard changes). Agree?

Collaborator

Yes, I agree; any unrelated changes can be filed as separate backlog items. Also, in this document the idle section comes out empty:

      "idle": {},

Contributor Author

I generated some traffic to populate the value for idle. Please find the extract from the document below:

        "connection": {
          "reuse.total": 7,
          "retried": 0,
          "cache": {},
          "idle": {
            "total": 1
          },
          "attempt.total": 290,
          "time.avg": 1
        },
        "queue": {
          "time.avg": 0
        },
        "status": "UP",
        "client.aborted": 1,
        "compressor": {}
      }

description: |
Number of idle connections available for reuse.
- name: limit
type: long
metric_type: gauge
description: |
Limit on idle connections available for reuse.
- name: cache
type: group
fields:
- name: lookup.total
type: long
metric_type: counter
description: |
Number of cache lookups.
- name: hits
type: long
metric_type: counter
description: |
Number of cache hits.
- name: request
type: group
fields:
- name: denied
type: long
metric_type: counter
description: |
Requests denied because of security concerns.

* For TCP this is because of a matched tcp-request content rule.
* For HTTP this is because of a matched http-request or tarpit rule.
- name: denied_by_connection_rules
type: long
metric_type: counter
description: |
Requests denied because of TCP request connection rules.
- name: denied_by_session_rules
type: long
metric_type: counter
description: |
Requests denied because of TCP request session rules.
- name: queued.current
type: long
metric_type: gauge
description: |
Current queued requests. For backends, this field reports the number of requests queued without a server assigned.
- name: queued.max
type: long
metric_type: gauge
description: |
Maximum value of queued.current.
- name: errors
type: long
metric_type: counter
description: |
Request errors. Some of the possible causes are:

@@ -148,72 +172,87 @@
* request was tarpitted.
- name: redispatched
type: long
metric_type: counter
description: |
Number of times a request was redispatched to another server. For servers, this field reports the number of times the server was switched away from.
- name: connection.errors
type: long
metric_type: counter
description: |
Number of requests that encountered an error trying to connect to a server. For backends, this field reports the sum of the stat for all backend servers, plus any connection errors not associated with a particular server (such as the backend having no active servers).
- name: rate
type: group
fields:
- name: value
type: long
metric_type: gauge
description: |
Number of HTTP requests per second over the last elapsed second.
- name: max
type: long
metric_type: gauge
description: |
Maximum number of HTTP requests per second.
- name: total
type: long
metric_type: counter
description: |
Total number of HTTP requests received.
- name: intercepted
type: long
metric_type: counter
description: |
Number of intercepted requests.
- name: response
type: group
fields:
- name: errors
type: long
metric_type: counter
description: |
Number of response errors. This value includes the number of data transfers aborted by the server (haproxy.stat.server.aborted). Some other errors are:
* write errors on the client socket (won't be counted for the server stat)
* failure applying filters to the response
- name: time.avg
type: long
metric_type: gauge
description: |
Average response time in ms over the last 1024 requests (0 for TCP).
- name: denied
type: integer
metric_type: counter
description: |
Responses denied because of security concerns. For HTTP this is because of a matched http-request rule, or "option checkcache".
- name: http
type: group
fields:
- name: 1xx
type: long
metric_type: counter
description: |
HTTP responses with 1xx code.
- name: 2xx
type: long
metric_type: counter
description: |
HTTP responses with 2xx code.
- name: 3xx
type: long
metric_type: counter
description: |
HTTP responses with 3xx code.
- name: 4xx
type: long
metric_type: counter
description: |
HTTP responses with 4xx code.
- name: 5xx
type: long
metric_type: counter
description: |
HTTP responses with 5xx code.
- name: other
type: long
metric_type: counter
description: |
HTTP responses with other codes (protocol error).
- name: header
@@ -227,40 +266,48 @@
fields:
- name: total
type: long
metric_type: counter
description: |
Number of failed header rewrite warnings.
- name: session
type: group
fields:
- name: current
type: long
metric_type: gauge
description: |
Number of current sessions.
- name: max
type: long
metric_type: gauge
description: |
Maximum number of sessions.
- name: limit
type: long
metric_type: gauge
description: |
Configured session limit.
- name: total
type: long
metric_type: counter
description: |
Number of all sessions.
- name: rate
type: group
fields:
- name: value
type: integer
metric_type: gauge
description: |
Number of sessions per second over the last elapsed second.
- name: limit
type: integer
metric_type: gauge
description: |
Configured limit on new sessions per second.
- name: max
type: integer
metric_type: gauge
description: |
Maximum number of new sessions per second.
- name: check
@@ -293,6 +340,7 @@
Layer 5-7 code, if available.
- name: duration
type: long
metric_type: gauge
description: |
Time in ms that it took to finish the last health check.
- name: health.last
@@ -306,15 +354,18 @@
- name: agent.last
type: integer
- name: failed
metric_type: counter
Collaborator

This looks like the wrong type. Can we cross-check?

    "check": {
      "agent.last": "",
    }

Contributor Author

I know that it is a bug, which is why I didn't assign a metric_type mapping to the check.agent.last field.

Bug reported here: #7202
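
A field left without metric_type on purpose, like check.agent.last here, is easy to lose track of later. One way to list such fields is a small audit over the field definitions. The Python sketch below is only an illustration: the function name, the set of numeric types (taken from the types visible in this diff), and the hand-written sample (a few entries of the check group written as Python dicts rather than loaded from fields.yml) are all assumptions.

```python
# Numeric field types that appear in this diff; assumed to be the ones
# that should carry a metric_type.
NUMERIC_TYPES = {"long", "integer", "scaled_float"}


def fields_without_metric_type(fields, prefix=""):
    """Return dotted paths of numeric leaf fields that lack metric_type."""
    missing = []
    for field in fields:
        path = f"{prefix}{field['name']}"
        if field.get("type") == "group":
            # Recurse into nested groups, extending the dotted path.
            missing += fields_without_metric_type(field.get("fields", []), path + ".")
        elif field.get("type") in NUMERIC_TYPES and "metric_type" not in field:
            missing.append(path)
    return missing


# Hypothetical, hand-written sample mirroring part of the check group.
sample = [
    {"name": "check", "type": "group", "fields": [
        {"name": "agent.last", "type": "integer"},
        {"name": "failed", "type": "long", "metric_type": "counter"},
        {"name": "down", "type": "long", "metric_type": "counter"},
    ]},
]

print(fields_without_metric_type(sample))  # ['check.agent.last']
```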

type: long
description: |
Number of checks that failed while the server was up.
- name: down
type: long
metric_type: counter
description: |
Number of UP->DOWN transitions. For backends, this value is the number of transitions to the whole backend being down, rather than the sum of the transitions for each server.
- name: client.aborted
type: integer
metric_type: counter
description: |
Number of data transfers aborted by the client.
- name: server
@@ -326,14 +377,17 @@
Server ID (unique inside a proxy).
- name: aborted
type: integer
metric_type: counter
description: |
Number of data transfers aborted by the server. This value is included in haproxy.stat.response.errors.
- name: active
type: integer
metric_type: gauge
Collaborator

We need to cross-check whether this is a counter or a gauge, since aborted and backup are counters in the server group.

Contributor Author

The mapping is SGP, so it is a gauge type.
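
For context on why the review keeps checking counter versus gauge: a consumer typically turns a counter into a rate from successive samples, while a gauge is read as-is. The Python sketch below uses made-up sample values; `counter_rate` is a hypothetical helper, and its reset handling is one common convention rather than something this package defines.

```python
def counter_rate(prev: int, curr: int, interval_s: float) -> float:
    """Per-second rate between two samples of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # A lower current value is treated as a counter reset (for example after
    # a restart), so the counter is assumed to have restarted from zero.
    delta = curr - prev if curr >= prev else curr
    return delta / interval_s


# Hypothetical samples taken 10 seconds apart.
print(counter_rate(100, 160, 10.0))  # 6.0
print(counter_rate(160, 20, 10.0))   # 2.0 (reset assumed)
```

Applying the same arithmetic to a gauge such as server.active would produce meaningless numbers, which is why the type has to be right for each field.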

description: |
Number of backend servers that are active, meaning that they are healthy and can receive requests from the load balancer.
- name: backup
type: integer
metric_type: gauge
description: |
Number of backend servers that are backup servers.
- name: compressor
@@ -342,21 +396,25 @@
- name: in.bytes
type: long
format: bytes
metric_type: counter
description: |
Number of HTTP response bytes fed to the compressor.
- name: out.bytes
type: integer
format: bytes
metric_type: counter
description: |
Number of HTTP response bytes emitted by the compressor.
- name: bypassed.bytes
type: long
format: bytes
metric_type: counter
description: |
Number of bytes that bypassed the HTTP compressor (CPU/BW limit).
- name: response.bytes
type: long
format: bytes
metric_type: counter
description: |
Number of HTTP responses that were compressed.
- name: proxy
Expand All @@ -383,6 +441,7 @@
Configured queue limit (maxqueue) for the server, or nothing if the value of maxqueue is 0 (meaning no limit).
- name: time.avg
type: integer
metric_type: gauge
description: |
The average queue time in ms over the last 1024 requests.
- name: agent