[pkg/stanza] Support back-pressure from downstream consumers #20511

Closed
dmitryax opened this issue Mar 29, 2023 · 4 comments · Fixed by #20864
Labels
enhancement (New feature or request), pkg/stanza

Comments

@dmitryax
Member

dmitryax commented Mar 29, 2023

Currently, we drop logs on any error from the downstream consumer. This behavior is not desirable for temporary errors: users might prefer to slow down reading the log files instead of dropping the data.

We should provide such behavior, controlled by a configuration option.
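
To make the desired behavior concrete, here is a minimal Go sketch of the decision point this issue is about. It is not the actual pkg/stanza code; the `consumeWithBackpressure` function and the `retryOnFailure` flag are hypothetical, while `consumererror.IsPermanent` is the existing collector helper for distinguishing permanent errors.

```go
// Illustrative sketch only, not the pkg/stanza implementation.
// On a consumer error, either keep today's behavior (drop the batch) or,
// when retrying is enabled and the error is not permanent, back off and
// retry so that reading effectively slows down instead of losing data.
package sketch

import (
	"context"
	"time"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/consumer/consumererror"
	"go.opentelemetry.io/collector/pdata/plog"
)

func consumeWithBackpressure(ctx context.Context, next consumer.Logs, ld plog.Logs, retryOnFailure bool) error {
	backoff := 100 * time.Millisecond
	for {
		err := next.ConsumeLogs(ctx, ld)
		if err == nil {
			return nil
		}
		// Permanent errors (or retrying disabled) keep the current behavior: drop.
		if !retryOnFailure || consumererror.IsPermanent(err) {
			return err
		}
		// Temporary error: pause instead of dropping, then try again.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < 5*time.Second {
			backoff *= 2
		}
	}
}
```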

@dmitryax added the enhancement (New feature or request), pkg/stanza, and needs triage (New item requiring triage) labels Mar 29, 2023
@github-actions
Contributor

Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@tigrannajaryan
Member

tigrannajaryan commented Mar 29, 2023

@dmitryax, one related thing I would like to decide is whether that configuration option needs to be per receiver or per pipeline, for example as a memory_limiter processor config option?

If it is a memory_limiter option, then the memory_limiter can easily make the decision to drop the data and return "success" to the caller. If the option is to retry, then the memory_limiter should return a non-permanent error to the caller, preferably a distinct non-permanent error indicating that the retry should not be immediate but should happen after a while (since in memory-limited situations we want receivers to slow down).

It may also be worth looking into defining a well-known RetryableError type that carries a RetryAfter hint. This could be used both by the memory_limiter and by the exporterqueue/exporter. Some exporters can receive "retry hints" from their destinations, and it would be nice to propagate this hint all the way back to the receivers, which in turn can propagate it back to their senders (which is, e.g., part of OTLP protocol throttling) or wait themselves (which may be more applicable to receivers like filelog, which can simply pause, retry, and continue from the same place in the file).
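
A rough sketch of what such a type could look like, assuming nothing about the eventual design: the `RetryableError` type and `RetryAfterHint` helper below are hypothetical illustrations, not an existing collector API.

```go
// Hypothetical sketch of the "well-known RetryableError" idea from this
// comment; no such type existed in the collector at the time of this issue.
package sketch

import (
	"errors"
	"time"
)

// RetryableError signals that the consumer refused the data temporarily
// and suggests how long the caller should wait before retrying.
type RetryableError struct {
	Wrapped    error
	RetryAfter time.Duration // 0 means "retry at the caller's discretion"
}

func (e *RetryableError) Error() string { return e.Wrapped.Error() }
func (e *RetryableError) Unwrap() error { return e.Wrapped }

// RetryAfterHint extracts the hint from an error chain, if present.
func RetryAfterHint(err error) (time.Duration, bool) {
	var re *RetryableError
	if errors.As(err, &re) {
		return re.RetryAfter, true
	}
	return 0, false
}
```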

@atoulme removed the needs triage (New item requiring triage) label Mar 29, 2023
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this issue Mar 31, 2023
Contributes to open-telemetry#1084

- Clarify what the memory limiter does.
- Set expectations for receivers on how they are supposed to react
  when the memory limiter refuses the data.
- Add a test that demonstrates that the memory limiter does not lose data
  if the receiver and exporter behave according to the contract.

All receivers must adhere to this contract. See, for example,
an issue opened against the filelog receiver:
open-telemetry/opentelemetry-collector-contrib#20511

Note that there are no functional changes to the memory limiter.

Future work: one additional thing we can do is implement backoff
logic in the memory limiter. When in memory-limited mode, the processor
can introduce pauses before it returns from the ConsumeLogs/Traces/Metrics
call. This will slow down the inflow of data into the Collector
and give the pipeline time to clear and memory usage time to return to
normal. This needs to be explored further.
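
A rough sketch of that "future work" idea, with hypothetical names throughout: `memoryLimiterSketch`, `inLimitedMode`, and `errDataRefused` are illustrative and not the processor's actual internals.

```go
// Sketch only: pause before refusing data when memory is limited, so callers
// that honor the pipeline contract are naturally slowed down.
package sketch

import (
	"context"
	"errors"
	"time"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/plog"
)

var errDataRefused = errors.New("data refused due to high memory usage") // non-permanent

type memoryLimiterSketch struct {
	next  consumer.Logs
	pause time.Duration
}

func (ml *memoryLimiterSketch) inLimitedMode() bool {
	// Placeholder for the real memory check.
	return false
}

func (ml *memoryLimiterSketch) ConsumeLogs(ctx context.Context, ld plog.Logs) error {
	if ml.inLimitedMode() {
		// Introduce the proposed pause before returning a refusal.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(ml.pause):
		}
		return errDataRefused // caller should retry, not drop
	}
	return ml.next.ConsumeLogs(ctx, ld)
}
```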
@dmitryax
Member Author

dmitryax commented Apr 5, 2023

@dmitryax, one related thing I would like to decide is whether that configuration option needs to be per receiver or per pipeline, for example as a memory_limiter processor config option?

I think the memory limiter should not automatically drop data itself; instead it should return more specific errors, such as RetryableError, as you suggested. My reasoning is that the decision to drop or retry can vary significantly among different receivers. Even the same receiver type can have different modes of operation; for example, the filelog receiver may choose to drop certain application logs but never miss any audit logs. Given that multiple receivers are typically used in a single pipeline, setting this configuration on the memory limiter would require splitting the pipeline.
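
For illustration, a per-receiver option could be expressed as a small config struct embedded in each stanza-based receiver's configuration. The struct and field names below are hypothetical sketches, not the design that was eventually merged.

```go
// Hypothetical sketch of a per-receiver retry option; names are illustrative.
package sketch

import "time"

type RetryOnFailure struct {
	// Enabled switches from "drop on error" to "back off and retry".
	Enabled bool `mapstructure:"enabled"`
	// InitialInterval is the wait before the first retry.
	InitialInterval time.Duration `mapstructure:"initial_interval"`
	// MaxInterval caps the exponential backoff between retries.
	MaxInterval time.Duration `mapstructure:"max_interval"`
	// MaxElapsedTime bounds the total time spent retrying before giving up.
	MaxElapsedTime time.Duration `mapstructure:"max_elapsed_time"`
}
```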
