-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate errors from exporters to receivers #7486
Changes from all commits
e8c9d1e
ae0e94f
d61ec11
97ef7c2
9c04786
38030d5
b9cee4f
25a00d0
73f6a18
aa732ee
3b68677
e45443d
ba9fee0
f9567e6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
change_type: 'enhancement' | ||
|
||
# The name of the component, or a single word describing the area of concern, (e.g. otlpreceiver) | ||
component: otlpreceiver | ||
|
||
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
note: Propagate HTTP errors from the exporter back to the caller | ||
|
||
# One or more tracking issues or pull requests related to the change | ||
issues: [7486] | ||
|
||
# (Optional) One or more lines of additional information to render under the primary note. | ||
# These lines will be padded with 2 spaces and then inserted directly into the document. | ||
# Use pipe (|) for multiline entries. | ||
subtext: The Collector can now propagate errors from backends to the client when used as a proxy. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Internal collector errors | ||
|
||
This package includes helpers propagating errors from exporters or processors | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The behaviour around error propagation for synchronous pipelines should be documented in a place that's more visible than the package's readme :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. docs/design.md perhaps, on a new "Errors" section? Would it be OK to do it on a follow-up PR, as I think this section would need more than just information about errors in sync pipelines. I should add a word or two about errors in async pipelines as well. |
||
back to receivers. Error propagation is helpful in a setup where the Collector | ||
acts as a simple gateway between clients and other Collector instances or | ||
backends. | ||
|
||
Error propagation only works when there's synchronous processing of a pipeline. | ||
jpkrohling marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Concretely, this means there should be no batch processors, sending queues for | ||
exporters and potentially other resiliency mechanisms should be disabled. When | ||
the pipeline is async, as it is the case when the batch processor is part of it, | ||
the errors being returned by the backends used by the exporters are returned | ||
only after the Collector's client has received its response already. | ||
|
||
Processors doing data transformation should be treated with care, perhaps even | ||
avoided for sync pipelines. The reason is that when a failure happens _because_ | ||
of those transformations, the Collector's client isn't at fault: a 400 returned | ||
by a backend should NOT propagate back to the client. | ||
|
||
Shared receivers might also fail in different ways than the outcome of the | ||
exporters. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package colerrs // import "go.opentelemetry.io/collector/internal/colerrs" | ||
|
||
var _ error = (*RequestError)(nil) | ||
|
||
// RequestError represents an error returned during HTTP client operations | ||
type RequestError struct { | ||
statusCode int | ||
wrapped error | ||
} | ||
|
||
// NewRequestError creates a new HTTP Client Request error with the given parameters | ||
func NewRequestError(statusCode int, wrapped error) RequestError { | ||
return RequestError{ | ||
wrapped: wrapped, | ||
statusCode: statusCode, | ||
} | ||
} | ||
|
||
func (r RequestError) Error() string { | ||
return r.wrapped.Error() | ||
} | ||
|
||
func (r RequestError) Unwrap() error { | ||
return r.wrapped | ||
} | ||
|
||
func (r RequestError) StatusCode() int { | ||
return r.statusCode | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package colerrs | ||
|
||
import ( | ||
"errors" | ||
"net/http" | ||
"testing" | ||
|
||
"github.com/stretchr/testify/assert" | ||
) | ||
|
||
func TestNewRequestError(t *testing.T) { | ||
// prepare | ||
err := errors.New("boo") | ||
|
||
// test | ||
reqerr := NewRequestError(http.StatusInternalServerError, err) | ||
|
||
// verify | ||
assert.Error(t, reqerr) | ||
assert.Equal(t, err, reqerr.Unwrap()) | ||
assert.Equal(t, http.StatusInternalServerError, reqerr.StatusCode()) | ||
assert.Equal(t, "boo", reqerr.Error()) | ||
} |
jpkrohling marked this conversation as resolved.
Show resolved
Hide resolved
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package otlpreceiver // import "go.opentelemetry.io/collector/receiver/otlpreceiver" | ||
|
||
import ( | ||
"net/http" | ||
|
||
"google.golang.org/grpc/codes" | ||
"google.golang.org/grpc/status" | ||
) | ||
|
||
func gRPCToHTTP(s *status.Status) int { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there an official guideline around translation from one protocol to another? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried to find one, or at least a library we could use, but I couldn't. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, after looking at it again, I recognized as it being my source of truth :-) I couldn't find a lib to reuse though. In any case, I should have documented the source of the data so that we are able to sync it in the future if we ever get new status codes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the record: I miss the mapping for 419. |
||
switch s.Code() { | ||
case codes.Aborted: | ||
return http.StatusInternalServerError | ||
case codes.AlreadyExists: | ||
return http.StatusConflict | ||
case codes.Canceled: | ||
return http.StatusInternalServerError | ||
case codes.DataLoss: | ||
return http.StatusInternalServerError | ||
jpkrohling marked this conversation as resolved.
Show resolved
Hide resolved
|
||
case codes.DeadlineExceeded: | ||
return http.StatusRequestTimeout | ||
case codes.FailedPrecondition: | ||
return http.StatusPreconditionFailed | ||
case codes.Internal: | ||
return http.StatusInternalServerError | ||
case codes.InvalidArgument: | ||
return http.StatusBadRequest | ||
case codes.NotFound: | ||
return http.StatusNotFound | ||
case codes.OutOfRange: | ||
return http.StatusBadRequest | ||
case codes.PermissionDenied: | ||
return http.StatusForbidden | ||
case codes.ResourceExhausted: | ||
return http.StatusInternalServerError | ||
case codes.Unauthenticated: | ||
return http.StatusUnauthorized | ||
case codes.Unavailable: | ||
return http.StatusServiceUnavailable | ||
case codes.Unimplemented: | ||
return http.StatusNotFound | ||
case codes.Unknown: | ||
return http.StatusInternalServerError | ||
default: | ||
return http.StatusInternalServerError | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know that the gRPC code in the response body more accurately describes the error than the HTTP status code returned by the backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so. The code for readResponse from a few lines above states:
Given that our OTLP receiver and exporter are sending and receiving proto payloads, I would say that the gRPC status codes are generally better at describing what happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. However looking at the Failures section of the OTLP spec, it also says the following:
The way that reads to me implies that we may not want to use the code field for functional purposes like this. If we do use it, I think we should at least check if the code is a zero value given that it's optional, and we should probably also call this out in the exporter readme so the behavior is documented.