Add outcome to transactions and spans #299

Merged 9 commits on Aug 24, 2020
43 changes: 42 additions & 1 deletion docs/agents/agent-development.md
@@ -237,14 +237,33 @@ Global labels can be specified via the environment variable `ELASTIC_APM_GLOBAL_

### Transactions

#### Transaction outcome

The `outcome` property denotes whether the transaction represents a success or a failure from the perspective of the entity that produced the event.
The APM Server converts this to the [`event.outcome`](https://www.elastic.co/guide/en/ecs/current/ecs-allowed-values-event-outcome.html) field.
This property is optional to preserve backwards compatibility.
If an agent doesn't report the `outcome` (or reports `null`), the APM Server sets the outcome to `"unknown"`.

- `"failure"`: Indicates that this transaction describes a failed result. \
Note that client errors (for example, HTTP 4xx responses) don't fall into this category, as they are not errors from the perspective of the server.
- `"success"`: Indicates that this transaction describes a successful result.
- `"unknown"`: Indicates that there's no information about the outcome.
This is the default value that applies when an outcome has not been set explicitly.
This may be the case when a user tracks a custom transaction without explicitly setting an outcome.
For existing auto-instrumentations, agents should set the outcome either to `"failure"` or `"success"`.

What counts as a failed or successful request depends on the protocol and does not depend on whether there are error documents associated with a transaction.
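
The defaulting rule described above can be sketched as follows. This is only an illustration, not the APM Server's actual code: the function name `normalize_outcome` is hypothetical, and rejecting values outside the three documented ones is an assumption made for the sketch.

```python
from typing import Optional

# The three outcome values documented in this section.
VALID_OUTCOMES = {"success", "failure", "unknown"}


def normalize_outcome(reported: Optional[str]) -> str:
    """Return the outcome to store for an event (hypothetical helper).

    A missing or null outcome becomes "unknown", as described above.
    """
    if reported is None:
        return "unknown"
    if reported not in VALID_OUTCOMES:
        # Assumption for this sketch: unexpected values are rejected.
        raise ValueError(f"unsupported outcome: {reported!r}")
    return reported
```

An agent that predates the `outcome` property simply omits it, so its events end up as `"unknown"` rather than being counted as successes or failures.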

#### HTTP Transactions

Agents should instrument HTTP request routers/handlers, starting a new transaction for each incoming HTTP request. When the request ends, the transaction should be ended, recording its duration.

- The transaction `type` should be `request`.
- The transaction `result` should be `HTTP Nxx`, where N is the first digit of the status code (e.g. `HTTP 4xx` for a 404).
- The transaction `outcome` should be `"success"` for HTTP status codes < 500 and `"failure"` for status codes >= 500. \
Status codes in the 4xx range (client errors) are not considered a `failure` as the failure has not been caused by the application itself but by the caller.
As there's no browser API to get the status code of a page load, the RUM agent always reports `"unknown"` for those transactions.
- The transaction `name` should be aggregatable, such as the route or handler name. Examples:

- `GET /users/{id}`
- `UsersController#index`
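
As a minimal sketch of the `result` and `outcome` rules listed above, an agent could derive both values from the response status code. The function names are hypothetical and not part of any agent's API.

```python
def http_transaction_result(status_code: int) -> str:
    """Build the `HTTP Nxx` result string from the first digit of the status code."""
    return f"HTTP {status_code // 100}xx"


def http_transaction_outcome(status_code: int) -> str:
    """Only server errors (>= 500) count as a failure for a transaction."""
    return "failure" if status_code >= 500 else "success"
```

With these helpers, a 404 response yields the result `HTTP 4xx` with outcome `"success"`, while a 503 yields `HTTP 5xx` with outcome `"failure"`.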

@@ -289,6 +308,21 @@ If a transaction is not sampled, you should set the `sampled: false` property an

The agent should also have a sense of the most common libraries for these and instrument them without any further setup from the app developers.

#### Span outcome

The `outcome` property denotes whether the span represents a success or a failure.
It supports the same values as `transaction.outcome`.
The only semantic difference is that client errors set the `outcome` to `"failure"`.
Agents should try to determine the outcome for spans created by auto-instrumentation,
which is especially important for exit spans.

While the transaction outcome lets you reason about the error rate from the service's point of view,
other services might have a different perspective on that.
For example, if there's a network error so that service A can't call service B,
the error rate of service B is 100% from service A's perspective.
However, as service B doesn't receive any requests, the error rate is 0% from service B's perspective.
The `span.outcome` also allows reasoning about error rates of external services.
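
The service A / service B example can be made concrete with a small sketch. The `error_rate` function is hypothetical, and two of its choices are assumptions rather than anything specified here: `"unknown"` outcomes are left out of the calculation, and an empty set of events is reported as a rate of 0, mirroring the 0% in the example.

```python
from typing import Iterable


def error_rate(outcomes: Iterable[str]) -> float:
    """Fraction of failures among events with a known outcome."""
    failures = 0
    successes = 0
    for outcome in outcomes:
        if outcome == "failure":
            failures += 1
        elif outcome == "success":
            successes += 1
        # Assumption: "unknown" outcomes are ignored.
    total = failures + successes
    # Assumption: no events means a rate of 0, as in the example above.
    return failures / total if total else 0.0


# Service A's exit spans for calls to service B all fail on a network error.
service_a_spans_to_b = ["failure"] * 5
# Service B never receives those requests, so it records no transactions.
service_b_transactions = []

rate_seen_by_a = error_rate(service_a_spans_to_b)
rate_seen_by_b = error_rate(service_b_transactions)
```

Here `rate_seen_by_a` is computed from `span.outcome` on the caller's side and `rate_seen_by_b` from `transaction.outcome` on the callee's side, which is why the two can disagree.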

#### Span stack traces

Spans may have an associated stack trace, in order to locate the associated source code that caused the span to occur. If there are many spans being collected this can cause a significant amount of overhead in the application, due to the capture, rendering, and transmission of potentially large stack traces. It is possible to limit the recording of span stack traces to only spans that are slower than a specified duration, using the config variable `ELASTIC_APM_SPAN_FRAMES_MIN_DURATION`.
@@ -317,6 +351,8 @@ For outbound HTTP request spans we capture the following http-specific span cont
- `http.url` (the target URL)
- `http.status_code` (the response status code)

The span's `outcome` should be set to `"success"` if the status code is lower than 400 and to `"failure"` otherwise.
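
A minimal sketch of this rule, using a hypothetical function name, shows how it differs from the transaction rule: on the calling side, client errors count as failures.

```python
def http_span_outcome(status_code: int) -> str:
    """Outcome of an outbound HTTP request span: any status code >= 400 is a failure."""
    return "success" if status_code < 400 else "failure"
```

For example, a 404 response makes the outbound span a `"failure"`, even though the service that returned the 404 reports its own transaction as a `"success"`.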

The captured URL should have the userinfo (username and password), if any, redacted.

#### Database spans
@@ -402,6 +438,11 @@ Agents should include exception handling in the instrumentation they provide, su

Errors may or may not occur within the context of a transaction or span. If they do, then they will be associated with them by recording the trace ID and transaction or span ID. This enables the APM UI to annotate traces with errors.

Tracking an error that's related to a transaction does not impact its `outcome`.
A transaction might have multiple errors associated with it but still return a 2xx status code.
Hence, the status code is a more reliable signal for the outcome of the transaction.
This, in turn, means that the `outcome` is always specific to the protocol.

## Metrics

Agents periodically collect and report various metrics, described below.