elastic · felixbarny · Aug 24, 2020 · Jul 17, 2020 · Jul 17, 2020 · Jul 29, 2020
diff --git a/docs/agents/agent-development.md b/docs/agents/agent-development.md
@@ -237,14 +237,33 @@ Global labels can be specified via the environment variable `ELASTIC_APM_GLOBAL_
 
 ### Transactions
 
+#### Transaction outcome
+
+The `outcome` property denotes whether the transaction represents a success or a failure from the perspective of the entity that produced the event.
+The APM Server converts this to the [`event.outcome`](https://www.elastic.co/guide/en/ecs/current/ecs-allowed-values-event-outcome.html) field.
+This property is optional to preserve backwards compatibility.
+If an agent doesn't report the `outcome` (or reports `null`), the APM Server sets the outcome to `"unknown"`.
+
+- `"failure"`: Indicates that this transaction describes a failed result. \
+  Note that client errors don't fall into this category as they are not an error from the perspective of the server.
+- `"success"`: Indicates that this transaction describes a successful result.
+- `"unknown"`: Indicates that there's no information about the outcome.
+  This is the default value that applies when an outcome has not been set explicitly.
+  This may be the case when a user tracks a custom transaction without explicitly setting an outcome.
+  For existing auto-instrumentations, agents should set the outcome either to `"failure"` or `"success"`.
+
+What counts as a failed or successful request depends on the protocol and does not depend on whether there are error documents associated with a transaction.
+
 #### HTTP Transactions
 
 Agents should instrument HTTP request routers/handlers, starting a new transaction for each incoming HTTP request. When the request ends, the transaction should be ended, recording its duration.
 
 - The transaction `type` should be `request`.
 - The transaction `result` should be `HTTP Nxx`, where N is the first digit of the status code (e.g. `HTTP 4xx` for a 404)
+- The transaction `outcome` should be `"success"` for HTTP status codes < 500 and `"failure"` for status codes >= 500. \
+  Status codes in the 4xx range (client errors) are not considered a `failure` as the failure has not been caused by the application itself but by the caller.
+  As there's no browser API to get the status code of a page load, the RUM agent always reports `"unknown"` for those transactions.
 - The transaction `name` should be aggregatable, such as the route or handler name. Examples:
-
     - `GET /users/{id}`
     - `UsersController#index`
 
@@ -289,6 +308,21 @@ If a transaction is not sampled, you should set the `sampled: false` property an
 
 The agent should also have a sense of the most common libraries for these and instrument them without any further setup from the app developers.
 
+#### Span outcome
+
+The `outcome` property denotes whether the span represents a success or a failure.
+It supports the same values as `transaction.outcome`.
+The only semantic difference is that client errors set the `outcome` to `"failure"`.
+Agents should try to determine the outcome for spans created by auto-instrumentation,
+which is especially important for exit spans.
+
+While the transaction outcome lets you reason about the error rate from the service's point of view,
+other services might have a different perspective on that.
+For example, if there's a network error so that service A can't call service B,
+the error rate of service B is 100% from service A's perspective.
+However, as service B doesn't receive any requests, the error rate is 0% from service B's perspective.
+The `span.outcome` also allows reasoning about error rates of external services.
+
 #### Span stack traces
 
 Spans may have an associated stack trace, in order to locate the associated source code that caused the span to occur. If there are many spans being collected this can cause a significant amount of overhead in the application, due to the capture, rendering, and transmission of potentially large stack traces. It is possible to limit the recording of span stack traces to only spans that are slower than a specified duration, using the config variable `ELASTIC_APM_SPAN_FRAMES_MIN_DURATION`.
@@ -317,6 +351,8 @@ For outbound HTTP request spans we capture the following http-specific span cont
 - `http.url` (the target URL)
 - `http.status_code` (the response status code)
 
+The span's `outcome` should be set to `"success"` if the status code is lower than 400 and to `"failure"` otherwise.
+
 The captured URL should have the userinfo (username and password), if any, redacted.
 
 #### Database spans
@@ -402,6 +438,11 @@ Agents should include exception handling in the instrumentation they provide, su
 
 Errors may or may not occur within the context of a transaction or span. If they do, then they will be associated with them by recording the trace ID and transaction or span ID. This enables the APM UI to annotate traces with errors.
 
+Tracking an error that's related to a transaction does not impact its `outcome`.
+A transaction might have multiple errors associated with it but still return with a 2xx status code.
+Hence, the status code is a more reliable signal for the outcome of the transaction.
+This, in turn, means that the `outcome` is always specific to the protocol.
+
 ## Metrics
 
 Agents periodically collect and report various metrics, described below.
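
The HTTP rules introduced in this change (transaction `result` and `outcome`, and the `outcome` of outbound HTTP spans) can be sketched as below. This is a minimal illustration, not code from any agent: the function names `transaction_result`, `transaction_outcome`, and `span_outcome` are hypothetical, and treating a missing status code as `"unknown"` is an assumption modeled on the RUM page-load case described above.

```python
from typing import Optional


def transaction_result(status_code: int) -> str:
    """Build the aggregatable `result`, e.g. 404 -> "HTTP 4xx"."""
    return f"HTTP {status_code // 100}xx"


def transaction_outcome(status_code: Optional[int]) -> str:
    """Outcome of an incoming HTTP request, from the server's perspective.

    Assumption: None stands for "status code not available", as with
    RUM page loads, and maps to "unknown".
    """
    if status_code is None:
        return "unknown"
    # Client errors (4xx) are caused by the caller, so only 5xx is a failure.
    return "failure" if status_code >= 500 else "success"


def span_outcome(status_code: int) -> str:
    """Outcome of an outbound HTTP request, from the caller's perspective."""
    # For spans, client errors count as failures too.
    return "failure" if status_code >= 400 else "success"
```

The same 404 response therefore yields a `"success"` transaction on the service that returned it and a `"failure"` span on the service that made the call, which is the difference in perspective the span outcome section describes.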