Skip to content

Commit

Permalink
Updated and simplified Lambda spec
Browse files Browse the repository at this point in the history
  • Loading branch information
AlexanderWert committed Jul 29, 2021
1 parent fd36c96 commit 38fd169
Showing 1 changed file with 77 additions and 93 deletions.
170 changes: 77 additions & 93 deletions specs/agents/tracing-instrumentation-aws-lambda.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,34 +10,35 @@ In our instrumentation we use these objects to derive useful meta and context in
## Generic Lambda Instrumentation
In general, to instrument Lambda functions, we create transactions that wrap the execution of that handler method. In cases where we cannot assume any type or information in the `event` object (e.g. if the trigger type is undefined), we ignore trigger-specific information and simply wrap the handler method with a transaction, while using the `context` object to derive some necessary fields.

Field | Value | Description
--- | --- | ---
`transaction.name` | e.g. `MyFunctionName` | The name of the Lambda function. This can be retrieved either from the `context` object or from the environment variable `AWS_LAMBDA_FUNCTION_NAME`
`transaction.type` | `request` |
`faas.id` | e.g. `arn:aws:lambda:us-west-2:123456789012:function:my-function` | Use `context.invokedFunctionArn` to set the id.
`faas.coldstart` | `true` / `false` | Boolean value indicating whether a Lambda function invocation was a cold start or not.
`faas.trigger.type`| `other` | The trigger type. Use `other` if trigger type is unknown / cannot be specified. More concrete triggers are `http`, `pubsub`, `datasource`, `timer` (see specific triggers below).
Field | Value | Description | Source
--- | --- | --- | ---
`service.name` | e.g. `MyFunctionName` | The name of the Lambda function. | `AWS_LAMBDA_FUNCTION_NAME` or `context.functionName`
`service.framework.name` | `AWS Lambda` | Constant value for the framework name. | -
`service.runtime.name`| e.g. `AWS_Lambda_java8` | The lambda runtime. | `AWS_EXECUTION_ENV`
`service.id` | e.g. `arn:aws:lambda:us-west-2:123456789012:function:my-function` | The ARN of the function **without alias postfix**. | `context.invokedFunctionArn`, remove the 8th ARN segment if the ARN contains an alias postfix.
`service.version` | e.g. `${LATEST}` | The lambda function version | `AWS_LAMBDA_FUNCTION_VERSION` or `context.functionVersion`
`service.node.name` | e.g. `2019/06/07/[$LATEST]e6f...` | The log stream name uniquely identifying a function instance. | `AWS_LAMBDA_LOG_STREAM_NAME` or `context.logStreamName`
`cloud.provider` | `aws` | Constant value for the cloud provider. |
`source.cloud.provider` | `aws` | Constant value for the source cloud provider. |
`cloud.region` | e.g. `us-east-1` | The cloud region. | `AWS_REGION`
`cloud.service.name` | `lambda` | Constant value for the AWS service.
`transaction.type` | `request` | - | -
`faas.trigger.type` | `other` | The trigger type. Use `other` if trigger type is unknown / cannot be specified. | More concrete triggers are `http`, `pubsub`, `datasource`, `timer` (see specific triggers below).
`faas.execution` | `af9aa4-a6...` | The AWS request ID of the function invocation | `context.awsRequestId`
`faas.coldstart` | `true` / `false` | Boolean value indicating whether a Lambda function invocation was a cold start or not. | [see section below](deriving-cold-starts)
`transaction.name` | e.g. `MyFunctionName` | Use function name if trigger type is `other`. | `context.functionName`
`faas.trigger.reuqest_id` | - | Do not set this field if trigger type is `other`. | Trigger specific.
`source.service.*` | - | Do not set these fields if trigger type is `other`. | Trigger specific.
`source.cloud.*` | - | Do not set these fields if trigger type is `other`. | Trigger specific.

### Overwriting Metadata
Automatically capturing cloud metadata doesn't work reliably from a Lambda environment. Moreover, retrieving cloud metadata through an additional HTTP request may slowdown the lambda function / increase cold start behaviour. Therefore, the generic cloud metadata fetching should be disabled when the agent is running in a lambda context (for instance through checking for the existance of the `AWS_LAMBDA_FUNCTION_NAME` environment variable). For AWS Lambda we adapt meta-data fetching that uses available environment variables to derive / overwrite the following fields:

Field | Value | Description
--- | --- | ---
`service.name` | e.g. `MyFunctionName` | The name of the Lambda function. Retrieved from the environment variable `AWS_LAMBDA_FUNCTION_NAME`.
`service.framework.name` | `AWS Lambda` | Contant value for the framework name.
`service.runtime.name`| e.g. `AWS_Lambda_java8` | The lambda runtime derived from the `AWS_EXECUTION_ENV` environment variable.
`cloud.provider` | `aws` | Contant value for the cloud provider.
`cloud.region` | e.g. `us-east-1` | The cloud region derived from the `AWS_REGION` environment variable.
`cloud.service.name` | `lambda` | The AWS service which is the value `lambda` for this instrumentation.
`faas.name` | e.g. `MyFunctionName` | The lambda function name derived from the `AWS_LAMBDA_FUNCTION_NAME` environment variable.
`faas.version`| e.g. `${LATEST}` | The lambda function version derived from the `AWS_LAMBDA_FUNCTION_VERSION` environment variable.

Automatically capturing cloud metadata doesn't work reliably from a Lambda environment. Moreover, retrieving cloud metadata through an additional HTTP request may slowdown the lambda function / increase cold start behaviour. Therefore, the generic cloud metadata fetching should be disabled when the agent is running in a lambda context (for instance through checking for the existance of the `AWS_LAMBDA_FUNCTION_NAME` environment variable).
Where possible, metadata should be overwritten at Lambda runtime startup corresponding to the field specifications in this spec.
Some metadata fields are not available at startup (e.g. `invokedFunctionArn`). In this case corresponding fields need to be set either as metadata with the first invocation of the lambda function or set for every lambda invocation as transaction fields.

### Deriving cold starts
A cold start occurs if AWS needs first to initialize the Lambda runtime (including the Lambda process, such as JVM, Node.js process, etc.) in oder to handle a request. This happens for the first request and after long function idle times. Lambda functions are always called sequentially (there is no concurrency). Thus, detecting a cold start is as simple as detecting whether the invocation of a __handler method__ is the **first since process startup** or not. This can be achieved with a global / process-scoped flag that is flipped at the first execution of the handler method.



## Trigger-specific Instrumentation
Lambda functions can be triggered in many different ways. A generic transaction for a Lambda invocation can be create independently of the actual trigger. However, depending on the trigger type different information might be available that can be used to capture additional transaction data or allows to derive additional, valuable spans. The most common triggers that we want dedicated instrumentation support for are the following:

Expand All @@ -59,17 +60,17 @@ The agent should use the information in the request and response objects to fill
In particular, agents must use the `event.headers` to retrieve the `traceparent` and the `tracestate` and use them to start the transaction for the lambda function execution.

In addition the following fields should be set for API Gateway-based Lambda functions:
Field | Value | Description
--- | --- | ---
`faas.trigger.type` | `http` |
`faas.trigger.id` | e.g. `gy415nuibc` | Use `event.requestContext.apiId`
`faas.trigger.name` | e.g. `POST /{proxy+}/Prod` | Format: `${event.requestContext.httpMethod} ${event.requestContext.resourcePath}/${event.requestContext.stage}`
`faas.trigger.account.id` | e.g. `12345678912` | Use `event.requestContext.accountId`
`faas.trigger.version` | `1.0` or `2.0` | `1.0` for API Gateway V1, `2.0` for API Gateway V2
`faas.execution` | e.g. `123456789` | Use `event.requestContext.requestId`

With both API Gateway versions the `event` objects (may) contain a timestamp of the original API Gateway request.
This information should be used to [adapt the transaction and spans structure](#init-spans) of the Lambda invocation to represent the initialization time.

Field | Value | Description | Source
--- | --- | --- | ---
`faas.trigger.type` | `http` | Constant value for API gateway. | -
`transaction.name` | e.g. `GET MyFunction` | Http method followed by a whitespace and the function name. | -
`faas.trigger.reuqest_id` | e.g. `afa4-a6...` | ID of the API gateway request. | `event.requestContext.requestId`
`source.service.name` | e.g. `POST /{proxy+}/Prod` | Readable API gateway endpoint. |Format: `${event.requestContext.httpMethod} ${event.requestContext.resourcePath}/${event.requestContext.stage}`
`source.service.id` | e.g. `gy415nu...` | `event.requestContext.apiId` |
`source.service.version` | e.g. `1.0` | `1.0` for API Gateway V1, `2.0` for API Gateway V2. | -
`source.cloud.service.name` | `api gateway` | Fix value for API gateway. | -
`source.cloud.account.id` | e.g. `12345678912` | Account ID of the API gateway. | `event.requestContext.accountId`

### SQS / SNS
Lambda functions that are triggered by SQS (or SNS) accept an `event` input that may contain one ore more SQS / SNS messages in the `event.records` array. All message-related context information (including the `traceparent`) is encoded in the individual message attributes (if at all). We cannot (automatically) wrap the processing of the individual messages that are sent as a batch of messages with a single `event`.
Expand All @@ -79,74 +80,57 @@ Thus, in case that an SQS / SNS `event` contains **exactly one** SQS / SNS messa
With only one message in `event.records`, the agents can use the single SQS / SNS `record` to retrieve the `traceparent` and `tracestate` from `record.messageAttributes` and use it for starting the lambda transaction.

In addition the following fields should be set for Lambda functions triggered by SQS or SNS:
Field | Value | SQS | SNS
--- | --- | --- | ---
`faas.trigger.type` | `pubsub` | Constant value for message based triggers | "
`faas.trigger.id` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | Use `record.eventSourceArn` |Use `record.sns.topicArn`
`faas.trigger.name` | e.g. `my-queue` | Parse the queue / topic name (6th element) from the ARN used for `faas.trigger.id`. | "
`faas.trigger.account.id` | e.g. `12345678912` | Parse account segment (5th) from the ARN used for `faas.trigger.id` | "
`faas.trigger.region` | e.g. `us-east-1` | Use `record.awsRegion` | Parse region segment (4th) from `record.sns.topicArn`
`faas.trigger.version` | e.g. `2.1` | Use `record.eventVersion` | not available
`faas.execution` | e.g. `someMessageId` | Use `record.messageId` | Use `record.sns.messageId`
`context.message.queue` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | Use `record.eventSourceArn` | Use `record.sns.topicArn`
`context.message.age` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available. </br></br> For SQS, the timestamp can be retrieved from message attributes with key `SentTimestamp`. | Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available. </br></br> For SNS, the timestamp can be retrieved from `record.sns.timestamp`.
`context.message.body` | | The message body. Sould only be captured if body capturing is enabled in the configuration.</br></br> Use `record.body` | The message body. Sould only be captured if body capturing is enabled in the configuration.</br></br> Use `record.sns.message`
`context.message.headers` | | The message attributes. Should only be captured, if capturing headers is enabled in the configuration.</br></br> Use `record.messageAttributes`| The message attributes. Should only be captured, if capturing headers is enabled in the configuration.</br></br> Use `record.sns.messageAttributes`

#### SQS
Field | Value | Description | Source
--- | --- | --- | ---
`faas.trigger.type` | `pubsub` | Constant value for message based triggers | -
`transaction.name` | e.g. `RECEIVE SomeQueue` | Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple queue name can be derived from the 6th segment of `record.eventSourceArn`.
`faas.trigger.reuqest_id` | e.g. `someMessageId` | SQS message ID. | `record.messageId`
`source.service.name` | e.g. `my-queue` | SQS queue name | Simple queue name can be derived from the 6th segment of `record.eventSourceArn`.
`source.service.id` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | SQS queue ARN. | `record.eventSourceArn`
`source.service.version` | e.g. `2.1` | SQS event version | `record.eventVersion`
`source.cloud.service.name` | `sqs` | Fix value for SQS. | -
`source.cloud.region` | e.g. `us-east-1` | SQS queue region. | `record.awsRegion`
`source.cloud.account.id` | e.g. `12345678912` | Account ID of the SQS queue. | Parse account segment (5th) from `record.eventSourceArn`.
`message.queue` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | SQS queue ARN. | `record.eventSourceArn`
`message.age` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available. | Message attribute with key `SentTimestamp`.
`message.body` | - | The message body. Should only be captured if body capturing is enabled in the configuration. | `record.body`
`message.headers` | - | The message attributes. Should only be captured, if capturing headers is enabled in the configuration. | `record.messageAttributes`

#### SNS
Field | Value | Description | Source
--- | --- | --- | ---
`faas.trigger.type` | `pubsub` | Constant value for message based triggers | -
`transaction.name` | e.g. `RECEIVE SomeTopic` | Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple topic name can be derived from the 6th segment of `record.sns.topicArn`.
`faas.trigger.reuqest_id` | e.g. `someMessageId` | SNS message ID. | `record.sns.messageId`
`source.service.name` | e.g. `my-topic` | SNS topic name | Simple topic name can be derived from the 6th segment of `record.sns.topicArn`.
`source.service.id` | e.g. `arn:aws:sns:us-east-2:123456789012:my-topic` | SNS topic ARN. | `record.sns.topicArn`
`source.cloud.service.name` | `sns` | Fix value for SNS. | -
`source.cloud.region` | e.g. `us-east-1` | SNS topic region. | Parse region segment (4th) from `record.sns.topicArn`.
`source.cloud.account.id` | e.g. `12345678912` | Account ID of the SNS topic. | Parse account segment (5th) from `record.sns.topicArn`.
`message.queue` | e.g. `arn:aws:sns:us-east-2:123456789012:my-topic` | SNS topic ARN. | `record.sns.topicArn`
`message.age` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `snsTimestamp`. | `record.sns.timestamp`
`message.body` | - | The message body. Should only be captured if body capturing is enabled in the configuration. | `record.sns.message`
`message.headers` | - | The message attributes. Should only be captured, if capturing headers is enabled in the configuration. | `record.sns.messageAttributes`

### S3
Lambda functions that are triggered by S3 accept an `event` input that may contain one ore more `S3 event notification records` in the `event.records` array. We cannot (automatically) wrap the processing of the individual records that are sent as a batch of S3 event notification records with a single `event`.

Thus, in case that an S3 `event` contains **exactly one** `S3 event notification record`, the agents must apply the following, S3-specific retrieval of information. Otherwise, the agents should apply the [Generic Lambda Instrumentation](generic-lambda-instrumentation) as desribed above.

In addition the following fields should be set for Lambda functions triggered by S3:
Field | Value | Description
--- | --- | ---
`faas.trigger.type` | `datasource` | Constant value for message based triggers.
`faas.trigger.name` | e.g. `mybucket` | Use `record.s3.bucket.name`
`faas.trigger.id` | e.g. `arn:aws:s3:::mybucket/ObjectCreated:Put` | Use `record.s3.bucket.arn`
`faas.trigger.region` | e.g. `us-east-1` | Use `record.awsRegion`
`faas.trigger.version` | e.g. `2.1` | Use `record.eventVersion`
`faas.execution` | e.g. `arn:aws:s3:::mybucket/ObjectCreated:Put` | Format: `${record.s3.bucket.arn}/${record.eventName}`


## Init spans
With both API Gateway versions the `event` objects (may) contain a timestamp of the original API Gateway request. The difference between the API Gateway timestamp and the actual start timestamp of the Lambda function is a good estimate for the initialization phase of the lambda function.
The agents should use this information (if available) to adapt the transaction and spans structure of the Lambda invocation to represent the initialization time:

```
...
[XXXXXXXXXXXXX POST api.gateway XXXXXXXXXXXXXXXXXXXXX] // client calling the API Gateway
[XXXXXXXXXX MyLambdaFunction XXXXXXXXXXXXXXXXXXXX] // Lambda Transaction
[XXX Lambda init XXXXX] // Init span
[XXXX Lambda handler XXXX] // span for the handler method
[XXXXXXXXXX] // any Lambda subspans
^ ^
| |
API Gateway Timestamp of the
RequestTimestamp handler method start
```

To achieve the above transaction / span structure we need to backdate the begin of the created transaction using the API Gateway timestamp. We create a artificial span `Lambda init` using the API Gateway timestamp as start and the handler method timestamp as end. The `Lambda handler` span represents the actual execution of the Lambda handler method.

### Retrieving the API Gateway V1 timestamp
With API Gateway version 1 there is no dedicated request timestamp field.
However, AWS adds a X-Ray tracing ID to the headers which encodes a request timestamp in **seconds**.
Though the resolution of that timestamp is very coarse grained, in some cases it still can provide at least a rough estimate of the init phase (if the init phase is very long).

The agents should use this timestamp and apply the above transaction structure if:
- the corresponding header is available
- AND: `x_ray_api_gateway_request_timestamp` + 1 second < `handler_method_start_timestamp`

The X-Ray timestamp can be retrieved from the HTTP header `X-Amzn-Trace-Id`. The trace id has the following format:
```
1-58406520-a006649127e371903a2de979
```

The second segment (here: `58406520`) is a hexadecimal encoded timestamp in seconds.

### Retrieving the API Gateway V2 timestamp
With API Gateway version 2 the passed `event` object contains a field `timeEpoch` field (under `request context`) that denotes the request timestamp in milliseconds.

Field | Value | Description | Source
--- | --- | --- | ---
`faas.trigger.type` | `datasource` | Constant value. | -
`transaction.name` | e.g. `MyFunction` | Use function name. | `context.functionName`
`faas.trigger.reuqest_id` | e.g. `arn:aws:s3:::mybucket/ObjectCreated:Put/` | S3 event identifier. | `${record.s3.bucket.arn}/${record.eventName}/${record.eventTime}`
`source.service.name` | e.g. `mybucket` | S3 bucket name. | `record.s3.bucket.name`
`source.service.id` | e.g. `arn:aws:s3:::mybucket` | S3 bucket ARN. | `record.s3.bucket.arn`
`source.cloud.service.name` | `s3` | Fix value for S3. | -
`source.cloud.region` | e.g. `us-east-1` | S3 bucket region. | `record.awsRegion`
`source.service.version` | e.g. `2.1` | S3 schema version. | `record.s3.s3SchemaVersion`

## Data Flushing
Lambda functions are immediately frozen as soon as the handler method ends. In case APM data is sent in an asyncronous way (as most of the agents do by default) data can get lost if not sent before the lambda function ends.
Expand Down

0 comments on commit 38fd169

Please sign in to comment.