diff --git a/specs/agents/tracing-instrumentation-aws-lambda.md b/specs/agents/tracing-instrumentation-aws-lambda.md index 9a8da1f6..8be7ae99 100644 --- a/specs/agents/tracing-instrumentation-aws-lambda.md +++ b/specs/agents/tracing-instrumentation-aws-lambda.md @@ -10,34 +10,35 @@ In our instrumentation we use these objects to derive useful meta and context in ## Generic Lambda Instrumentation In general, to instrument Lambda functions, we create transactions that wrap the execution of that handler method. In cases where we cannot assume any type or information in the `event` object (e.g. if the trigger type is undefined), we ignore trigger-specific information and simply wrap the handler method with a transaction, while using the `context` object to derive some necessary fields. -Field | Value | Description ---- | --- | --- -`transaction.name` | e.g. `MyFunctionName` | The name of the Lambda function. This can be retrieved either from the `context` object or from the environment variable `AWS_LAMBDA_FUNCTION_NAME` -`transaction.type` | `request` | -`faas.id` | e.g. `arn:aws:lambda:us-west-2:123456789012:function:my-function` | Use `context.invokedFunctionArn` to set the id. -`faas.coldstart` | `true` / `false` | Boolean value indicating whether a Lambda function invocation was a cold start or not. -`faas.trigger.type`| `other` | The trigger type. Use `other` if trigger type is unknown / cannot be specified. More concrete triggers are `http`, `pubsub`, `datasource`, `timer` (see specific triggers below). +Field | Value | Description | Source +--- | --- | --- | --- +`service.name` | e.g. `MyFunctionName` | The name of the Lambda function. | `AWS_LAMBDA_FUNCTION_NAME` or `context.functionName` +`service.framework.name` | `AWS Lambda` | Constant value for the framework name. | - +`service.runtime.name`| e.g. `AWS_Lambda_java8` | The lambda runtime. | `AWS_EXECUTION_ENV` +`service.id` | e.g. `arn:aws:lambda:us-west-2:123456789012:function:my-function` | The ARN of the function **without alias postfix**. | `context.invokedFunctionArn`, remove the 8th ARN segment if the ARN contains an alias postfix. +`service.version` | e.g. `${LATEST}` | The lambda function version | `AWS_LAMBDA_FUNCTION_VERSION` or `context.functionVersion` +`service.node.name` | e.g. `2019/06/07/[$LATEST]e6f...` | The log stream name uniquely identifying a function instance. | `AWS_LAMBDA_LOG_STREAM_NAME` or `context.logStreamName` +`cloud.provider` | `aws` | Constant value for the cloud provider. | +`source.cloud.provider` | `aws` | Constant value for the source cloud provider. | +`cloud.region` | e.g. `us-east-1` | The cloud region. | `AWS_REGION` +`cloud.service.name` | `lambda` | Constant value for the AWS service. +`transaction.type` | `request` | - | - +`faas.trigger.type` | `other` | The trigger type. Use `other` if trigger type is unknown / cannot be specified. | More concrete triggers are `http`, `pubsub`, `datasource`, `timer` (see specific triggers below). +`faas.execution` | `af9aa4-a6...` | The AWS request ID of the function invocation | `context.awsRequestId` +`faas.coldstart` | `true` / `false` | Boolean value indicating whether a Lambda function invocation was a cold start or not. | [see section below](deriving-cold-starts) +`transaction.name` | e.g. `MyFunctionName` | Use function name if trigger type is `other`. | `context.functionName` +`faas.trigger.reuqest_id` | - | Do not set this field if trigger type is `other`. | Trigger specific. +`source.service.*` | - | Do not set these fields if trigger type is `other`. | Trigger specific. +`source.cloud.*` | - | Do not set these fields if trigger type is `other`. | Trigger specific. ### Overwriting Metadata -Automatically capturing cloud metadata doesn't work reliably from a Lambda environment. Moreover, retrieving cloud metadata through an additional HTTP request may slowdown the lambda function / increase cold start behaviour. Therefore, the generic cloud metadata fetching should be disabled when the agent is running in a lambda context (for instance through checking for the existance of the `AWS_LAMBDA_FUNCTION_NAME` environment variable). For AWS Lambda we adapt meta-data fetching that uses available environment variables to derive / overwrite the following fields: - -Field | Value | Description ---- | --- | --- -`service.name` | e.g. `MyFunctionName` | The name of the Lambda function. Retrieved from the environment variable `AWS_LAMBDA_FUNCTION_NAME`. -`service.framework.name` | `AWS Lambda` | Contant value for the framework name. -`service.runtime.name`| e.g. `AWS_Lambda_java8` | The lambda runtime derived from the `AWS_EXECUTION_ENV` environment variable. -`cloud.provider` | `aws` | Contant value for the cloud provider. -`cloud.region` | e.g. `us-east-1` | The cloud region derived from the `AWS_REGION` environment variable. -`cloud.service.name` | `lambda` | The AWS service which is the value `lambda` for this instrumentation. -`faas.name` | e.g. `MyFunctionName` | The lambda function name derived from the `AWS_LAMBDA_FUNCTION_NAME` environment variable. -`faas.version`| e.g. `${LATEST}` | The lambda function version derived from the `AWS_LAMBDA_FUNCTION_VERSION` environment variable. - +Automatically capturing cloud metadata doesn't work reliably from a Lambda environment. Moreover, retrieving cloud metadata through an additional HTTP request may slowdown the lambda function / increase cold start behaviour. Therefore, the generic cloud metadata fetching should be disabled when the agent is running in a lambda context (for instance through checking for the existance of the `AWS_LAMBDA_FUNCTION_NAME` environment variable). +Where possible, metadata should be overwritten at Lambda runtime startup corresponding to the field specifications in this spec. +Some metadata fields are not available at startup (e.g. `invokedFunctionArn`). In this case corresponding fields need to be set either as metadata with the first invocation of the lambda function or set for every lambda invocation as transaction fields. ### Deriving cold starts A cold start occurs if AWS needs first to initialize the Lambda runtime (including the Lambda process, such as JVM, Node.js process, etc.) in oder to handle a request. This happens for the first request and after long function idle times. Lambda functions are always called sequentially (there is no concurrency). Thus, detecting a cold start is as simple as detecting whether the invocation of a __handler method__ is the **first since process startup** or not. This can be achieved with a global / process-scoped flag that is flipped at the first execution of the handler method. - - ## Trigger-specific Instrumentation Lambda functions can be triggered in many different ways. A generic transaction for a Lambda invocation can be create independently of the actual trigger. However, depending on the trigger type different information might be available that can be used to capture additional transaction data or allows to derive additional, valuable spans. The most common triggers that we want dedicated instrumentation support for are the following: @@ -59,17 +60,17 @@ The agent should use the information in the request and response objects to fill In particular, agents must use the `event.headers` to retrieve the `traceparent` and the `tracestate` and use them to start the transaction for the lambda function execution. In addition the following fields should be set for API Gateway-based Lambda functions: -Field | Value | Description ---- | --- | --- -`faas.trigger.type` | `http` | -`faas.trigger.id` | e.g. `gy415nuibc` | Use `event.requestContext.apiId` -`faas.trigger.name` | e.g. `POST /{proxy+}/Prod` | Format: `${event.requestContext.httpMethod} ${event.requestContext.resourcePath}/${event.requestContext.stage}` -`faas.trigger.account.id` | e.g. `12345678912` | Use `event.requestContext.accountId` -`faas.trigger.version` | `1.0` or `2.0` | `1.0` for API Gateway V1, `2.0` for API Gateway V2 -`faas.execution` | e.g. `123456789` | Use `event.requestContext.requestId` - -With both API Gateway versions the `event` objects (may) contain a timestamp of the original API Gateway request. -This information should be used to [adapt the transaction and spans structure](#init-spans) of the Lambda invocation to represent the initialization time. + +Field | Value | Description | Source +--- | --- | --- | --- +`faas.trigger.type` | `http` | Constant value for API gateway. | - +`transaction.name` | e.g. `GET MyFunction` | Http method followed by a whitespace and the function name. | - +`faas.trigger.reuqest_id` | e.g. `afa4-a6...` | ID of the API gateway request. | `event.requestContext.requestId` +`source.service.name` | e.g. `POST /{proxy+}/Prod` | Readable API gateway endpoint. |Format: `${event.requestContext.httpMethod} ${event.requestContext.resourcePath}/${event.requestContext.stage}` +`source.service.id` | e.g. `gy415nu...` | `event.requestContext.apiId` | +`source.service.version` | e.g. `1.0` | `1.0` for API Gateway V1, `2.0` for API Gateway V2. | - +`source.cloud.service.name` | `api gateway` | Fix value for API gateway. | - +`source.cloud.account.id` | e.g. `12345678912` | Account ID of the API gateway. | `event.requestContext.accountId` ### SQS / SNS Lambda functions that are triggered by SQS (or SNS) accept an `event` input that may contain one ore more SQS / SNS messages in the `event.records` array. All message-related context information (including the `traceparent`) is encoded in the individual message attributes (if at all). We cannot (automatically) wrap the processing of the individual messages that are sent as a batch of messages with a single `event`. @@ -79,20 +80,39 @@ Thus, in case that an SQS / SNS `event` contains **exactly one** SQS / SNS messa With only one message in `event.records`, the agents can use the single SQS / SNS `record` to retrieve the `traceparent` and `tracestate` from `record.messageAttributes` and use it for starting the lambda transaction. In addition the following fields should be set for Lambda functions triggered by SQS or SNS: -Field | Value | SQS | SNS ---- | --- | --- | --- -`faas.trigger.type` | `pubsub` | Constant value for message based triggers | " -`faas.trigger.id` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | Use `record.eventSourceArn` |Use `record.sns.topicArn` -`faas.trigger.name` | e.g. `my-queue` | Parse the queue / topic name (6th element) from the ARN used for `faas.trigger.id`. | " -`faas.trigger.account.id` | e.g. `12345678912` | Parse account segment (5th) from the ARN used for `faas.trigger.id` | " -`faas.trigger.region` | e.g. `us-east-1` | Use `record.awsRegion` | Parse region segment (4th) from `record.sns.topicArn` -`faas.trigger.version` | e.g. `2.1` | Use `record.eventVersion` | not available -`faas.execution` | e.g. `someMessageId` | Use `record.messageId` | Use `record.sns.messageId` -`context.message.queue` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | Use `record.eventSourceArn` | Use `record.sns.topicArn` -`context.message.age` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available.

For SQS, the timestamp can be retrieved from message attributes with key `SentTimestamp`. | Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available.

For SNS, the timestamp can be retrieved from `record.sns.timestamp`. -`context.message.body` | | The message body. Sould only be captured if body capturing is enabled in the configuration.

Use `record.body` | The message body. Sould only be captured if body capturing is enabled in the configuration.

Use `record.sns.message` -`context.message.headers` | | The message attributes. Should only be captured, if capturing headers is enabled in the configuration.

Use `record.messageAttributes`| The message attributes. Should only be captured, if capturing headers is enabled in the configuration.

Use `record.sns.messageAttributes` +#### SQS +Field | Value | Description | Source +--- | --- | --- | --- +`faas.trigger.type` | `pubsub` | Constant value for message based triggers | - +`transaction.name` | e.g. `RECEIVE SomeQueue` | Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple queue name can be derived from the 6th segment of `record.eventSourceArn`. +`faas.trigger.reuqest_id` | e.g. `someMessageId` | SQS message ID. | `record.messageId` +`source.service.name` | e.g. `my-queue` | SQS queue name | Simple queue name can be derived from the 6th segment of `record.eventSourceArn`. +`source.service.id` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | SQS queue ARN. | `record.eventSourceArn` +`source.service.version` | e.g. `2.1` | SQS event version | `record.eventVersion` +`source.cloud.service.name` | `sqs` | Fix value for SQS. | - +`source.cloud.region` | e.g. `us-east-1` | SQS queue region. | `record.awsRegion` +`source.cloud.account.id` | e.g. `12345678912` | Account ID of the SQS queue. | Parse account segment (5th) from `record.eventSourceArn`. +`message.queue` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | SQS queue ARN. | `record.eventSourceArn` +`message.age` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available. | Message attribute with key `SentTimestamp`. +`message.body` | - | The message body. Should only be captured if body capturing is enabled in the configuration. | `record.body` +`message.headers` | - | The message attributes. Should only be captured, if capturing headers is enabled in the configuration. | `record.messageAttributes` + +#### SNS +Field | Value | Description | Source +--- | --- | --- | --- +`faas.trigger.type` | `pubsub` | Constant value for message based triggers | - +`transaction.name` | e.g. `RECEIVE SomeTopic` | Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple topic name can be derived from the 6th segment of `record.sns.topicArn`. +`faas.trigger.reuqest_id` | e.g. `someMessageId` | SNS message ID. | `record.sns.messageId` +`source.service.name` | e.g. `my-topic` | SNS topic name | Simple topic name can be derived from the 6th segment of `record.sns.topicArn`. +`source.service.id` | e.g. `arn:aws:sns:us-east-2:123456789012:my-topic` | SNS topic ARN. | `record.sns.topicArn` +`source.cloud.service.name` | `sns` | Fix value for SNS. | - +`source.cloud.region` | e.g. `us-east-1` | SNS topic region. | Parse region segment (4th) from `record.sns.topicArn`. +`source.cloud.account.id` | e.g. `12345678912` | Account ID of the SNS topic. | Parse account segment (5th) from `record.sns.topicArn`. +`message.queue` | e.g. `arn:aws:sns:us-east-2:123456789012:my-topic` | SNS topic ARN. | `record.sns.topicArn` +`message.age` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `snsTimestamp`. | `record.sns.timestamp` +`message.body` | - | The message body. Should only be captured if body capturing is enabled in the configuration. | `record.sns.message` +`message.headers` | - | The message attributes. Should only be captured, if capturing headers is enabled in the configuration. | `record.sns.messageAttributes` ### S3 Lambda functions that are triggered by S3 accept an `event` input that may contain one ore more `S3 event notification records` in the `event.records` array. We cannot (automatically) wrap the processing of the individual records that are sent as a batch of S3 event notification records with a single `event`. @@ -100,53 +120,17 @@ Lambda functions that are triggered by S3 accept an `event` input that may conta Thus, in case that an S3 `event` contains **exactly one** `S3 event notification record`, the agents must apply the following, S3-specific retrieval of information. Otherwise, the agents should apply the [Generic Lambda Instrumentation](generic-lambda-instrumentation) as desribed above. In addition the following fields should be set for Lambda functions triggered by S3: -Field | Value | Description ---- | --- | --- -`faas.trigger.type` | `datasource` | Constant value for message based triggers. -`faas.trigger.name` | e.g. `mybucket` | Use `record.s3.bucket.name` -`faas.trigger.id` | e.g. `arn:aws:s3:::mybucket/ObjectCreated:Put` | Use `record.s3.bucket.arn` -`faas.trigger.region` | e.g. `us-east-1` | Use `record.awsRegion` -`faas.trigger.version` | e.g. `2.1` | Use `record.eventVersion` -`faas.execution` | e.g. `arn:aws:s3:::mybucket/ObjectCreated:Put` | Format: `${record.s3.bucket.arn}/${record.eventName}` - - -## Init spans -With both API Gateway versions the `event` objects (may) contain a timestamp of the original API Gateway request. The difference between the API Gateway timestamp and the actual start timestamp of the Lambda function is a good estimate for the initialization phase of the lambda function. -The agents should use this information (if available) to adapt the transaction and spans structure of the Lambda invocation to represent the initialization time: - -``` -... - [XXXXXXXXXXXXX POST api.gateway XXXXXXXXXXXXXXXXXXXXX] // client calling the API Gateway - [XXXXXXXXXX MyLambdaFunction XXXXXXXXXXXXXXXXXXXX] // Lambda Transaction - [XXX Lambda init XXXXX] // Init span - [XXXX Lambda handler XXXX] // span for the handler method - [XXXXXXXXXX] // any Lambda subspans - ^ ^ - | | -API Gateway Timestamp of the -RequestTimestamp handler method start -``` - -To achieve the above transaction / span structure we need to backdate the begin of the created transaction using the API Gateway timestamp. We create a artificial span `Lambda init` using the API Gateway timestamp as start and the handler method timestamp as end. The `Lambda handler` span represents the actual execution of the Lambda handler method. - -### Retrieving the API Gateway V1 timestamp -With API Gateway version 1 there is no dedicated request timestamp field. -However, AWS adds a X-Ray tracing ID to the headers which encodes a request timestamp in **seconds**. -Though the resolution of that timestamp is very coarse grained, in some cases it still can provide at least a rough estimate of the init phase (if the init phase is very long). - -The agents should use this timestamp and apply the above transaction structure if: -- the corresponding header is available -- AND: `x_ray_api_gateway_request_timestamp` + 1 second < `handler_method_start_timestamp` - -The X-Ray timestamp can be retrieved from the HTTP header `X-Amzn-Trace-Id`. The trace id has the following format: -``` -1-58406520-a006649127e371903a2de979 -``` - -The second segment (here: `58406520`) is a hexadecimal encoded timestamp in seconds. - -### Retrieving the API Gateway V2 timestamp -With API Gateway version 2 the passed `event` object contains a field `timeEpoch` field (under `request context`) that denotes the request timestamp in milliseconds. + +Field | Value | Description | Source +--- | --- | --- | --- +`faas.trigger.type` | `datasource` | Constant value. | - +`transaction.name` | e.g. `MyFunction` | Use function name. | `context.functionName` +`faas.trigger.reuqest_id` | e.g. `arn:aws:s3:::mybucket/ObjectCreated:Put/` | S3 event identifier. | `${record.s3.bucket.arn}/${record.eventName}/${record.eventTime}` +`source.service.name` | e.g. `mybucket` | S3 bucket name. | `record.s3.bucket.name` +`source.service.id` | e.g. `arn:aws:s3:::mybucket` | S3 bucket ARN. | `record.s3.bucket.arn` +`source.cloud.service.name` | `s3` | Fix value for S3. | - +`source.cloud.region` | e.g. `us-east-1` | S3 bucket region. | `record.awsRegion` +`source.service.version` | e.g. `2.1` | S3 schema version. | `record.s3.s3SchemaVersion` ## Data Flushing Lambda functions are immediately frozen as soon as the handler method ends. In case APM data is sent in an asyncronous way (as most of the agents do by default) data can get lost if not sent before the lambda function ends.