AWS Lambda instrumentation

This spec specifies the instrumentation of applications / services running on AWS Lambda.

An AWS Lambda application needs to implement a handler method that is called whenever that Lambda function is invoked. A handler method receives (at least) two objects with useful information:

event: Depending on the trigger type of the Lambda function this object contains different, trigger-specific data. For some trigger types this object may be empty/null.
context: This object provides generic (trigger agnostic) meta information about the Lambda function.

In our instrumentation we use these objects to derive useful meta and context information.

Generic Lambda Instrumentation

In general, to instrument Lambda functions, we create transactions that wrap the execution of that handler method. In cases where we cannot assume any type or information in the event object (e.g. if the trigger type is undefined), we ignore trigger-specific information and simply wrap the handler method with a transaction, while using the context object to derive some necessary fields.

Field	Value	Description	Source
`service.name`	e.g. `MyFunctionName`	The name of the Lambda function.	`AWS_LAMBDA_FUNCTION_NAME` or `context.functionName`
`service.framework.name`	`AWS Lambda`	Constant value for the framework name.	-
`service.runtime.name`	e.g. `AWS_Lambda_java8`	The lambda runtime.	`AWS_EXECUTION_ENV`
`service.id`	e.g. `arn:aws:lambda:us-west-2:123456789012:function:my-function`	The ARN of the function without alias suffix.	`context.invokedFunctionArn`, remove the 8th ARN segment if the ARN contains an alias suffix. `arn:aws:lambda:us-west-2:123456789012:function:my-function:someAlias` will become `arn:aws:lambda:us-west-2:123456789012:function:my-function`.
`service.version`	e.g. `${LATEST}`	The lambda function version	`AWS_LAMBDA_FUNCTION_VERSION` or `context.functionVersion`
`service.node.name`	e.g. `2019/06/07/[$LATEST]e6f...`	The log stream name uniquely identifying a function instance.	`AWS_LAMBDA_LOG_STREAM_NAME` or `context.logStreamName`
`cloud.provider`	`aws`	Constant value for the cloud provider.
`cloud.origin.provider`	`aws`	Constant value for the origin cloud provider.
`cloud.region`	e.g. `us-east-1`	The cloud region.	`AWS_REGION`
`cloud.service.name`	`lambda`	Constant value for the AWS service.
`transaction.type`	`request`	-	-
`faas.trigger.type`	`other`	The trigger type. Use `other` if trigger type is unknown / cannot be specified.	More concrete triggers are `http`, `pubsub`, `datasource`, `timer` (see specific triggers below).
`faas.execution`	`af9aa4-a6...`	The AWS request ID of the function invocation	`context.awsRequestId`
`faas.coldstart`	`true` / `false`	Boolean value indicating whether a Lambda function invocation was a cold start or not.	see section below
`transaction.name`	e.g. `MyFunctionName`	Use function name if trigger type is `other`.	`context.functionName`
`faas.trigger.request_id`	-	Do not set this field if trigger type is `other`.	Trigger specific.
`service.origin.*`	-	Do not set these fields if trigger type is `other`.	Trigger specific.
`cloud.origin.*`	-	Do not set these fields if trigger type is `other`.	Trigger specific.

Overwriting Metadata

Automatically capturing cloud metadata doesn't work reliably from a Lambda environment. Moreover, retrieving cloud metadata through an additional HTTP request may slowdown the lambda function / increase cold start behaviour. Therefore, the generic cloud metadata fetching should be disabled when the agent is running in a lambda context (for instance through checking for the existance of the AWS_LAMBDA_FUNCTION_NAME environment variable). Where possible, metadata should be overwritten at Lambda runtime startup corresponding to the field specifications in this spec. Some metadata fields are not available at startup (e.g. invokedFunctionArn). In this case corresponding fields need to be set either as metadata with the first invocation of the lambda function or set for every lambda invocation as transaction fields.

Deriving cold starts

A cold start occurs if AWS needs first to initialize the Lambda runtime (including the Lambda process, such as JVM, Node.js process, etc.) in order to handle a request. This happens for the first request and after long function idle times. A Lambda function instance only executes one event at a time (there is no concurrency). Thus, detecting a cold start is as simple as detecting whether the invocation of a handler method is the first since process startup or not. This can be achieved with a global / process-scoped flag that is flipped at the first execution of the handler method.

Trigger-specific Instrumentation

Lambda functions can be triggered in many different ways. A generic transaction for a Lambda invocation can be created independently of the actual trigger. However, depending on the trigger type, different information might be available that can be used to capture additional transaction data or that allows additional, valuable spans to be derived. The most common triggers that we want dedicated instrumentation support for are the following:

API Gateway V1
API Gateway V2
SQS
SNS
S3

If none of the above apply, the fallback should be a generic instrumentation (as described above) that can deal with any type of trigger (thus capturing only the minimal available information).

API Gateway (V1 & V2)

There are two different API Gateway versions (V1 & V2) that differ slightly in the information (event object) that is passed to the Lambda handler function.

With both versions, the event object contains information about the http request. Usually API Gateway-based Lambda functions return an object that contains the HTTP response information. The agent should use the information in the request and response objects to fill the HTTP context (context.request and context.response) fields in the same way it is done for HTTP transactions.

In particular, agents must use the event.headers to retrieve the traceparent and the tracestate and use them to start the transaction for the lambda function execution.

In addition the following fields should be set for API Gateway-based Lambda functions:

Field	Value	Description	Source
`faas.trigger.type`	`http`	Constant value for API gateway.	-
`transaction.name`	e.g. `GET MyFunction`	Http method followed by a whitespace and the function name.	-
`faas.trigger.request_id`	e.g. `afa4-a6...`	ID of the API gateway request.	`event.requestContext.requestId`
`service.origin.name`	e.g. `POST /{proxy+}/Prod`	Readable API gateway endpoint.	Format: `${event.requestContext.httpMethod} ${event.requestContext.resourcePath}/${event.requestContext.stage}`
`service.origin.id`	e.g. `gy415nu...`	`event.requestContext.apiId`
`service.origin.version`	e.g. `1.0`	`1.0` for API Gateway V1, `2.0` for API Gateway V2.	-
`cloud.origin.service.name`	`api gateway`	Fix value for API gateway.	-
`cloud.origin.account.id`	e.g. `12345678912`	Account ID of the API gateway.	`event.requestContext.accountId`

SQS / SNS

Lambda functions that are triggered by SQS (or SNS) accept an event input that may contain one or more SQS / SNS messages in the event.records array. All message-related context information (including the traceparent) is encoded in the individual message attributes (if at all). We cannot (automatically) wrap the processing of the individual messages that are sent as a batch of messages with a single event.

Thus, in case that an SQS / SNS event contains exactly one SQS / SNS message, the agents must apply the following, messaging-specific retrieval of information. Otherwise, the agents should apply the Generic Lambda Instrumentation as described above.

With only one message in event.records, the agents can use the single SQS / SNS record to retrieve the traceparent and tracestate from record.messageAttributes and use it for starting the lambda transaction.

In addition the following fields should be set for Lambda functions triggered by SQS or SNS:

SQS

Field	Value	Description	Source
`faas.trigger.type`	`pubsub`	Constant value for message based triggers	-
`transaction.name`	e.g. `RECEIVE SomeQueue`	Follow the messaging spec for transaction naming.	Simple queue name can be derived from the 6th segment of `record.eventSourceArn`.
`faas.trigger.reuqest_id`	e.g. `someMessageId`	SQS message ID.	`record.messageId`
`service.origin.name`	e.g. `my-queue`	SQS queue name	Simple queue name can be derived from the 6th segment of `record.eventSourceArn`.
`service.origin.id`	e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue`	SQS queue ARN.	`record.eventSourceArn`
`cloud.origin.service.name`	`sqs`	Fix value for SQS.	-
`cloud.origin.region`	e.g. `us-east-1`	SQS queue region.	`record.awsRegion`
`cloud.origin.account.id`	e.g. `12345678912`	Account ID of the SQS queue.	Parse account segment (5th) from `record.eventSourceArn`.
`message.queue`	e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue`	SQS queue ARN.	`record.eventSourceArn`
`message.age`	e.g. `3298`	Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available.	Message attribute with key `SentTimestamp`.
`message.body`	-	The message body. Should only be captured if body capturing is enabled in the configuration.	`record.body`
`message.headers`	-	The message attributes. Should only be captured, if capturing headers is enabled in the configuration.	`record.messageAttributes`

SNS

Field	Value	Description	Source
`faas.trigger.type`	`pubsub`	Constant value for message based triggers	-
`transaction.name`	e.g. `RECEIVE SomeTopic`	Follow the messaging spec for transaction naming.	Simple topic name can be derived from the 6th segment of `record.sns.topicArn`.
`faas.trigger.reuqest_id`	e.g. `someMessageId`	SNS message ID.	`record.sns.messageId`
`service.origin.name`	e.g. `my-topic`	SNS topic name	Simple topic name can be derived from the 6th segment of `record.sns.topicArn`.
`service.origin.id`	e.g. `arn:aws:sns:us-east-2:123456789012:my-topic`	SNS topic ARN.	`record.sns.topicArn`
`service.origin.version`	e.g. `2.1`	SNS event version	`record.eventVersion`
`cloud.origin.service.name`	`sns`	Fix value for SNS.	-
`cloud.origin.region`	e.g. `us-east-1`	SNS topic region.	Parse region segment (4th) from `record.sns.topicArn`.
`cloud.origin.account.id`	e.g. `12345678912`	Account ID of the SNS topic.	Parse account segment (5th) from `record.sns.topicArn`.
`message.queue`	e.g. `arn:aws:sns:us-east-2:123456789012:my-topic`	SNS topic ARN.	`record.sns.topicArn`
`message.age`	e.g. `3298`	Age of the message in milliseconds. `current_time` - `snsTimestamp`.	`record.sns.timestamp`
`message.body`	-	The message body. Should only be captured if body capturing is enabled in the configuration.	`record.sns.message`
`message.headers`	-	The message attributes. Should only be captured, if capturing headers is enabled in the configuration.	`record.sns.messageAttributes`

S3

Lambda functions that are triggered by S3 accept an event input that may contain one ore more S3 event notification records in the event.records array. We cannot (automatically) wrap the processing of the individual records that are sent as a batch of S3 event notification records with a single event.

Thus, in case that an S3 event contains exactly one S3 event notification record, the agents must apply the following, S3-specific retrieval of information. Otherwise, the agents should apply the Generic Lambda Instrumentation as desribed above.

In addition the following fields should be set for Lambda functions triggered by S3:

Field	Value	Description	Source
`faas.trigger.type`	`datasource`	Constant value.	-
`transaction.name`	e.g. `ObjectCreated:Put mybucket`	Use event name and bucket name.	`${record.eventName} ${record.s3.bucket.name}`
`faas.trigger.reuqest_id`	e.g. `C3D13FE58DE4C810`	S3 event request ID.	`record.responseElements.xAmzRequestId`
`service.origin.name`	e.g. `mybucket`	S3 bucket name.	`record.s3.bucket.name`
`service.origin.id`	e.g. `arn:aws:s3:::mybucket`	S3 bucket ARN.	`record.s3.bucket.arn`
`cloud.origin.service.name`	`s3`	Fix value for S3.	-
`cloud.origin.region`	e.g. `us-east-1`	S3 bucket region.	`record.awsRegion`
`service.origin.version`	e.g. `2.1`	S3 event version.	`record.eventVersion`

Data Flushing

Lambda functions are immediately frozen as soon as the handler method ends. In case APM data is sent in an asyncronous way (as most of the agents do by default) data can get lost if not sent before the lambda function ends.

Therefore, the Lambda instrumentation has to ensure that data is flushed in a blocking way before the execution of the handler function ends. Where possible, agents may optimize the flushing behaviour by avoiding a dedicated HTTP request for each Lambda invocation but instead flushing the buffer on the HTTP connection:

Waits until all pending events have been processed
Performs a synced_flush on the gzip buffer and flushes all buffers to the network
Keeps the HTTP request alive
Returns immediately if the connection to APM Server is unhealthy (when there's a backoff)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tracing-instrumentation-aws-lambda.md

tracing-instrumentation-aws-lambda.md

AWS Lambda instrumentation

Generic Lambda Instrumentation

Overwriting Metadata

Deriving cold starts

Trigger-specific Instrumentation

API Gateway (V1 & V2)

SQS / SNS

SQS

SNS

S3

Data Flushing

Files

tracing-instrumentation-aws-lambda.md

Latest commit

History

tracing-instrumentation-aws-lambda.md

File metadata and controls

AWS Lambda instrumentation

Generic Lambda Instrumentation

Overwriting Metadata

Deriving cold starts

Trigger-specific Instrumentation

API Gateway (V1 & V2)

SQS / SNS

SQS

SNS

S3

Data Flushing