[RFC] Faas fields - stage 1 #1542

Merged 8 commits on Aug 19, 2021
Changes from 4 commits
68 changes: 56 additions & 12 deletions rfcs/text/0027-faas-fields.md
@@ -1,21 +1,26 @@
# 0027: Function as a Service Fields
<!-- Leave this ID at 0000. The ECS team will assign a unique, contiguous RFC number upon merging the initial stage of this RFC. -->

- Stage: **0 (strawperson)** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->
- Stage: **1 (draft)** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->
- Date: **2021-07-22** <!-- The ECS team sets this date at merge time. This is the date of the latest stage advancement. -->

<!--
As you work on your RFC, use the "Stage N" comments to guide you in what you should focus on, for the stage you're targeting.
Feel free to remove these comments as you go along.
-->

Using APM agents in serverless environments (e.g. AWS Lambda, Azure Functions, etc.) makes it possible to capture function as a service (FaaS) specific context that is of great value to end users and provides correlation points with other sources of data.
<!--
Stage 0: Provide a high level summary of the premise of these changes. Briefly describe the nature, purpose, and impact of the changes. ~2-5 sentences.
-->

Using APM agents in serverless environments (e.g. AWS Lambda, Azure Functions, etc.) makes it possible to capture function as a service (FaaS) specific context that is of great value to end users and provides correlation points with other sources of data.

Extending ECS with a dedicated field group, or embedding these fields into the existing `cloud` fields, would allow this data to be captured in a meaningful, semantically aligned way and correlated across different use cases (e.g. correlating AWS Lambda traces with the corresponding Lambda metrics and logs).

The existing specification in OpenTelemetry can serve as a useful reference: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/faas.md#example

I propose adding the following kinds of fields:
<!--
Initial proposal:

Field | Example | Description
-- | -- | --
@@ -33,9 +38,6 @@ faas.name | "my-lambda-function" | the name of the function
faas.id | "arn:aws:lambda:us-west-2:123456789012:function:my-lambda-function" | The ID of the function
faas.version | "semver:2.0.0" | The version of the function
faas.instance | "my-lambda-function:instance-0001" | The instance of the function

<!--
Stage 0: Provide a high level summary of the premise of these changes. Briefly describe the nature, purpose, and impact of the changes. ~2-5 sentences.
-->

<!--
@@ -52,6 +54,35 @@ Stage X: Provide a brief explanation of why the proposal is being marked as aban
Stage 1: Describe at a high level how this change affects fields. Include new or updated yml field definitions for all of the essential fields in this draft. While not exhaustive, the fields documented here should be comprehensive enough to deeply evaluate the technical considerations of this change. The goal here is to validate the technical details for all essential fields and to provide a basis for adding experimental field definitions to the schema. Use GitHub code blocks with yml syntax formatting, and add them to the corresponding RFC folder.
-->

After discussing the initial proposal with Andrew Wilkins, we arrived at an adapted proposal (relative to the stage 0 proposal) that reuses as many existing ECS fields as possible:

### New Fields
Field | Type | Example | Description | Use case
-- | -- | -- | -- | --
faas.coldstart | boolean | true | Boolean value indicating a cold start of a function | Can be used in the UI to denote function cold starts.
faas.execution | keyword | "af9d5aa4-a685-4c5f-a22b-444f80b3cc28" | The execution ID of the current function execution. | Allows correlation with CloudWatch logs and metrics.
faas.trigger.type | keyword | "http" | One of `http`, `pubsub`, `datasource`, `timer`, `other` | Allows differentiating between function types.
faas.trigger.request_id | keyword | e.g. `123456789` | The ID of the trigger request, message, event, etc. | Allows correlation of metrics and logs with the corresponding trigger request.
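
As a rough illustration of the table above, the following Python sketch assembles the four proposed `faas.*` fields into a nested document. The helper name `build_faas_fields` and the fallback to `other` for unknown trigger types are assumptions made for this example, not part of the proposal.

```python
# The five trigger types listed in the table above.
ALLOWED_TRIGGER_TYPES = {"http", "pubsub", "datasource", "timer", "other"}


def build_faas_fields(coldstart, execution, trigger_type, trigger_request_id):
    """Hypothetical helper: return a nested dict with the proposed faas.* fields."""
    if trigger_type not in ALLOWED_TRIGGER_TYPES:
        # Assumption for this sketch: unknown trigger kinds are reported as "other".
        trigger_type = "other"
    return {
        "faas": {
            "coldstart": bool(coldstart),
            "execution": execution,
            "trigger": {
                "type": trigger_type,
                "request_id": trigger_request_id,
            },
        }
    }


# Example values are taken from the table above.
doc = build_faas_fields(
    coldstart=True,
    execution="af9d5aa4-a685-4c5f-a22b-444f80b3cc28",
    trigger_type="http",
    trigger_request_id="123456789",
)
```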

### Reusing existing `service.*` fields
For the initially proposed fields `faas.name`, `faas.id`, `faas.version` and `faas.instance` we decided to reuse the existing fields `service.name`, `service.id`, `service.version` and `service.node.name`.
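
The renaming described above can be sketched in Python as a simple lookup. The mapping itself comes from this RFC; the helper names `set_dotted` and `to_service_fields` are hypothetical, and the sample values are the examples from the stage 0 table.

```python
# Mapping stated in this RFC: initially proposed faas.* field -> reused service.* field.
FAAS_TO_SERVICE = {
    "faas.name": "service.name",
    "faas.id": "service.id",
    "faas.version": "service.version",
    "faas.instance": "service.node.name",
}


def set_dotted(doc, dotted_key, value):
    """Set a value in a nested dict using a dotted field name."""
    parts = dotted_key.split(".")
    node = doc
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def to_service_fields(stage0_fields):
    """Hypothetical helper: rewrite stage 0 faas.* fields as nested service.* fields."""
    doc = {}
    for old_key, value in stage0_fields.items():
        set_dotted(doc, FAAS_TO_SERVICE[old_key], value)
    return doc


service_doc = to_service_fields({
    "faas.name": "my-lambda-function",
    "faas.id": "arn:aws:lambda:us-west-2:123456789012:function:my-lambda-function",
    "faas.version": "semver:2.0.0",
    "faas.instance": "my-lambda-function:instance-0001",
})
```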

### Nesting `cloud.*` and `service.*` fields under `_.origin.*` and `_.target.*`
We identified a large overlap between the initially proposed `faas.trigger.*` fields and the already existing `cloud.*` and `service.*` fields.
Allowing the **cloud and service fields to self-nest** under `cloud.origin.*` / `cloud.target.*` and `service.origin.*` / `service.target.*`, respectively, would cover most of the `faas.trigger.*` fields.
Member

Instead of nesting at _.origin.*, I propose that the existing top-level field set represent the origin. The target would be nested at _.target.* as proposed.

This approach continues the pattern adopted for the user field set: user.* and user.target.*. The user/service/entity performing the main action is captured under the top-level with the user/service/entity affected by the action residing at _.target.*.

Example:

{
  "service": {
    "name": "origin-service",
    "target": {
      "name": "target-service"
    }
  }
}

Using the existing top-level fields for the event "doer" lets users continue using existing queries. For example, they can query only service.name instead of needing to account for both service.name and service.origin.name.

Member

origin is in addition to service.name; it gives more context as to where the function was invoked from.

CALLER -> FUNCTION INVOCATION -> OUTGOING CALL

The proposal allows us to set information for all three stages:

service.origin.name -> service.name -> service.target.name

With service.* always being present and relating to the doer, i.e. the FUNCTION INVOCATION itself.

Member

Thanks, @Mpdreamz, for the follow-up. I also reviewed the trigger examples from elastic/apm#470, and those examples also helped me to understand these FaaS use cases better.

I'm still concerned that introducing the _.origin.* nestings could create confusion. For example, it may be unclear when a user should select service.name or cloud.service.name versus service.origin.name or cloud.origin.service.name. However, if we define the difference clearly in the ECS docs, we can hopefully avoid confusion between the two.

I have no objection to moving forward with the _.origin.* nestings as proposed. However, can we please summarize what we discussed here as a potential concern in the Concerns section?


Moreover, the proposal for nesting cloud fields would resolve other use cases as well (e.g. https://github.com/elastic/ecs/issues/1282).

Initially proposed | New proposed nested cloud or service field
-- | --
faas.trigger.name | `service.origin.name`
faas.trigger.id | `service.origin.id`
faas.trigger.version | `service.origin.version`
faas.trigger.account.name | `cloud.origin.account.name`
faas.trigger.account.id | `cloud.origin.account.id`
faas.trigger.region | `cloud.origin.region`
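
To make the table above concrete, here is a small Python sketch that renames flat `faas.trigger.*` keys to their proposed nested counterparts and then builds a nested document. The helper names and the sample values (such as `my-api-gateway`) are made up for illustration.

```python
# Mapping from the table above: initially proposed field -> proposed nested field.
TRIGGER_TO_ORIGIN = {
    "faas.trigger.name": "service.origin.name",
    "faas.trigger.id": "service.origin.id",
    "faas.trigger.version": "service.origin.version",
    "faas.trigger.account.name": "cloud.origin.account.name",
    "faas.trigger.account.id": "cloud.origin.account.id",
    "faas.trigger.region": "cloud.origin.region",
}


def remap_trigger_fields(flat_fields):
    """Rename flat faas.trigger.* keys; keys without a mapping are left untouched."""
    return {TRIGGER_TO_ORIGIN.get(key, key): value for key, value in flat_fields.items()}


def nest(flat_fields):
    """Turn dotted keys into a nested dict."""
    doc = {}
    for dotted_key, value in flat_fields.items():
        *parents, leaf = dotted_key.split(".")
        node = doc
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return doc


# Hypothetical input: the function itself plus information about its trigger.
flat = {
    "service.name": "my-lambda-function",
    "faas.trigger.name": "my-api-gateway",
    "faas.trigger.account.id": "123456789012",
    "faas.trigger.region": "us-east-1",
}
origin_doc = nest(remap_trigger_fields(flat))
```

In the resulting document, `service.name` still describes the function itself, while `service.origin.*` and `cloud.origin.*` describe the service that triggered it.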


<!--
Stage 2: Add or update all remaining field definitions. The list should now be exhaustive. The goal here is to validate the technical details of all remaining fields and to provide a basis for releasing these field definitions as beta in the schema. Use GitHub code blocks with yml syntax formatting, and add them to the corresponding RFC folder.
-->
@@ -62,11 +93,25 @@ Stage 2: Add or update all remaining field definitions. The list should now be e
Stage 1: Describe at a high-level how these field changes will be used in practice. Real world examples are encouraged. The goal here is to understand how people would leverage these fields to gain insights or solve problems. ~1-3 paragraphs.
-->

### `faas.coldstart`
Will be used in the APM UI to mark function invocations that resulted from a cold start. This is useful information for end users to differentiate cold start behaviour from warm start function invocations.

### `faas.execution` & `faas.trigger.request_id`
These IDs will be used to correlate APM data (traces / transactions), logs and metrics of the faas function (e.g. from CloudWatch) as well as logs and metrics from the corresponding trigger for individual invocations.
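
A minimal Python sketch of this kind of correlation, assuming every source already carries the same `faas.execution` value. The function name `correlate_by_execution` and the sample events are hypothetical.

```python
from collections import defaultdict


def correlate_by_execution(events):
    """Group events (traces, logs, metrics) that share a faas.execution ID."""
    groups = defaultdict(list)
    for event in events:
        execution_id = event.get("faas", {}).get("execution")
        if execution_id is not None:
            groups[execution_id].append(event)
    return dict(groups)


# Made-up sample data: one transaction and one log line from the same invocation,
# plus an unrelated log line without an execution ID.
events = [
    {"faas": {"execution": "af9d5aa4-a685-4c5f-a22b-444f80b3cc28"}, "transaction": {"id": "t-1"}},
    {"faas": {"execution": "af9d5aa4-a685-4c5f-a22b-444f80b3cc28"}, "message": "START RequestId"},
    {"message": "unrelated log line"},
]
by_execution = correlate_by_execution(events)
```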

### `faas.trigger.type`
Indicates the type of the function trigger. Allows grouping of function invocations by trigger type.

### `service.origin.*` & `cloud.origin.*`
Provides meta information on the origin service that triggered the FaaS function. End users can use this information to better understand the context, dependencies and causalities when analyzing and troubleshooting FaaS-related observability scenarios.
For example, this information could help answer analysis questions such as: "Do function invocations that are triggered from cloud region us-east-1 behave similarly to invocations from region eu-west-1?"
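
The region question above could be approached roughly as follows. This Python sketch assumes each invocation document carries `cloud.origin.region` and a duration in the existing ECS field `event.duration`; the function name and the sample numbers are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_origin_region(events):
    """Average event.duration per cloud.origin.region, skipping incomplete documents."""
    durations = defaultdict(list)
    for event in events:
        region = event.get("cloud", {}).get("origin", {}).get("region")
        duration = event.get("event", {}).get("duration")
        if region is not None and duration is not None:
            durations[region].append(duration)
    return {region: mean(values) for region, values in durations.items()}


# Made-up invocation documents.
invocations = [
    {"cloud": {"origin": {"region": "us-east-1"}}, "event": {"duration": 100}},
    {"cloud": {"origin": {"region": "us-east-1"}}, "event": {"duration": 300}},
    {"cloud": {"origin": {"region": "eu-west-1"}}, "event": {"duration": 150}},
]
averages = mean_duration_by_origin_region(invocations)
```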

## Source data

<!--
Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list.
-->
Faas functions provide meta-information in their execution environment. APM agents use instrumentation techniques to read this information. For instance, AWS Lambda provides an `event` and a `context` object with each function invocation: https://docs.aws.amazon.com/lambda/latest/dg/python-context.html
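
As a sketch of how an agent might read this information in a Python Lambda, the example below maps attributes of the `context` object to the proposed fields. The attributes `aws_request_id`, `function_name`, `function_version` and `invoked_function_arn` exist on the AWS Lambda Python context object; the choice of which attribute feeds which field, the class name `FaasContextReader`, and the cold start tracking are assumptions for illustration. A `SimpleNamespace` stands in for the real context so the sketch runs outside of Lambda.

```python
from types import SimpleNamespace


class FaasContextReader:
    """Hypothetical reader; one instance would live for the lifetime of a runtime."""

    def __init__(self):
        # True only until the first invocation handled by this runtime.
        self._cold = True

    def read(self, context):
        coldstart = self._cold
        self._cold = False
        return {
            "faas": {
                "coldstart": coldstart,
                # Assumption: the Lambda request ID serves as the execution ID.
                "execution": context.aws_request_id,
            },
            "service": {
                "name": context.function_name,
                "version": context.function_version,
                "id": context.invoked_function_arn,
            },
        }


# Stand-in for the context object that Lambda passes to the handler.
fake_context = SimpleNamespace(
    aws_request_id="af9d5aa4-a685-4c5f-a22b-444f80b3cc28",
    function_name="my-lambda-function",
    function_version="$LATEST",
    invoked_function_arn="arn:aws:lambda:us-west-2:123456789012:function:my-lambda-function",
)

reader = FaasContextReader()
first = reader.read(fake_context)
second = reader.read(fake_context)
```

Only the first call reports a cold start; later calls on the same reader report a warm start.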

<!--
Stage 2: Included a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting, or if on the larger side, add them to the corresponding RFC folder.
@@ -104,7 +149,9 @@ Stage 3: Document resolutions for all existing concerns. Any new concerns should

The following are the people that consulted on the contents of this RFC.

* @AlexanderWert | author
* @AlexanderWert | author, sponsor
* @axw | subject matter expert
* @Mpdreamz | subject matter expert

<!--
Who will be or has been consulted on the contents of this RFC? Identify authorship and sponsorship, and optionally identify the nature of involvement of others. Link to GitHub aliases where possible. This list will likely change or grow stage after stage.
@@ -122,14 +169,11 @@ e.g.:
## References

<!-- Insert any links appropriate to this RFC in this section. -->
* [OpenTelemetry Faas Specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/faas.md#example)

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 0: https://github.com/elastic/ecs/pull/1518

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
* Stage 1: https://github.com/elastic/ecs/pull/1542
11 changes: 11 additions & 0 deletions rfcs/text/0027/cloud.yml
@@ -0,0 +1,11 @@
---
- name: cloud
  reusable:
    top_level: true
    expected:
      - at: cloud
        as: target
        short_override: Cloud information about the invocation target.
      - at: cloud
        as: origin
        short_override: Cloud information about the invocation origin.
69 changes: 69 additions & 0 deletions rfcs/text/0027/faas.yml
@@ -0,0 +1,69 @@
---
- name: faas
  title: Function as a Service
  group: 2
  short: Fields for function as a service executions.
  description: >
    Fields related to serverless execution contexts and invocations of
    function as a service resources such as AWS Lambda, Azure Functions,
    Google Cloud Functions, etc.
  type: group
  fields:
    - name: execution
      level: extended
      example: af9d5aa4-a685-4c5f-a22b-444f80b3cc28
      type: keyword
      short: The execution ID.
      description: >
        Uniquely identifies an invocation of a serverless function.
    - name: trigger.type
      level: extended
      type: keyword
      short: The type of the trigger that a function invocation results from.
      description: >
        Serverless functions can be triggered through different types of upstream services,
        such as API gateways, message queues, change events on storage files, etc.
        This field specifies the type of the trigger.
      example: http
      allowed_values:
        - name: http
          description: >
            This value indicates a function invocation triggered through an HTTP request.
            For example, on AWS, `trigger.type` is set to the value `http` if an API Gateway
            triggers a Lambda function.
        - name: pubsub
          description: >
            This value indicates a function invocation triggered through a message being received.
            For example, on AWS, `trigger.type` is set to the value `pubsub` if a Lambda function
            is triggered by an SQS or an SNS message.
        - name: datasource
          description: >
            This value indicates a function invocation triggered by an event that results from a
            change on a datasource.
            For example, on AWS, `trigger.type` is set to the value `datasource` if a Lambda function
            is triggered by a change on an S3 bucket or file.
        - name: timer
          description: >
            This value indicates a scheduled function invocation.
            For example, on AWS, `trigger.type` is set to the value `timer` if a Lambda function
            is triggered by a scheduled CloudWatch event.
        - name: other
          description: >
            This value is used if a function invocation does not fit into any of the explicit
            `trigger.type` categories.
    - name: trigger.request_id
      level: extended
      example: zf7d5cb3-a685-4c5f-a22b-745f80b3dx49
      type: keyword
      description: >
        The unique request ID of the trigger event for a function invocation.
    - name: coldstart
      level: extended
      type: boolean
      example: true
      short: Indicates a cold start of a function.
      description: >
        Boolean value indicating a cold start of a function invocation.
        A function invocation leads to a cold start if the serverless
        runtime needs to be created and started before the actual request can be handled.
        Requests that hit active serverless runtimes do not suffer from a cold start.
11 changes: 11 additions & 0 deletions rfcs/text/0027/service.yml
@@ -0,0 +1,11 @@
---
- name: service
  reusable:
    top_level: true
    expected:
      - at: service
        as: target
        short_override: Target service of an invocation.
      - at: service
        as: origin
        short_override: Origin service of an invocation.
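
To illustrate what the `reusable.expected` entries above are meant to achieve, the following Python sketch derives the nested field names from such a definition. It is a simplified model written for this example, not the ECS tooling itself; the function name `nested_field_names` and the chosen leaf fields are assumptions.

```python
def nested_field_names(fieldset, leaf_fields):
    """List the dotted names produced by nesting leaf fields at each expected location."""
    names = []
    for entry in fieldset["reusable"]["expected"]:
        prefix = f'{entry["at"]}.{entry["as"]}'
        names.extend(f"{prefix}.{leaf}" for leaf in leaf_fields)
    return names


# The service reuse definition from above, written as a Python dict.
service_reuse = {
    "name": "service",
    "reusable": {
        "top_level": True,
        "expected": [
            {"at": "service", "as": "target"},
            {"at": "service", "as": "origin"},
        ],
    },
}

# A few existing service leaf fields, chosen for illustration.
names = nested_field_names(service_reuse, ["name", "id", "version"])
```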