Date Processor doc update #6381

52 changes: 41 additions & 11 deletions _data-prepper/pipelines/configuration/processors/date.md
# date


The `date` processor adds a default timestamp to an event, parses timestamp fields, and converts timestamp information to the International Organization for Standardization (ISO) 8601 format. This timestamp information can be used as an event timestamp.

## Configuration

The following table describes the options you can use to configure the `date` processor.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`match` | Conditionally | [Match](#match) | The date match configuration. This option cannot be defined at the same time as `from_time_received`. There is no default value.
`from_time_received` | Conditionally | Boolean | When `true`, the timestamp from the event metadata, which is the time at which the source receives the event, is added to the event data. This option cannot be defined at the same time as `match`. Default is `false`.
`date_when` | No | String | Specifies the condition under which the `date` processor performs matching. Default is no condition.
`to_origination_metadata` | No | Boolean | When `true`, the matched time is also added to the event's metadata as an instance of `Instant`. Default is `false`.
`destination` | No | String | The field used to store the timestamp parsed by the `date` processor. Can be used with both `match` and `from_time_received`. Default is `@timestamp`.
`output_format` | No | String | Determines the format of the timestamp added to an event. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`.
`source_timezone` | No | String | The time zone used to parse dates when the zone or offset cannot be extracted from the value. If the zone or offset is part of the value, then the time zone is ignored. For a list of all available time zones, see the **TZ database name** column in [the list of database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List).
`destination_timezone` | No | String | The time zone used for storing the timestamp in the `destination` field. The available time zone values are the same as those for `source_timezone`.
`locale` | No | String | The locale used for parsing dates. It's commonly used for parsing month names (`MMM`). The value can contain language, country, and variant fields in IETF BCP 47 format, such as `en-US`, or in the string representation of the [Locale](https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html) object, such as `en_US`. A full list of locale fields, including language, country, and variant, can be found in [the language subtag registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). Default is `Locale.ROOT`.
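
Because `match` and `from_time_received` cannot be defined in the same `date` processor, a pipeline that needs both can include multiple `date` processors. The following sketch shows a `processor` list fragment that does this. It assumes a hypothetical `received_at` destination field for the receive time and a `timestamp` field in the `dd/MMM/yyyy:HH:mm:ss` format; it also sets `to_origination_metadata` so that the matched time is copied into the event metadata.

```yaml
# Sketch of a pipeline's processor list with two date processors,
# one per mutually exclusive option.
processor:
  - date:
      # Stores the time at which the source received the event
      # in a hypothetical "received_at" field.
      from_time_received: true
      destination: "received_at"
  - date:
      # Parses the event's own "timestamp" field into @timestamp.
      match:
        - key: timestamp
          patterns: ["dd/MMM/yyyy:HH:mm:ss"]
      destination: "@timestamp"
      # Also adds the matched time to the event metadata as an Instant.
      to_origination_metadata: true
```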

### Match

The following table describes the options you can use to configure `match`.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`key` | Yes | String | The event key against which to match the patterns. Required if `match` is configured.
`patterns` | Yes | List | A list of possible patterns that the timestamp value of the key can have. The patterns are based on a sequence of letters and symbols. The `patterns` option supports all the patterns listed in the Java [DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) reference. The timestamp value can also be matched using the `epoch_second`, `epoch_milli`, and `epoch_nano` patterns, which represent the timestamp as the number of seconds, milliseconds, or nanoseconds since the epoch. Epoch values always use the UTC time zone.
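
Because `patterns` is a list, a single `match` entry can accept values in more than one format. In the following sketch, a hypothetical `timestamp` field may contain either epoch milliseconds or a `yyyy-MM-dd HH:mm:ss` string. The `source_timezone` setting only matters for the string pattern, which carries no zone or offset; epoch values always use UTC.

```yaml
- date:
    match:
      # Hypothetical field whose values may arrive in either format.
      - key: timestamp
        patterns: ["epoch_milli", "yyyy-MM-dd HH:mm:ss"]
    destination: "@timestamp"
    # Applied to the string pattern only; epoch values are always UTC.
    source_timezone: "America/Los_Angeles"
```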


## Metrics


The `date` processor includes the following custom metrics.

* `dateProcessingMatchSuccessCounter`: Returns the number of records that match at least one pattern specified by the `match` configuration option.
* `dateProcessingMatchFailureCounter`: Returns the number of records that did not match any of the patterns specified by the `patterns` option in the `match` configuration.

## Example: Add a default timestamp to an event
The following `date` processor configuration can be used to add a default timestamp in the `@timestamp` field to all events:

```yaml
- date:
    from_time_received: true
    destination: "@timestamp"
```
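
The `date` entry in this example is one item in a pipeline's `processor` list. For context, the following sketch places it in a complete pipeline definition. The pipeline name `date-pipeline`, the `http` source, and the `stdout` sink are illustrative choices, not requirements of the `date` processor.

```yaml
# Hypothetical pipeline name.
date-pipeline:
  source:
    # Illustrative source: receives events over HTTP with default settings.
    http:
  processor:
    # Adds the time at which the source received each event as @timestamp.
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    # Illustrative sink: prints processed events to standard output.
    - stdout:
```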

## Example: Parse a timestamp to convert its format and time zone
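The following `date` processor configuration can be used to parse the value of the `timestamp` field in the `dd/MMM/yyyy:HH:mm:ss` format, convert it from the `America/Los_Angeles` time zone to the `America/Chicago` time zone, and write it to `@timestamp` in the `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` format: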

```yaml
- date:
    match:
      - key: timestamp
        patterns: ["dd/MMM/yyyy:HH:mm:ss"]
    destination: "@timestamp"
    output_format: "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
    source_timezone: "America/Los_Angeles"
    destination_timezone: "America/Chicago"
    locale: "en_US"
```
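
To illustrate the conversion, consider a hypothetical event whose `timestamp` field is `10/Feb/2023:12:00:00`. With this configuration, the value is read as 12:00 in `America/Los_Angeles` (UTC-08:00 in February), which is 20:00 UTC, or 14:00 in `America/Chicago` (UTC-06:00). The following sketch shows the event before and after processing as two YAML documents, assuming the source field is left unchanged:

```yaml
# Hypothetical event before the date processor runs.
timestamp: "10/Feb/2023:12:00:00"
---
# The same event after processing: @timestamp holds the converted value
# in the destination time zone and output format.
timestamp: "10/Feb/2023:12:00:00"
"@timestamp": "2023-02-10T14:00:00.000-06:00"
```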