Update PostgreSQL integration to support logs in CSV format (#747)
Import the latest changes from Beats, including support for logs in CSV format.
* Import pipelines and test files from Beats.
* Update README.
* Add docker deployment and system tests.
* Add missing ECS fields to make tests pass.
jsoriano authored Mar 22, 2021
1 parent e7b3146 commit ce930df
Showing 111 changed files with 9,883 additions and 131 deletions.
31 changes: 28 additions & 3 deletions packages/postgresql/_dev/build/docs/README.md
@@ -4,15 +4,40 @@ This integration periodically fetches logs and metrics from [PostgreSQL](https:/

## Compatibility

The `log` dataset was tested with logs from versions 9.5 on Ubuntu, 9.6 on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3.
The `log` dataset was tested with logs from versions 9.5 on Ubuntu, 9.6 on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3. The CSV format was tested with versions 11 and 13 (the distribution is not relevant here).

The `activity`, `bgwriter`, `database` and `statement` datasets were tested with PostgreSQL 9.5.3 and are expected to work with all versions >= 9.

## Logs

### log

The `log` dataset collects the PostgreSQL logs.
The `log` dataset collects the PostgreSQL logs in plain text format or CSV.

#### Using CSV logs

Because the PostgreSQL CSV log format is well defined,
almost no configuration is needed in Fleet: only the path to the log files.

PostgreSQL itself, however, must be configured to write its logs in CSV format (`.csv` files).

The recommended parameters in `postgresql.conf` are:
```
logging_collector = 'on'
log_destination = 'csvlog'
log_statement = 'none'
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_min_duration_statement = 0
```

On busy servers, `log_min_duration_statement = 0` logs every statement with its duration and can
cause contention. In that case, set it to a value greater than 0 (in milliseconds) so that only
slower statements are logged.

Both `log_connections` and `log_disconnections` can generate many events if clients do not use
persistent connections, so enable them with care.
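
To give an idea of what a CSV log record contains, the following is a minimal Python sketch that maps one record onto the column names documented by PostgreSQL for `csvlog` output. It is only an illustration, not the ingest pipeline used by this integration: the `parse_csvlog` helper and the sample record (user, database, address, session ID) are made up for this example. The 23 base columns correspond to PostgreSQL 11; version 13 appends `backend_type`.

```python
import csv
import io

# csvlog column order as documented for PostgreSQL 11 (23 columns).
CSVLOG_COLUMNS = [
    "log_time", "user_name", "database_name", "process_id",
    "connection_from", "session_id", "session_line_num", "command_tag",
    "session_start_time", "virtual_transaction_id", "transaction_id",
    "error_severity", "sql_state_code", "message", "detail", "hint",
    "internal_query", "internal_query_pos", "context", "query",
    "query_pos", "location", "application_name",
]
# PostgreSQL 13 appends this column at the end of each record.
EXTRA_COLUMNS = ["backend_type"]


def parse_csvlog(text):
    """Return one dict per CSV log record found in `text`."""
    names = CSVLOG_COLUMNS + EXTRA_COLUMNS
    # zip() stops at the shorter sequence, so 23-column records simply
    # have no "backend_type" key; unknown trailing columns are dropped.
    return [dict(zip(names, record)) for record in csv.reader(io.StringIO(text))]


# Hypothetical record, written by hand for this example.
SAMPLE = (
    '2021-03-22 10:15:42.123 UTC,"postgres","mydb",4242,"127.0.0.1:54321",'
    '6058a1b2.1092,3,"SELECT",2021-03-22 10:15:40 UTC,3/12,0,LOG,00000,'
    '"duration: 1.234 ms  statement: SELECT 1;"'
    ',,,,,,,,,"psql"\n'
)

for row in parse_csvlog(SAMPLE):
    print(row["log_time"], row["error_severity"], row["message"])
```

Using a real CSV parser rather than splitting on commas matters here, because messages and statements are quoted and may contain commas or line breaks.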

{{fields "log"}}

@@ -48,4 +73,4 @@ The `statement` dataset periodically fetches metrics from PostgreSQL servers.

{{event "statement"}}

{{fields "statement"}}
{{fields "statement"}}
4 changes: 4 additions & 0 deletions packages/postgresql/_dev/deploy/docker/Dockerfile
@@ -0,0 +1,4 @@
ARG SERVICE_VERSION=${SERVICE_VERSION:-9.5.3}
FROM postgres:${SERVICE_VERSION}
COPY docker-entrypoint-initdb.d /docker-entrypoint-initdb.d
HEALTHCHECK --interval=10s --retries=6 CMD psql -h localhost -U postgres -l
11 changes: 11 additions & 0 deletions packages/postgresql/_dev/deploy/docker/docker-compose.yml
@@ -0,0 +1,11 @@
version: '2.3'
services:
  postgresql:
    # Commented out `image:` below until we have a process to refresh the hosted images from
    # Dockerfiles in this repo. Until then, we build the image locally using `build:` below.
    # image: docker.elastic.co/integrations-ci/beats-postgresql:${POSTGRESQL_VERSION:-9.5.3}-1
    build: .
    ports:
      - 5432
    volumes:
      - ${SERVICE_LOGS_DIR}/postgresql:/var/log/postgresql
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
chmod a+wx /var/log/postgresql

cat <<-EOF >> $PGDATA/postgresql.conf
# Enable some log facilities.
log_duration = 'on'
log_connections = 'on'
log_disconnections = 'on'
# Ensure that statements are logged, with their durations.
log_statement = 'none'
log_min_duration_statement = 0
# Give agent read permissions. In NO case for production usage.
log_file_mode = '0666'
# Try to imitate logging behaviour in Debian/Ubuntu, but there the logging collector
# is not used.
logging_collector = 'on'
log_directory = '/var/log/postgresql'
log_line_prefix = '%m [%p] %q%u@%d '
EOF
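
The `log_line_prefix` configured above, `%m [%p] %q%u@%d `, produces lines that start with a millisecond timestamp and the process ID, followed by `user@database` for session processes only (`%q` ends the prefix for other processes). As a rough illustration of how such lines could be split into fields, here is a minimal Python sketch. The regular expression, the `parse_line` helper and the `postgres@mydb` session in the second sample are assumptions made for this example; they are not the ingest pipeline imported by this commit.

```python
import re

# Matches lines written with log_line_prefix = '%m [%p] %q%u@%d '.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+) "  # %m
    r"\[(?P<pid>\d+)\] "                                              # [%p]
    r"(?:(?P<user>\S+)@(?P<database>\S+) )?"                          # %q%u@%d
    r"(?P<level>[A-Z]+\d?):\s+"                                       # LOG, ERROR, DEBUG1...
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the prefix fields as a dict, or None for continuation lines."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# A background-process line: no user@database part.
background = (
    "2020-04-15 12:02:55.247 CEST [23920] LOG: "
    "database system is ready to accept connections"
)
# A session line with a hypothetical user and database.
session = (
    "2020-04-15 12:06:36.719 CEST [25143] postgres@mydb ERROR:  "
    'syntax error at or near "l" at character 1'
)

print(parse_line(background))
print(parse_line(session))
```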
@@ -0,0 +1,6 @@
#!/usr/bin/env bash
cat <<-EOF >> $PGDATA/postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
EOF
@@ -0,0 +1 @@
create extension pg_stat_statements;
4 changes: 4 additions & 0 deletions packages/postgresql/_dev/deploy/variants.yml
@@ -0,0 +1,4 @@
variants:
  v9_5_3:
    SERVICE_VERSION: 9.5.3
default: v9_5_3
5 changes: 5 additions & 0 deletions packages/postgresql/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
  changes:
    - description: Add support for logs in CSV format
      type: enhancement # can be one of: enhancement, bugfix, breaking-change
      link: https://github.com/elastic/integrations/pull/747
- version: "0.2.7"
  changes:
    - description: Updating package owner
@@ -0,0 +1,7 @@
vars:
  hosts:
    - postgres://{{Hostname}}:{{Port}}?sslmode=disable
  username: postgres
  password: postgres
data_stream:
  vars: ~
64 changes: 64 additions & 0 deletions packages/postgresql/data_stream/activity/fields/ecs.yml
@@ -1,3 +1,67 @@
- name: ecs
  title: ECS
  group: 2
  description: Meta-information specific to ECS.
  type: group
  fields:
    - name: version
      level: core
      required: true
      type: keyword
      ignore_above: 1024
      description: 'ECS version this event conforms to. `ecs.version` is a required field and must exist in all events.
        When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.'
      example: 1.0.0
- name: error
  title: Error
  group: 2
  description: |-
    These fields can represent errors of any kind.
    Use them for errors that happen while fetching events or in cases where the
    event itself contains an error.
  type: group
  fields:
    - name: message
      level: core
      type: text
      description: Error message.
- name: event
  title: Event
  group: 2
  description: 'The event fields are used for context information about the log or metric event itself.
    A log is defined as an event containing details of something that happened. Log events must include the time at which the thing happened. Examples of log events include a process starting on a host, a network packet being sent from a source to a destination, or a network connection between a client and a server being initiated or closed. A metric is defined as an event containing one or more numerical measurements and the time at which the measurement was taken. Examples of metric events include memory pressure measured on a host and device temperature. See the `event.kind` definition in this section for additional details about metric and state events.'
  type: group
  fields:
    - name: dataset
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the dataset.
        If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from.
        It''s recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.'
      example: apache.access
    - name: duration
      level: core
      type: long
      format: duration
      input_format: nanoseconds
      output_format: asMilliseconds
      output_precision: 1
      description: 'Duration of the event in nanoseconds.
        If event.start and event.end are known this value should be the difference between the end and start time.'
    - name: module
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the module this data is coming from.
        If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.'
      example: apache
- name: service.address
  type: keyword
  description: Service address
@@ -0,0 +1,7 @@
vars:
  hosts:
    - postgres://{{Hostname}}:{{Port}}?sslmode=disable
  username: postgres
  password: postgres
data_stream:
  vars: ~
64 changes: 64 additions & 0 deletions packages/postgresql/data_stream/bgwriter/fields/ecs.yml
@@ -1,3 +1,67 @@
- name: ecs
  title: ECS
  group: 2
  description: Meta-information specific to ECS.
  type: group
  fields:
    - name: version
      level: core
      required: true
      type: keyword
      ignore_above: 1024
      description: 'ECS version this event conforms to. `ecs.version` is a required field and must exist in all events.
        When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.'
      example: 1.0.0
- name: error
  title: Error
  group: 2
  description: |-
    These fields can represent errors of any kind.
    Use them for errors that happen while fetching events or in cases where the
    event itself contains an error.
  type: group
  fields:
    - name: message
      level: core
      type: text
      description: Error message.
- name: event
  title: Event
  group: 2
  description: 'The event fields are used for context information about the log or metric event itself.
    A log is defined as an event containing details of something that happened. Log events must include the time at which the thing happened. Examples of log events include a process starting on a host, a network packet being sent from a source to a destination, or a network connection between a client and a server being initiated or closed. A metric is defined as an event containing one or more numerical measurements and the time at which the measurement was taken. Examples of metric events include memory pressure measured on a host and device temperature. See the `event.kind` definition in this section for additional details about metric and state events.'
  type: group
  fields:
    - name: dataset
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the dataset.
        If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from.
        It''s recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.'
      example: apache.access
    - name: duration
      level: core
      type: long
      format: duration
      input_format: nanoseconds
      output_format: asMilliseconds
      output_precision: 1
      description: 'Duration of the event in nanoseconds.
        If event.start and event.end are known this value should be the difference between the end and start time.'
    - name: module
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the module this data is coming from.
        If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.'
      example: apache
- name: service.address
  type: keyword
  description: Service address
@@ -0,0 +1,7 @@
vars:
  hosts:
    - postgres://{{Hostname}}:{{Port}}?sslmode=disable
  username: postgres
  password: postgres
data_stream:
  vars: ~
28 changes: 28 additions & 0 deletions packages/postgresql/data_stream/database/fields/ecs.yml
@@ -1,3 +1,31 @@
- name: ecs
  title: ECS
  group: 2
  description: Meta-information specific to ECS.
  type: group
  fields:
    - name: version
      level: core
      required: true
      type: keyword
      ignore_above: 1024
      description: 'ECS version this event conforms to. `ecs.version` is a required field and must exist in all events.
        When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.'
      example: 1.0.0
- name: error
  title: Error
  group: 2
  description: |-
    These fields can represent errors of any kind.
    Use them for errors that happen while fetching events or in cases where the
    event itself contains an error.
  type: group
  fields:
    - name: message
      level: core
      type: text
      description: Error message.
- name: service.address
  type: keyword
  description: Service address
@@ -0,0 +1,20 @@
2020-04-15 12:02:55.244 CEST [23922] LOG: database system was shut down at 2020-04-15 12:02:52 CEST
2020-04-15 12:02:55.247 CEST [23920] LOG: database system is ready to accept connections
2020-04-15 12:04:45.416 CEST [24981] FATAL: password authentication failed for user "root"
2020-04-15 12:04:45.416 CEST [24981] DETAIL: Role "root" does not exist.
Connection matched pg_hba.conf line 80: "local all all md5"
2020-04-15 12:04:45.416 CEST [24981] LOG: could not send data to client: Broken pipe
2020-04-15 12:06:36.719 CEST [25143] ERROR: syntax error at or near "l" at character 1
2020-04-15 12:56:29.569 CEST [25143] STATEMENT: SELECT al.id, al.tenant_id, al.created_by_id, al.create_ip, al.audit_date, al.audit_table, al.entity_id, al.entity_name, al.reason_for_change, al.audit_log_event_type_id,
aet.lookup_code, al.old_value, al.new_value, al.event_crf_id, al.event_crf_version_id, al.study_id, al.study_site_id, ss.rc_oid, al.subject_id, s.unique_identifier,
al.study_event_id, sed.name AS studyEventName, al.user_id, al.value_index, al.crf_version_id, al.global_logs, cv.version_name, crf.id AS crfId, crf.name AS crfName
FROM public.rc_audit_log_events AS al
LEFT JOIN rc_crf_versions AS cv ON cv.id=al.crf_version_id
LEFT JOIN rc_crfs AS crf ON crf.id=cv.crf_id
LEFT JOIN ad_lookup_codes AS aet ON aet.id=al.audit_log_event_type_id
LEFT JOIN rc_study_sites AS ss ON ss.id=al.study_site_id
LEFT JOIN rc_subjects AS s ON s.id=al.subject_id
LEFT JOIN rc_study_events AS se ON se.id=al.study_event_id
LEFT JOIN rc_study_event_definitions AS sed ON sed.id=se.study_event_definition_id
WHERE al.tenant_id=$1 AND al.study_id=$2 AND aet.lookup_code IN ($3, $4, $5, $6) AND al.audit_date >= $7 ORDER BY al.id DESC limit $8
;
@@ -0,0 +1,4 @@
dynamic_fields:
  event.ingested: ".*"
multiline:
  first_line_pattern: '^\d{4}-\d{2}-\d{2} '
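
The `first_line_pattern` above marks the start of a new log event: any line that does not begin with a date is treated as a continuation of the previous event. The following Python sketch illustrates that grouping rule on three lines taken from the test log in this commit. The `group_events` helper is hypothetical and only approximates the idea; it is not how the test runner or the agent implements multiline handling.

```python
import re

# Same pattern as in the test configuration.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def group_events(lines):
    """Join continuation lines onto the event started by the last dated line."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if FIRST_LINE.match(line) or not events:
            events.append(line)          # a new event starts here
        else:
            events[-1] += "\n" + line    # continuation of the previous event
    return events


sample = [
    '2020-04-15 12:04:45.416 CEST [24981] DETAIL: Role "root" does not exist.',
    'Connection matched pg_hba.conf line 80: "local all all md5"',
    "2020-04-15 12:04:45.416 CEST [24981] LOG: could not send data to client: Broken pipe",
]

for event in group_events(sample):
    print(repr(event))
```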