Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat PostgreSQL] Query Duration Overview dashboard has spurious query #20321

Closed
EthanStrider opened this issue Jul 29, 2020 · 6 comments · Fixed by #24607
Closed

[Filebeat PostgreSQL] Query Duration Overview dashboard has spurious query #20321

EthanStrider opened this issue Jul 29, 2020 · 6 comments · Fixed by #24607
Assignees
Labels
bug :Dashboards Filebeat Filebeat Team:Integrations Label for the Integrations team

Comments

@EthanStrider
Copy link

The [Filebeat PostgreSQL] Query Duration Overview seems to have a spurious query saved: postgresql.log.query:*
This makes it so that no results show up in the dashboard. Removing the query fixes the dashboard.

This looks like the offending line in the code:

Screen Shot 2020-07-23 at 7 51 33 PM

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 29, 2020
@EthanStrider
Copy link
Author

Created a PR here: #20322

@EthanStrider
Copy link
Author

@EthanStrider EthanStrider added the Team:Integrations Label for the Integrations team label Jul 29, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@jsoriano
Copy link
Member

Hey @EthanStrider, thanks for reporting, and sorry for the late reply.

I have found this problem while working on elastic/integrations#747, and I think that this is related to how postgresql logs statements.

It has two ways of doing it, that can be configured with the following options:

  • log_statement = 'all'
  • log_min_duration_statement = 0

When statements are logged because of log_statement, the statement itself and the duration or other related data are logged as different log lines that cannot be easily correlated, rendering these lines not very useful when many statements are being logged.

For example in the following case, the second duration corresponds to the first query, you need to check the pid to know what is the duration of each query, something that cannot be handled at the moment by Filebeat. Filebeat collects these lines as normal logs, without extracting any info, and thus postgresql.log.query is empty for these events.

2021-03-11 11:57:15.611 UTC [1953] postgres@postgres LOG:  statement: SELECT * FROM pg_stat_activity
2021-03-11 11:57:15.612 UTC [1951] postgres@postgres LOG:  statement: SELECT * FROM pg_stat_statements
2021-03-11 11:57:15.612 UTC [1951] postgres@postgres LOG:  duration: 0.349 ms
2021-03-11 11:57:15.612 UTC [1953] postgres@postgres LOG:  duration: 0.738 ms

On the other hand, when statements are logged because of log_min_duration_statement, the same log line includes the statement, the duration and sometimes other data related to the query. This is something Filebeat can handle. For example this log line includes all the info, and it works well with current pipelines and dashboard:

2021-03-11 12:24:50.833 UTC [730] postgres@postgres LOG:  duration: 0.662 ms  statement: SELECT d.datname as "Name",
	       pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
	       pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
	       d.datcollate as "Collate",
	       d.datctype as "Ctype",
	       pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges"
	FROM pg_catalog.pg_database d
	ORDER BY 1;

So I think there are two things that should be done to improve this situation:

  • First and more important, improve documentation, so users can know what to expect depending on the logging options they are using: [Filebeat] Improve documentation for Filebeat PostgreSQL module #14337
  • Consider parsing lines containing only statements, so at least postgresql.log.query is filled when log_statement is used, even if it is without its duration.

I don't think that removing the filter in this dashboard will solve this, because then the dashboard would be using events that don't contain information about queries.

@jsoriano
Copy link
Member

I don't think that removing the filter in this dashboard will solve this, because then the dashboard would be using events that don't contain information about queries.

Well, it seems that with log_duration = 'on' the duration of all queries is logged, but without the statement. In these cases the dashboard could still be useful and postgresql.log.query would be empty.

@jsoriano
Copy link
Member

Improving docs in #24588.
Improving pipeline and dashboards in #24607.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug :Dashboards Filebeat Filebeat Team:Integrations Label for the Integrations team
Projects
None yet
3 participants