Add ILM policy PUT and GET for remote_monitoring_agent built-in role #57963

ycombinator · 2020-06-11T00:11:58Z

What does this PR do?

This PR adds the cluster:admin/ilm/get and cluster:admin/ilm/put privileges to the remote_monitoring_agent built-in role.

Why is this change necessary?

The remote_monitoring_agent built-in role is intended for users who wish to monitor their Elastic Stack with Metricbeat. One of the checks that Metricbeat performs on startup is for the existence of an ILM policy. As such, this role should allow this check to happen.

Without this fix, users who try to use Metricbeat for Stack Monitoring today see the following error repeatedly in their Metricbeat log. Because of this error, Metricbeat does not proceed any further and, thus, no Stack Monitoring data is indexed into the Elasticsearch cluster.

2020-06-10T17:10:37.707-0700    ERROR   [publisher_pipeline_output]     pipeline/output.go:154  Failed to connect to backoff(elasticsearch(http://localhost:9200)): Connection marked as failed because the onConnect callback failed: failed to check for policy name 'metricbeat': (status=403) {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"}],"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"},"status":403}: 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"}],"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"},"status":403}

elasticmachine · 2020-06-11T00:11:59Z

Pinging @elastic/es-security (:Security/Authorization)

albertzaharovits · 2020-06-11T11:12:49Z

...core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java

@@ -73,6 +73,7 @@
                                "cluster:monitor/xpack/watcher/watch/get",
                                "cluster:admin/xpack/watcher/watch/put",
                                "cluster:admin/xpack/watcher/watch/delete",
+                                GetLifecycleAction.NAME,


It's preferable, if possible, to use privilege names, such as read_ilm in this case, instead of explicit action names. This gives us maximum flexibility, as we don't have to worry about BWC when changing action names.
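
To illustrate the backward-compatibility point in this review comment, here is a minimal Java sketch of a role that refers to a privilege by name. It is a toy model, not the Elasticsearch implementation: the class NamedPrivilegeSketch, the RESOLVER map, and the permits method are hypothetical, and read_ilm is modeled as covering only the one action name that appears in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; none of these names exist in Elasticsearch.
final class NamedPrivilegeSketch {
    // Named privilege -> action names it currently resolves to.
    // Assumption: read_ilm is reduced to the single action seen in this thread.
    static final Map<String, Set<String>> RESOLVER =
        Map.of("read_ilm", Set.of("cluster:admin/ilm/get"));

    // A role may list privilege names, raw action names, or both.
    static boolean permits(List<String> rolePrivileges, String action) {
        for (String privilege : rolePrivileges) {
            Set<String> resolved = RESOLVER.get(privilege);
            // Known privilege name: check the resolved actions.
            // Otherwise treat the entry as a literal action name.
            if (resolved != null ? resolved.contains(action) : privilege.equals(action)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, renaming an action would only require updating the RESOLVER entry. A role that lists read_ilm would keep working unchanged, whereas a role that lists the raw action name would have to be edited.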

albertzaharovits · 2020-06-11T11:17:31Z

Is it a specific ILM policy that Metricbeat requires access to (I think so, but want to make sure)?
In this case, in the future (see #50130), we should consider limiting access only to the specific policy, not to all of them.

ycombinator · 2020-06-11T12:08:45Z

> Is it a specific ILM policy that Metricbeat requires access to (I think so, but want to make sure)?
> In this case, in the future (see #50130), we should consider limiting access only to the specific policy, not to all of them.

Yes, it is a specific policy, governed by this setting: https://www.elastic.co/guide/en/beats/metricbeat/current/ilm.html#setup-ilm-policy_name-option. By default the name of the policy is metricbeat.

albertzaharovits

@ycombinator
LGTM, but I have two questions please:

1. Can you clarify what/who PUTs the policy in the first place (and maintains/updates it)? Is it the kibana_system user, or something that ships with ES from #57629 (Add default composable templates for new indexing strategy)?
2. Is this breaking Metricbeat in 7.7?

lcawl · 2020-06-11T23:11:03Z

@ycombinator I think more than just the read_ilm cluster privilege is required: when I tested this on 7.6.2, I subsequently got this message:

2020-06-11T15:49:30.389-0700 ERROR pipeline/output.go:100 Failed to connect to backoff(elasticsearch(http://localhost:9202)): Connection marked as failed because the onConnect callback failed: 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:admin/ilm/put] is unauthorized for user [my-tester]"}],"type":"security_exception","reason":"action [cluster:admin/ilm/put] is unauthorized for user [my-tester]"},"status":403}
2020-06-11T15:49:30.389-0700 INFO pipeline/output.go:93 Attempting to reconnect to backoff(elasticsearch(http://localhost:9202)) with 4 reconnect attempt(s)

I had to either give my user the manage_ilm cluster privilege or else put setup.ilm.enabled: false in my metricbeat.yml file to make those errors go away.
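
The two error logs in this thread suggest that Metricbeat's ILM setup first checks for the policy and then tries to install it, so granting only the read action moves the failure from the GET to the PUT. The Java sketch below models that sequence. It is a simplification under stated assumptions: the class IlmSetupSketch and the method firstDenied are hypothetical, the order of the two actions is inferred from the logs rather than from Metricbeat's source, and the ilmEnabled flag stands in for setup.ilm.enabled.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the ILM setup steps; not Metricbeat code.
final class IlmSetupSketch {
    // Action names taken from the two error logs, in the order they were observed.
    static final List<String> SETUP_ACTIONS =
        List.of("cluster:admin/ilm/get", "cluster:admin/ilm/put");

    // Returns the first action the user is not authorized for, if any.
    // With ILM setup disabled, no ILM action is attempted at all.
    static Optional<String> firstDenied(Set<String> grantedActions, boolean ilmEnabled) {
        if (!ilmEnabled) {
            return Optional.empty();
        }
        return SETUP_ACTIONS.stream()
            .filter(action -> !grantedActions.contains(action))
            .findFirst();
    }
}
```

Under this model, a user with no ILM privileges fails on the GET, a user with only the GET action fails on the PUT, and a user with setup.ilm.enabled set to false never hits either check.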

ycombinator · 2020-06-11T23:52:21Z

Thanks @albertzaharovits and @lcawl for your comments.

> is this breaking Metricbeat in 7.7?

Yes, and in prior versions as well.

> can you clarify what/who PUTs the policy in the first place (and maintains/updates)? Is it the kibana_system user, or something that ships with ES from #57629.

> I had to either give my user the manage_ilm cluster privilege or else put setup.ilm.enabled: false in my metricbeat.yml file to make those errors go away.

Users who use Metricbeat for Stack Monitoring can be divided into two classes:

- Those who are using Metricbeat just for Stack Monitoring. These users don't need ILM at all, because the Stack Monitoring modules in Metricbeat write to .monitoring-* indices, which don't use ILM today. Such users can use the remote_monitoring_user built-in user (which uses the remote_monitoring_agent role) and set setup.ilm.enabled: false in their metricbeat.yml, and everything should work. Metricbeat will simply skip both the ILM policy check and the ILM setup.
- Those who are using Metricbeat for Stack Monitoring but also for monitoring other services. These users do need ILM, because non-Stack-Monitoring modules in Metricbeat write to metricbeat-* indices, and Metricbeat manages the index template and ILM policy for these indices. If such users are using the remote_monitoring_user built-in user, which is intended for Stack Monitoring, they will see errors about ILM setup because the role tied to that user, remote_monitoring_agent, does not have ILM-related privileges. That's what this PR is trying to address.

So I think the options become:

1. Leave the built-in remote_monitoring_user user and remote_monitoring_agent role as-is. Don't give them additional privileges. Recommend in the Stack Monitoring documentation that users use the built-in user only if they are using Metricbeat just for Stack Monitoring. Otherwise, they must create a new user that has the built-in remote_monitoring_agent role plus another role that grants the manage_ilm cluster privilege.
2. Add the manage_ilm cluster privilege to the remote_monitoring_agent built-in role, thereby granting it to the remote_monitoring_user built-in user as well.

Option 1 makes the "getting started" user experience a bit messier but option 2 makes the built-in remote_monitoring_agent role and remote_monitoring_user user less secure.

Any opinions on one over the other, @albertzaharovits or @lcawl?

albertzaharovits · 2020-06-12T13:56:38Z

Thanks for the explanation @ycombinator !

I think we should go with option 2

> Add the manage_ilm cluster privilege to the remote_monitoring_agent built-in role, thereby granting it to the remote_monitoring_user built-in user as well.

My thinking is that the remote_monitoring_agent built-in role already has quite extensive privileges with manage_index_templates and manage_ingest_pipelines. The issue is that this role (and by extension the built-in remote_monitoring_user) can wreak havoc on all ingestion in the cluster; it is not confined to monitoring ingestion. manage_ilm adds the risk of data deletion, because ILM has a delete index action. We don't have fine-grained privileges for ILM, although we sorely need them, but the privileges we currently have are a rough approximation that should only detract from the effective security and not from the user experience.

But I don't think we should ship such a change in a patch release. From the ES perspective this is an "enhancement", not a "bug". It's a bug from a use-case perspective. Not even all ES bug fixes are ported to patch releases if they change behaviour.

I don't feel strongly about it, but I feel it's not my decision to make. If you feel otherwise can you please rope in someone that's more involved in the release process, maybe a tech lead.

albertzaharovits · 2020-06-15T09:16:19Z

@ycombinator We discussed this inside the team this morning, and they've changed my mind.
We will backport this in a patch release (most likely 7.8.1, but I'll backport to the 7.7 branch as well, just to be safe).

I feel discomfort whenever we have to grant such a liberal privilege for built-in roles, especially since it's required only during the initial setup, but this is simply our only option atm. I was reluctant to force this upon users upgrading the patch version, but @jkakavas convinced me that this sounds like a bug for users and we must find reasons NOT to backport bugs to patches, not the other way around.

albertzaharovits · 2020-06-15T10:04:44Z

@ycombinator I have taken the liberty of pushing to your PR branch in order to make up for the time lost in these discussions. I have modified the role to explicitly grant privileges for the PUT and GET ILM policy actions (instead of manage_ilm); I went against my own recommendation to use privilege names because narrowing the privileges in this way makes me feel just a tiny bit better.
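
The difference between granting a broad privilege and granting two explicit actions can be sketched as follows. This is an illustration only: the class IlmGrantSketch and its methods are hypothetical, the wildcard pattern used for manage_ilm is an assumption about how that privilege is defined, and the delete action name is assumed for the sake of the example. Only the get and put action names come from this thread.

```java
import java.util.Set;

// Hypothetical comparison of a wildcard grant with explicit action grants.
final class IlmGrantSketch {
    // Assumption: manage_ilm behaves like a wildcard over all ILM admin actions.
    static final Set<String> MANAGE_ILM = Set.of("cluster:admin/ilm/*");
    // The two actions this PR grants explicitly.
    static final Set<String> EXPLICIT =
        Set.of("cluster:admin/ilm/get", "cluster:admin/ilm/put");

    // Supports exact names and a trailing "*" wildcard only.
    static boolean matches(String pattern, String action) {
        if (pattern.endsWith("*")) {
            return action.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(action);
    }

    static boolean granted(Set<String> patterns, String action) {
        return patterns.stream().anyMatch(pattern -> matches(pattern, action));
    }
}
```

In this sketch the explicit grant covers get and put but not an assumed cluster:admin/ilm/delete action, while the wildcard covers all three. Note that the narrowing does not remove the data-deletion concern raised earlier, since a user who can PUT a policy can still define one that deletes indices, which is consistent with the "just a tiny bit better" assessment.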

albertzaharovits · 2020-06-15T10:11:41Z

@elasticmachine update branch

…57963) Without this fix, users who try to use Metricbeat for Stack Monitoring today see the following error repeatedly in their Metricbeat log. Due to this error Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring data is indexed into the Elasticsearch cluster. Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>

…57963) Without this fix, users who try to use Metricbeat for Stack Monitoring today see the following error repeatedly in their Metricbeat log. Due to this error Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring data is indexed into the Elasticsearch cluster. Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>

ycombinator · 2020-06-15T14:40:31Z

Sorry @albertzaharovits, got caught up in other things on Friday and then came the weekend. Thanks for making updates to this PR and merging it <3.

Add cluster:admin/ilm/get privilege for remote_monitoring_agent built-in role

cfe9a85

ycombinator added >bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.8.1 v7.9.0 v7.7.2 labels Jun 11, 2020

elasticmachine added the Team:Security Meta label for security team label Jun 11, 2020

albertzaharovits reviewed Jun 11, 2020

Use privilege name, not action name

552d190

albertzaharovits self-requested a review June 11, 2020 13:05

albertzaharovits approved these changes Jun 11, 2020

albertzaharovits self-assigned this Jun 11, 2020

ILM Put and Get for remote_monitoring_agent

7007701

albertzaharovits changed the title ~~Add cluster:admin/ilm/get privilege for remote_monitoring_agent built-in role~~ Add ILM policy PUT and GET for remote_monitoring_agent built-in role Jun 15, 2020

Merge branch 'master' into remote-monitoring-agent-privs-add-ilm-read

e1e1757

albertzaharovits merged commit 4e4b831 into elastic:master Jun 15, 2020

albertzaharovits mentioned this pull request Jun 15, 2020

BACKPORT Add ILM policy PUT and GET for remote_monitoring_agent built-in role #58101

Merged

albertzaharovits mentioned this pull request Jun 15, 2020

BACKPORT Add ILM policy PUT and GET for remote_monitoring_agent built-in role #58102

Merged

ycombinator deleted the remote-monitoring-agent-privs-add-ilm-read branch June 15, 2020 14:40

lcawl mentioned this pull request Jun 15, 2020

[DOCS] Add manage_ilm privilege to monitoring steps #58141

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
