Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ES API Keys for client operations #1067

Closed
gingerwizard opened this issue Oct 1, 2020 · 19 comments
Closed

Support ES API Keys for client operations #1067

gingerwizard opened this issue Oct 1, 2020 · 19 comments
Assignees
Labels
enhancement Improves the status quo

Comments

@gingerwizard
Copy link
Contributor

The Elastic agent uses API keys to communicate with ES. This is the recommended way for clients to communicate with ES. We should do this by default. For benchmark-only this is trivial. For cases where we orchestrate ES we will need to create this API key using the creds.

@danielmitterdorfer @dliappis

@gingerwizard gingerwizard added the enhancement Improves the status quo label Oct 1, 2020
@gingerwizard
Copy link
Contributor Author

This will be simplified greatly once we have universal API keys i.e. cloud API keys can be mapped to roles in ES clusters. This avoid us having to create anything.

@gingerwizard
Copy link
Contributor Author

I see two potential modes here:

  1. Every client uses the same API key passed as a parameter like basic auth
  2. Every client authenticates and uses its own api key - this is the more realistic scenario if replicating our agent.

@hub-cap hub-cap self-assigned this Oct 19, 2020
@hub-cap
Copy link
Contributor

hub-cap commented Oct 21, 2020

Do we have any idea for the permission set to be used for each key? If we do not define a fine grained permission set, the key will default to the same auth as the rally basic auth user. If we do not care about exercising the fine grained access control, it is definitely easiest to let every key default to the same permissions the http auth user has when creating it.

(Optional, array-of-role-descriptor) An array of role descriptors for this API key. This parameter is 
optional. When it is not specified or is an empty array, then the API key will have a point in time 
snapshot of permissions of the authenticated user. If you supply role descriptors then the resultant 
permissions would be an intersection of API keys permissions and authenticated user’s permissions 
thereby limiting the access scope for API keys. The structure of role descriptor is the same as the 
request for create role API. For more details, see create or update roles API. 

@gingerwizard
Copy link
Contributor Author

Given Rally is in most cases running in a privileged mode and should never be used against a production cluster, i'm personally ok with the token defaulting to the same permissions as the rally basic auth user. I can't see any cases where we would want to test restricted permissions unless we believe it has the potential to impact performance. Even if the latter does become true, this feels like an enhancement that can be addressed in later iterations.

hub-cap added a commit to hub-cap/rally that referenced this issue Oct 22, 2020
This commit adds a optional client arg, use_api_key, which will generate
and use an api key for every client and async client created by
Rally. It uses an initial call via the existing auth scheme to ES to
generate the key, and then creates a new client using the api_key for
general use by Rally.

The async client also does this, and temporarily creates a sync client
to generate the key, and then uses it when creating an async client.

Closes elastic#1067
@danielmitterdorfer
Copy link
Member

Please refer to these definitions for terms that I use below:

  • Client: A logical client simulated by Rally. The number of simulated clients is specified by the clients property on a task.
  • Worker (process): The Rally actor that simulates load. Rally creates one worker per CPU core. Each worker may simulate multiple clients (using asyncio).
  • Elasticsearch Python client: The Python library that Rally uses send requests to Elasticsearch.

I agree with #1067 (comment) that we need two modes but I think we should first implement the mode where each client uses its own API key as this more relevant for realistic benchmarks.

@hub-cap and me had a chat about this already and the implementation in #1098 covers a scenario where each worker process (potentially simulating multiple clients) uses its own API key, i.e. we neither use shared key for all clients nor does each client use its own key.

In order to implement this, we need to consider the following aspects:

  • Clients and Elasticsearch Python clients are created per task (see the lifecycle of AsyncIoAdapter which gets recreated for each task). As creating many (thousand) API keys is time-consuming, they should only be created once for the benchmark and reused across tasks. This means we need a (generic) facility to create API keys at the beginning of a benchmark if needed, i.e. when the user specifies the respective client option.
  • API keys need to be stored in a data structure that allows each client to lookup its API key. We should not tie this data structure specifically to API keys but rather introduce a more generic data structure like a "client context" which allows to store arbitrary information per client.
  • Clients should be able to pass the API key transparently to the Elasticsearch Python client. At the moment the Elasticsearch Python client assumes that the API key is provided in the contructor of the elasticsearch.Elasticsearch class. It is passed to the constructor of elasticsearch.connection.base.Connection where the API key parameter is converted into the HTTP header authorization. While it would be possible that we set the API key ourselves per request in Rally's AIOHttpConnection with something like the following sketch, we should clarify whether it is possible that the Elasticsearch Python client provides a per-request override for the API key:
    async def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=(), headers=None):
        final_headers = dict(headers)
        final_headers["authorization"] = self._get_api_key_header_val(api_key=retrieve_api_key_for_current_client())
        return await super().perform_request(method, url, params, body, timeout, ignore, final_headers)

@sethmlarson
Copy link
Contributor

Supporting per-request api_key and http_auth shouldn't be too hard as it's being done for the Enterprise Search client. I've created an issue here: elastic/elasticsearch-py#1421

@YohanSciubukgian
Copy link

Any update on this issue ? It could be great to use ApiKey authentication to use Rally against our ES cluster.

@DJRickyB
Copy link
Contributor

Hi @YohanSciubukgian ,

I asked around internally; sorry, there's hasn't been any work done here yet. Once this gets traction the ticket will be updated

Thanks

@sethmlarson
Copy link
Contributor

The upstream issue blocking this feature has been implemented in elastic/elasticsearch-py#1577 and will be available in 7.13.0, you can test against the 7.x branch when developing this feature.

@inqueue
Copy link
Member

inqueue commented Dec 16, 2021

API key authentication should also be supported for the ES-based metrics store.

@michaelbaamonde michaelbaamonde self-assigned this May 10, 2022
@michaelbaamonde
Copy link
Contributor

michaelbaamonde commented May 16, 2022

I'm working on this at the moment and wanted to separate some concerns and clarify scope. As initially proposed and later reiterated, the primary motivation is supporting a unique API key per simulated client. Secondarily, we should support a global, user-provided API key across all clients. I agree that we should also support API key authentication for the metrics store as suggested, but this is a bit orthogonal and requires changes to completely different code paths, so we can tackle that separately. I'll file a separate issue for this.

Since the primary use case entails Rally creating an API key per simulated client, it will need to be provided with credentials sufficient for calling the relevant Elasticsearch APIs for creating API keys and I'd argue also invalidating/deleting them as well upon benchmark completion.

This results in 3 different scenarios that we'll need to support:

  1. Rally creates a unique API key per simulated client. Each request submitted by a simulated client passes its unique generated API key as an argument to the relevant ES Python client method on a per-request basis. All other requests issued to the ES benchmark candidate--chiefly the creation and eventual deletion of these API keys--authenticate via basic auth.

  2. Same as above, but all other operations against the ES benchmark candidate authenticate via a user-provided API key instead of basic auth.

  3. All operations against the benchmark candidate ES cluster use a global, user-provided API key. To be clear, this means that every client uses the same API key, and Rally does not generate any API keys itself.

To support these scenarios, we need to introduce two new client options:

  • create_api_key_per_client
    • Boolean that instructs Rally to generate a unique API key per simulated client. These generated API keys are used for authentication on a per-request basis, overriding the global authentication mechanism.
  • global_api_key
    • Provides an API key to be used to authenticate and issue requests to the candidate ES cluster.
    • Used for all requests, unless create_api_key_per_client is also provided and set to true.
      • In that case, the global_api_key will be used only for administrative tasks, principally generating client API keys before the benchmark begins and deleting them after it is complete.
    • Mutually exclusive with the basic_auth_user and basic_auth_password client options. If a user provides both, it's an error condition.

Here's how each of the above scenarios would be configured using the two new client options:

Unique API key per client? Global auth Required client options Forbidden client options
Yes Basic create_api_key_per_client, basic_auth_user, basic_auth_password global_api_key
Yes API key create_api_key_per_client, global_api_key basic_auth_user, basic_auth_password
No API key global_api_key basic_auth_user, basic_auth_password

We can hash out the details around naming on the PR, but I'll mention here that global_api_key is just a placeholder and I'm happy to come up with something better if it'll clarify that it can actually be overridden by create_api_key_per_client.

Happy to hear any feedback on this approach!

Otherwise, the implementation will largely follow the broad strokes outlined in this comment. We'll aim to support scenario 1 first (generated API keys for client operations, basic auth otherwise) given that it's highest priority.

@michaelbaamonde michaelbaamonde changed the title Support ES API Keys Support ES API Keys for client operations May 16, 2022
@inqueue
Copy link
Member

inqueue commented May 16, 2022

+1 for this approach.

  1. Rally creates a unique API key per simulated client. Each request submitted by a simulated client passes its unique generated API key as an argument to the relevant ES Python client method on a per-request basis. All other requests issued to the ES benchmark candidate--chiefly the creation and eventual deletion of these API keys--authenticate via basic auth.

API keys created with basic authentication inherit permissions of the authenticated user, though they can be created with optional role descriptors (as mentioned in item 2 from the docs). To be clear, the intention as stated in #1067 (comment) is to create unique API keys per simulated client from a single authenticated user without additional or unique role descriptors?

+1 to deleting generated API keys, but preserving the global_api_key. Does it make sense to preserve generated API keys as a user preference for scenarios where running multiple benchmarks is desired, but the user wishes to reuse API keys already generated?

@michaelbaamonde
Copy link
Contributor

To be clear, the intention as stated in #1067 (comment) is to create unique API keys per simulated client from a single authenticated user without additional or unique role descriptors?

Yeah, that's what I'm thinking, at least for a first implementation. It's worth looking into how this is done by default by Agent, though, since the intention is to mimic its behavior. Good point.

Does it make sense to preserve generated API keys as a user preference for scenarios where running multiple benchmarks is desired, but the user wishes to reuse API keys already generated?

I could see this maybe being useful if it turns out that API key generation for a large number of clients is painfully slow, but it seems like a stretch. Did you have other scenarios in mind where this would be advantageous? It shouldn't be difficult to support this, but my leaning is to file it away as a future enhancement unless we foresee a compelling need for it. What do you think?

@pquentin
Copy link
Member

The approach sounds good to me too. I believe we should first focus on the first scenario in the table: generating per-client API keys with basic auth. I think it will provide 95% of the value since anything else won't be in the hot path.

@inqueue
Copy link
Member

inqueue commented May 18, 2022

Does it make sense to preserve generated API keys as a user preference for scenarios where running multiple benchmarks is desired, but the user wishes to reuse API keys already generated?

I could see this maybe being useful if it turns out that API key generation for a large number of clients is painfully slow, but it seems like a stretch. Did you have other scenarios in mind where this would be advantageous? It shouldn't be difficult to support this, but my leaning is to file it away as a future enhancement unless we foresee a compelling need for it. What do you think?

We can let this rest and see if a need surfaces. I do not see any compelling reason to add it now.

@michaelbaamonde
Copy link
Contributor

The approach sounds good to me too. I believe we should first focus on the first scenario in the table: generating per-client API keys with basic auth. I think it will provide 95% of the value since anything else won't be in the hot path.

100% agree. That's the plan!

@michaelbaamonde
Copy link
Contributor

michaelbaamonde commented May 18, 2022

I could use some input on a core implementation detail that I've been thinking about and have discussed a bit with @DJRickyB. In particular, I'd love to hear your thoughts @pquentin.

As discussed previously, a key implementation detail is that a client should be able to pass its unique API key to the Elasticsearch client transparently. This implies request-level overrides of the authentication mechanism set on the Elasticsearch client instance. As noted, elasticsearch-py introduced support for this in version 7.13.0. It requires simply providing an api_key kwarg to any client API method, which is then injected into the headers for the corresponding request, requiring no mutation of the Elasticsearch client instance.

My initial thinking was to rely on this mechanism. This ultimately requires that we make runners aware of the API key associated with its caller and ensure that it properly passes the api_key kwarg to whatever Elasticsearch client method(s) it invokes. Given the current Runner interface, the most obvious way to do this is to add an api_key key to the params dict that runners expect, and then ensure by some means that all runners include it in their ES client requests.

This works, but it does mean that runners and parameter sources would now need to concern themselves with authentication. For default runners in the Rally code, maybe this is okay, but this could be rather trappy for authors of custom runners and/or parameter sources. They would need to make sure that that api_key parameter is preserved and included in Elasticsearch client method calls. I don't think it's a good idea to introduce what's essentially a change to the Runner API in order for custom runners to use auto-generated API keys. It's surprising behavior, and it mixes concerns a bit.

Ideally, runners shouldn't need to care about ES client options such as authentication. They currently don't need to, but per-request auth overrides as required here complicate matters.

There's probably all sorts of ways that we could approach this within Rally. But! Conveniently, elasticsearch-py has introduced the .options() method on clients, which can be used to set transport options, including setting the api_key attribute per request (docs). This creates a new ES client instance that (in the case we're concerned with) overrides whatever auth was configured on the ES client instance that options() was called on and instead uses an API key.

This seems like exactly what we need. Instead of requiring that runners pass the api_key to the ES client, the client calling the runner would instead provide the runner with an ES client instance already configured with its unique API key. At a relatively high level, the key move would be to have AsyncExecutor call the .options() method on its ES client instance with the appropriate API key and pass the resulting instance to the execute_single function, which invokes the runner.

Done in this manner, I think all runners--including custom ones that Rally doesn't control--should Just Work(TM) with per-client API keys. It seems like the right separation of concerns to me.

The catch is that we'd need to update the elasticsearch-py dependency to at least 8.0.0, which will require some unknown but certainly non-zero amount of work due to deprecations and such. I'm also not sure if creating an ES client instance per Rally client would cause any issues with a large number of clients and introduce a potential bottleneck within the load driver. I'd think and hope not, but we'd need to verify.

We could probably hack this general approach into Rally itself using the 7.14.0 Python client without too much fuss (we already subclass elasticsearch.AsyncElasticsearch and could extend it, or we could more bluntly just create an ES client per simulated client as opposed to per worker), but we'll eventually want to upgrade anyway, and we'd might as well just do it now given that a need has arisen. The benefits seem worth it.

Assuming, of course, that this idea doesn't have a fatal flaw that I haven't considered. 😄 Hence my request for feedback. Thanks!

@pquentin
Copy link
Member

Done in this manner, I think all runners--including custom ones that Rally doesn't control--should Just Work(TM) with per-client API keys. It seems like the right separation of concerns to me.

💯 This is a great, and I absolutely agree that this separate concerns nicely.

Regarding upgrading the Elasticsearch Python client to 8.0, this is the right thing to do and long overdue, so it would be nice if we get it done, but I'm afraid "non-zero amount of work" is an understatement - this sounds like the same amount of work as this issue, if not more. Maybe you can try to work on it for a timeboxed period (1 week?) and if it turns out to be too much work, we can instead reimplement .options() ourselves until we upgrade.

@michaelbaamonde
Copy link
Contributor

michaelbaamonde commented May 19, 2022

I'm afraid "non-zero amount of work" is an understatement - this sounds like the same amount of work as this issue, if not more

I had a hunch it might be pretty involved, but hadn't yet looked in depth, so thanks for confirming that.

Maybe you can try to work on it for a timeboxed period (1 week?) and if it turns out to be too much work, we can instead reimplement .options() ourselves until we upgrade.

This seems reasonable to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves the status quo
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants