SecurityActionFilter performs async actions without retaining refs to requests #103952

DaveCTurner · 2024-01-05T07:59:45Z

An ActionRequest may hold nontrivial resources, implementing the RefCounted interface to ensure its resources are retained until no longer necessary at which point they are released promptly. Our internal APIs are generally refcount-neutral, acquiring refs as needed to keep resources alive beyond the scope of the executing method and releasing the acquired ref when no longer needed. In particular, async code must acquire a ref before going async and then release the ref once the async execution no longer needs it.

Today SecurityActionFilter does not respect this refcounting discipline: along some pathways it performs async actions without first acquiring a ref to the request. By the time the async action comes to being executed the request may already be closed.

This is not technically a bug today because in practice no ActionRequest instances hold nontrivial resources, but we need to change this (see e.g. #103356, #103161, #102778) and cannot do so correctly without pushing the right refcounting discipline through the security module. It would have been a bug if we hadn't noticed before 8.13 was released tho, see #103953.

The text was updated successfully, but these errors were encountered:

elasticsearchmachine · 2024-01-05T08:00:15Z

Pinging @elastic/es-security (Team:Security)

The security module doesn't refcount requests properly, see elastic#103952, so async auth flows may discard the keystore password before the request is processed. This commit adjusts the action to retain the request until the response is available, which unfortunately negates the improvement in elastic#103757.

jakelandis · 2024-01-05T15:13:26Z

Today SecurityActionFilter does not respect this refcounting discipline: along some pathways it performs async actions without first acquiring a ref to the request. By the time the async action comes to being executed the request may already be closed.

Can you elaborate ?

We do have some authc work that can go async such as https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java#L1283 but in practice that happens before the REST layer and cached in the threadcontext which is serialized between nodes (if found in thread context we don't go async). I am not familiar any authz work that can go async but there might be. So I think my question is if this is more about - in theory it can go async (which is true) ..or.. is it more in practice we need this because of X scenario. (what is X ?)

DaveCTurner · 2024-01-05T15:19:30Z

Interesting, I don't have a specific scenario indeed, I'm just reading the code and it looks like it has lots of places where it might do something async and I got lost trying to trace all the paths that might be called. If it really is all fully synchronous by the time it gets to running the action filter then this is not an issue.

jakelandis · 2024-01-05T16:13:22Z

For authc , we also hold on to a reference to the transport request via Authenticator.Context -> AuthenticationService.AuditableTransportRequest#AuditableTransportRequest and at least for APIKeys the context is referenced by the listener for the async function so I think for that specific we would be fine too. For authz, it is harder to think of a scenerio where could loose the reference to the transport request since that is in large part what we are authorizing.

I think explicit guarantees are nice, but still not sure if the lack of them would yield a bug (today). I also think it is impractical to rely on reading the code to assert this is not an issue (i.e. I only looked at one specific case and could be wrong) and the garuntee is warranted. However, I am not sure if refcounting all requests is the most elegant solution.

DaveCTurner · 2024-01-05T16:20:51Z

https://gradle-enterprise.elastic.co/s/i7btrggfvnccw/tests/overview?outcome=FAILED suggests there are indeed some situations where we do not finish running the SecurityActionFilter before returning to the caller (as detected by #103994)

`ActionFilter` implementations may be async, so we have to keep the request alive while the chain is running. Closes elastic#103952

DaveCTurner · 2024-01-05T17:57:36Z

Ok I think we can fix this in the ActionFilter framework itself, no need for changes to the security plugin. See #104000.

jakelandis · 2024-01-05T19:45:21Z

I think I had a misunderstanding of the issue before ... and still not sure I fully understand:

So...because the ActionRequest is a TransportRequest which is TransportMessage all action requests are ref counted but by default is a no-op implentation. Any ActionRequest that implements the ref counted message can execute some cleanup/close action likely via delegated ref counter such as the AbstractRefCounted.of() to execute some clean up when the reference count drops to 0 (or what ever value means same number of inc/dec).

However, If the SecurityActionFilter going async can cause a problem because the Action can finish before the async work kickoff via the SecurityActionFilter finishes ? If that happens the Request (via reference counting)is closed/cleaned up because the Action finishes first (or is concurrently executing and already refcounted down to 0)...but the async work via the ActionFilter still has reference to the now possibly closed/cleaned up Request object any maybe that async work goes boom. This is only possible if the SecurityActionFilter is either the only ActionFilter (it is not) or if it is the last ActionFilter in the chain (nondeterministic) else the filter chain would never progress until the listener is called and the count isn't increased to call the actual action. (i.e. an alternative fix could be to ensure SecurityActionFilter is always the first and never a solo filter to execute).

If this is the case, I don't see how the proposed ref counting in the chain progress fixes the issue. If the SecurityActionFilter is the last filter, couldn't the Action still execute concurrently or even complete before the async work kicked off from SecurityActionFilter is finished and maybe that async work go boom?

Also, it seems that having async work via the SecurityActionFilter seems to mandate that security bits are already in cached in thread context (to avoid going async) or that the the important security bits to authenticate/authorize the transport action happens before we go async. Else we are in a race condition between import security work executing concurrently with the action it was intended to secure.

DaveCTurner · 2024-01-05T19:51:14Z

because the Action can finish before the async work kickoff via the SecurityActionFilter finishes

Not quite, it's because the caller may release its ref to the request as soon as Client#execute returns, but the request needs to remain live until the action executes so someone needs to retain a ref until it gets to the action.

jakelandis · 2024-01-05T20:31:30Z

Not quite, it's because the caller may release its ref to the request as soon as Client#execute returns, but the request needs to remain live until the action executes so someone needs to retain a ref until it gets to the action.

I see now ... if any of the filters goes async then so does the remainder of the chain/action kickoff since the progression of the chain is handled via the handler (not a basic loop). So with async work via the filter(s) the control is returned to the caller immediately who could release the ref closing/cleaning up the Request before it used by the Action. Thanks for the additional explanation.

This is only possible if the SecurityActionFilter... or if it is the last ActionFilter in the chain (nondeterministic)

On second look, concurrent execution of Action and SecurityActionFilter is not possible since the order of ActionFilter's are actually deterministic (We sort them by hard coded order in ActionFilters) and SecurityActionFilter will execute first and it will never be the only ActionFilter. Thus SecurityActionFilter (order=Integer.MIN_VALUE) will ways execute first and the chain will be effectively blocked until the handler is called to progress the chain meaning the SecurityActionFilter and the Action are never running concurrently. Sorry for the noise.

DaveCTurner · 2024-01-05T20:37:23Z

concurrent execution of Action and SecurityActionFilter is not possible

++ I think that's true of all action filters too, not just the first in the chain. At least unless the action filter does some meaningful work after sending the request down the chain, but I don't think that's something that happens today. If a filter did that then it'd definitely need to implement its own refcounting because nobody else is tracking the lifecycle of that work any more. Something perhaps to bear in mind if we were to make audit logging less synchronous, but I think this is not a problem right now.

`ActionFilter` implementations may be async, so we have to keep the request alive while the chain is running. Closes #103952

DaveCTurner added >bug :Security/Security Security issues without another label labels Jan 5, 2024

elasticsearchmachine added the Team:Security Meta label for security team label Jan 5, 2024

DaveCTurner mentioned this issue Jan 5, 2024

Extend request lifecycle in RestReloadSecureSettingsAction #103953

Closed

DaveCTurner mentioned this issue Jan 5, 2024

WIP assert that SecurityActionFilter is always synchronous #103994

Closed

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jan 5, 2024

Retain ref to requests when running ActionFilterChain

51a5e9e

`ActionFilter` implementations may be async, so we have to keep the request alive while the chain is running. Closes elastic#103952

DaveCTurner mentioned this issue Jan 5, 2024

Retain ref to requests when running ActionFilterChain #104000

Merged

DaveCTurner closed this as completed in #104000 Jan 9, 2024

DaveCTurner added a commit that referenced this issue Jan 9, 2024

Retain ref to requests when running ActionFilterChain (#104000)

51521e2

`ActionFilter` implementations may be async, so we have to keep the request alive while the chain is running. Closes #103952

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SecurityActionFilter performs async actions without retaining refs to requests #103952

SecurityActionFilter performs async actions without retaining refs to requests #103952

DaveCTurner commented Jan 5, 2024 •

edited

Loading

elasticsearchmachine commented Jan 5, 2024

jakelandis commented Jan 5, 2024

DaveCTurner commented Jan 5, 2024

jakelandis commented Jan 5, 2024

DaveCTurner commented Jan 5, 2024 •

edited

Loading

DaveCTurner commented Jan 5, 2024

jakelandis commented Jan 5, 2024 •

edited

Loading

DaveCTurner commented Jan 5, 2024

jakelandis commented Jan 5, 2024

DaveCTurner commented Jan 5, 2024

SecurityActionFilter performs async actions without retaining refs to requests #103952

SecurityActionFilter performs async actions without retaining refs to requests #103952

Comments

DaveCTurner commented Jan 5, 2024 • edited Loading

elasticsearchmachine commented Jan 5, 2024

jakelandis commented Jan 5, 2024

DaveCTurner commented Jan 5, 2024

jakelandis commented Jan 5, 2024

DaveCTurner commented Jan 5, 2024 • edited Loading

DaveCTurner commented Jan 5, 2024

jakelandis commented Jan 5, 2024 • edited Loading

DaveCTurner commented Jan 5, 2024

jakelandis commented Jan 5, 2024

DaveCTurner commented Jan 5, 2024

DaveCTurner commented Jan 5, 2024 •

edited

Loading

DaveCTurner commented Jan 5, 2024 •

edited

Loading

jakelandis commented Jan 5, 2024 •

edited

Loading