Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SecurityActionFilter performs async actions without retaining refs to requests #103952

Closed
DaveCTurner opened this issue Jan 5, 2024 · 10 comments · Fixed by #104000
Closed

SecurityActionFilter performs async actions without retaining refs to requests #103952

DaveCTurner opened this issue Jan 5, 2024 · 10 comments · Fixed by #104000
Labels
>bug :Security/Security Security issues without another label Team:Security Meta label for security team

Comments

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Jan 5, 2024

An ActionRequest may hold nontrivial resources, implementing the RefCounted interface to ensure its resources are retained until no longer necessary at which point they are released promptly. Our internal APIs are generally refcount-neutral, acquiring refs as needed to keep resources alive beyond the scope of the executing method and releasing the acquired ref when no longer needed. In particular, async code must acquire a ref before going async and then release the ref once the async execution no longer needs it.

Today SecurityActionFilter does not respect this refcounting discipline: along some pathways it performs async actions without first acquiring a ref to the request. By the time the async action comes to being executed the request may already be closed.

This is not technically a bug today because in practice no ActionRequest instances hold nontrivial resources, but we need to change this (see e.g. #103356, #103161, #102778) and cannot do so correctly without pushing the right refcounting discipline through the security module. It would have been a bug if we hadn't noticed before 8.13 was released tho, see #103953.

@DaveCTurner DaveCTurner added >bug :Security/Security Security issues without another label labels Jan 5, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Jan 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jan 5, 2024
The security module doesn't refcount requests properly, see elastic#103952, so
async auth flows may discard the keystore password before the request is
processed. This commit adjusts the action to retain the request until
the response is available, which unfortunately negates the improvement
in elastic#103757.
@jakelandis
Copy link
Contributor

Today SecurityActionFilter does not respect this refcounting discipline: along some pathways it performs async actions without first acquiring a ref to the request. By the time the async action comes to being executed the request may already be closed.

Can you elaborate ?

We do have some authc work that can go async such as https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java#L1283 but in practice that happens before the REST layer and cached in the threadcontext which is serialized between nodes (if found in thread context we don't go async). I am not familiar any authz work that can go async but there might be. So I think my question is if this is more about - in theory it can go async (which is true) ..or.. is it more in practice we need this because of X scenario. (what is X ?)

@DaveCTurner
Copy link
Contributor Author

Interesting, I don't have a specific scenario indeed, I'm just reading the code and it looks like it has lots of places where it might do something async and I got lost trying to trace all the paths that might be called. If it really is all fully synchronous by the time it gets to running the action filter then this is not an issue.

@jakelandis
Copy link
Contributor

For authc , we also hold on to a reference to the transport request via Authenticator.Context -> AuthenticationService.AuditableTransportRequest#AuditableTransportRequest and at least for APIKeys the context is referenced by the listener for the async function so I think for that specific we would be fine too. For authz, it is harder to think of a scenerio where could loose the reference to the transport request since that is in large part what we are authorizing.

I think explicit guarantees are nice, but still not sure if the lack of them would yield a bug (today). I also think it is impractical to rely on reading the code to assert this is not an issue (i.e. I only looked at one specific case and could be wrong) and the garuntee is warranted. However, I am not sure if refcounting all requests is the most elegant solution.

@DaveCTurner
Copy link
Contributor Author

DaveCTurner commented Jan 5, 2024

https://gradle-enterprise.elastic.co/s/i7btrggfvnccw/tests/overview?outcome=FAILED suggests there are indeed some situations where we do not finish running the SecurityActionFilter before returning to the caller (as detected by #103994)

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jan 5, 2024
`ActionFilter` implementations may be async, so we have to keep the
request alive while the chain is running.

Closes elastic#103952
@DaveCTurner
Copy link
Contributor Author

Ok I think we can fix this in the ActionFilter framework itself, no need for changes to the security plugin. See #104000.

@jakelandis
Copy link
Contributor

jakelandis commented Jan 5, 2024

I think I had a misunderstanding of the issue before ... and still not sure I fully understand:

So...because the ActionRequest is a TransportRequest which is TransportMessage all action requests are ref counted but by default is a no-op implentation. Any ActionRequest that implements the ref counted message can execute some cleanup/close action likely via delegated ref counter such as the AbstractRefCounted.of() to execute some clean up when the reference count drops to 0 (or what ever value means same number of inc/dec).

However, If the SecurityActionFilter going async can cause a problem because the Action can finish before the async work kickoff via the SecurityActionFilter finishes ? If that happens the Request (via reference counting)is closed/cleaned up because the Action finishes first (or is concurrently executing and already refcounted down to 0)...but the async work via the ActionFilter still has reference to the now possibly closed/cleaned up Request object any maybe that async work goes boom. This is only possible if the SecurityActionFilter is either the only ActionFilter (it is not) or if it is the last ActionFilter in the chain (nondeterministic) else the filter chain would never progress until the listener is called and the count isn't increased to call the actual action. (i.e. an alternative fix could be to ensure SecurityActionFilter is always the first and never a solo filter to execute).

If this is the case, I don't see how the proposed ref counting in the chain progress fixes the issue. If the SecurityActionFilter is the last filter, couldn't the Action still execute concurrently or even complete before the async work kicked off from SecurityActionFilter is finished and maybe that async work go boom?

Also, it seems that having async work via the SecurityActionFilter seems to mandate that security bits are already in cached in thread context (to avoid going async) or that the the important security bits to authenticate/authorize the transport action happens before we go async. Else we are in a race condition between import security work executing concurrently with the action it was intended to secure.

@DaveCTurner
Copy link
Contributor Author

because the Action can finish before the async work kickoff via the SecurityActionFilter finishes

Not quite, it's because the caller may release its ref to the request as soon as Client#execute returns, but the request needs to remain live until the action executes so someone needs to retain a ref until it gets to the action.

@jakelandis
Copy link
Contributor

Not quite, it's because the caller may release its ref to the request as soon as Client#execute returns, but the request needs to remain live until the action executes so someone needs to retain a ref until it gets to the action.

I see now ... if any of the filters goes async then so does the remainder of the chain/action kickoff since the progression of the chain is handled via the handler (not a basic loop). So with async work via the filter(s) the control is returned to the caller immediately who could release the ref closing/cleaning up the Request before it used by the Action. Thanks for the additional explanation.

This is only possible if the SecurityActionFilter... or if it is the last ActionFilter in the chain (nondeterministic)

On second look, concurrent execution of Action and SecurityActionFilter is not possible since the order of ActionFilter's are actually deterministic (We sort them by hard coded order in ActionFilters) and SecurityActionFilter will execute first and it will never be the only ActionFilter. Thus SecurityActionFilter (order=Integer.MIN_VALUE) will ways execute first and the chain will be effectively blocked until the handler is called to progress the chain meaning the SecurityActionFilter and the Action are never running concurrently. Sorry for the noise.

@DaveCTurner
Copy link
Contributor Author

concurrent execution of Action and SecurityActionFilter is not possible

++ I think that's true of all action filters too, not just the first in the chain. At least unless the action filter does some meaningful work after sending the request down the chain, but I don't think that's something that happens today. If a filter did that then it'd definitely need to implement its own refcounting because nobody else is tracking the lifecycle of that work any more. Something perhaps to bear in mind if we were to make audit logging less synchronous, but I think this is not a problem right now.

DaveCTurner added a commit that referenced this issue Jan 9, 2024
`ActionFilter` implementations may be async, so we have to keep the
request alive while the chain is running.

Closes #103952
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Security/Security Security issues without another label Team:Security Meta label for security team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants