Fix ref counting when using futures in AbstractClient #102498

original-brownbear · 2023-11-22T18:18:21Z

Found this now that #102030 is getting closer to completion. The way the client deals with ref-counted messages is fundamentally broken. We were only saved by the fact that we currently do not handle any messages with non-noop ref counting through this client interface.

We fundamentally need to increment the ref count by one before assigning a ref counted value to a future result.
Otherwise, transport actions will have to understand the specific kind of listener they resolve and cannot decrement sent/consumed transport messages themselves.
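To make the rule above concrete, here is a minimal sketch, not the code from this PR. `CountedMessage`, `sendResponse` and `executeAsFuture` are hypothetical stand-ins invented for illustration; the real Elasticsearch `RefCounted` and client types are not used here. The sender always drops its own reference after notifying the listener, so the future-backed listener has to take one extra reference before storing the result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a ref-counted transport message.
class CountedMessage {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds the initial ref

    void incRef() {
        // Only increment while the message is still alive.
        if (refs.getAndUpdate(r -> r > 0 ? r + 1 : r) <= 0) {
            throw new IllegalStateException("already released");
        }
    }

    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        return remaining == 0; // true when the last ref was dropped
    }

    int refCount() {
        return refs.get();
    }
}

class RefCountingFutureSketch {
    // Sender side: notifies the listener and then drops its own ref,
    // without needing to know what kind of listener it is resolving.
    static void sendResponse(CountedMessage response, Consumer<CountedMessage> listener) {
        try {
            listener.accept(response);
        } finally {
            response.decRef();
        }
    }

    // Future-backed listener: takes one extra ref before storing the result,
    // so the value is still alive when the caller reads it from the future.
    static CompletableFuture<CountedMessage> executeAsFuture() {
        CompletableFuture<CountedMessage> future = new CompletableFuture<>();
        sendResponse(new CountedMessage(), response -> {
            response.incRef();
            future.complete(response);
        });
        return future;
    }
}
```

In this sketch whoever reads the result from the future owns exactly one reference and has to call `decRef()` once it is done with the message.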

Marking as non-issue since this hasn't caused any production trouble yet.

elasticsearchmachine · 2023-11-22T18:18:46Z

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear · 2023-11-22T18:19:25Z

server/src/test/java/org/elasticsearch/client/internal/OriginSettingClientTests.java

@@ -37,7 +37,6 @@ protected <Request extends ActionRequest, Response extends ActionResponse> void
                    ActionListener<Response> listener
                ) {
                    assertEquals(origin, threadPool().getThreadContext().getTransient(ThreadContext.ACTION_ORIGIN_TRANSIENT_NAME));
-                    super.doExecute(action, request, listener);


Neither here nor in the other test do we care about resolving the listener. The result passed by the noop client is always null. I figured this was still better than adding a null check or hackily casting a non-null response in the noop-client ...

original-brownbear · 2023-11-22T18:20:30Z

server/src/main/java/org/elasticsearch/action/support/PlainActionFuture.java

+     * on the result before it goes out of scope.
+     * @param <R> reference counted result type
+     */
+    public static class ForRefCounted<R extends RefCounted> extends PlainActionFuture<R> {


We will probably need something similar for listenable future as well at some point. I haven't added any tests here yet, pending agreement on the approach. I can though, obviously it's trivial :)

DaveCTurner

Hmm I see the problem but this is a fairly awful hack IMO. That said I don't see great alternatives without somewhat invasive changes. The whole concept of blocking calls to the client is kinda bad, but does have production users (mostly Watcher and Monitoring it seems).

For a general-purpose utility I think the whole future should implement Releasable so that there's a proper incRef/decRef pair to keep things clear. As written here, there's an implicit assumption that there's only one call to .get() the result: if there are multiple .get() callers then they would need to coordinate the single required decRef() call amongst themselves. That seems super-trappy to me.
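A rough sketch of what such a releasable future could look like, under stated assumptions: `CountedResult` and `ReleasableFutureSketch` are hypothetical names, Java's `AutoCloseable` stands in for `Releasable`, and none of this is taken from the PR. The future takes the ref when it is completed and drops it in `close()`, so any number of `get()` callers can borrow the result without coordinating a `decRef()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a ref-counted result.
class CountedResult {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds the initial ref

    void incRef() {
        if (refs.getAndUpdate(r -> r > 0 ? r + 1 : r) <= 0) {
            throw new IllegalStateException("already released");
        }
    }

    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        return remaining == 0;
    }

    int refCount() {
        return refs.get();
    }
}

// The future itself owns the single extra ref: incRef on completion, decRef on close().
class ReleasableFutureSketch implements AutoCloseable {
    private final CompletableFuture<CountedResult> delegate = new CompletableFuture<>();
    private final AtomicBoolean closed = new AtomicBoolean();

    void onResponse(CountedResult result) {
        result.incRef();
        if (!delegate.complete(result)) {
            result.decRef(); // already completed: do not keep a second ref
        }
    }

    // Borrowed reference: valid until close(); callers never decRef it themselves.
    CountedResult get() {
        return delegate.join();
    }

    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            // Runs immediately if a result is present, otherwise as soon as one arrives.
            delegate.thenAccept(CountedResult::decRef);
        }
    }
}
```

Used with try-with-resources, the incRef/decRef pair is visible at the call site: the ref is taken inside the block and released when the block exits.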

Also for a general-purpose utility we'd need to handle the case where the future is completed multiple times, in which case we should only incRef the result if it's the first completion.
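One way to get the first-completion-only behaviour, sketched with the JDK's `CompletableFuture` rather than the Elasticsearch future classes: take the ref up front and give it back if `complete()` reports that the future was already completed. `CountedValue` and `completeRetaining` are hypothetical names used only for this illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a ref-counted value.
class CountedValue {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds the initial ref

    void incRef() {
        if (refs.getAndUpdate(r -> r > 0 ? r + 1 : r) <= 0) {
            throw new IllegalStateException("already released");
        }
    }

    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        return remaining == 0;
    }

    int refCount() {
        return refs.get();
    }
}

class FirstCompletionWins {
    static void completeRetaining(CompletableFuture<CountedValue> future, CountedValue value) {
        // Take the ref first so the value cannot be released between
        // the future being completed and the ref being taken.
        value.incRef();
        // complete() returns false if the future had already been completed.
        if (!future.complete(value)) {
            value.decRef(); // not the first completion: give the ref back
        }
    }
}
```

Taking the ref before calling `complete()` is a deliberate ordering choice in this sketch: a reader that sees the completed future is then guaranteed to find the extra ref already in place.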

Maybe it'd be better to make this specific behaviour local to the client rather than a general-purpose utility. I expect we've more chance of being able to assert that there's only a single call to .get() the result in that case.

Maybe also it should be a different API on the client for those few actions where refs matter so that it's clear to the caller they are taking a ref when calling .get(). Not 100% sure about that tho.

original-brownbear · 2023-11-22T21:09:38Z

@DaveCTurner I admit it, it's hacky :) How about this, I moved the custom future into the client to limit the blast radius and I promise I'll look into removing the production usage of blocking get?
I'd mostly like to get this through quickly and without a lot of effort because:

the blocking usage in prod must go away anyway
this is pretty urgent because it blocks the search ref counting work :)

DaveCTurner

Yep I can live with this for now to get the search response work through. I left a suggestion for a slightly safer implementation. Could you also add something to assert that .get() is only called once? And we need to do something with the timed .get() because a timeout would be a leak, although I'm not sure what exactly would be best there.
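For illustration only, a sketch of the two concerns raised here; it is not what was merged. `CountedResponse` and `SingleGetFuture` are hypothetical names, and the sketch throws an exception where the PR discussion talks about an assertion, so that the check also runs without `-ea`. The timeout handling is just one possible option: if the timed `get` gives up, it registers a callback that releases the result whenever it eventually arrives, so the ref is not leaked.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a ref-counted response.
class CountedResponse {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds the initial ref

    void incRef() {
        if (refs.getAndUpdate(r -> r > 0 ? r + 1 : r) <= 0) {
            throw new IllegalStateException("already released");
        }
    }

    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        return remaining == 0;
    }

    int refCount() {
        return refs.get();
    }
}

class SingleGetFuture {
    private final CompletableFuture<CountedResponse> delegate = new CompletableFuture<>();
    private final AtomicBoolean getCalled = new AtomicBoolean();

    void onResponse(CountedResponse response) {
        response.incRef();
        if (!delegate.complete(response)) {
            response.decRef(); // already completed: do not keep a second ref
        }
    }

    // The caller owns one ref on the returned response and must decRef it.
    CountedResponse get(long timeout, TimeUnit unit) throws TimeoutException {
        if (!getCalled.compareAndSet(false, true)) {
            throw new IllegalStateException("get() must only be called once");
        }
        try {
            return delegate.get(timeout, unit);
        } catch (TimeoutException e) {
            // Nobody will read the result any more: release it if and when it arrives.
            delegate.thenAccept(CountedResponse::decRef);
            throw e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

This sketch only covers the timeout path. An interrupted `get` would need the same release-on-arrival treatment, which is omitted here for brevity.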

server/src/main/java/org/elasticsearch/client/internal/support/AbstractClient.java

DaveCTurner · 2023-11-23T07:28:19Z

Suggestion for a more general-purpose solution: #102507

original-brownbear · 2023-11-23T08:46:14Z

@DaveCTurner I'll have a look later :) But for now, wdyt about this solution? I made it safer with your suggestion and added an assertion about this only getting called once. Good enough to unblock this one? :)

DaveCTurner

Maybe I'm misreading something but it looks like you've got both impls now?

original-brownbear · 2023-11-23T09:40:59Z

I'm so sorry ... 🤦 cd4fdd5
guess the last 3 lines just became a noop and it still worked out randomly ...

DaveCTurner

LGTM

DaveCTurner · 2023-11-23T10:15:04Z

#102515 would have saved some of the mockery muckery I think

original-brownbear · 2023-11-23T10:19:57Z

Thanks David!

original-brownbear added the >non-issue and :Distributed/Network labels Nov 22, 2023

elasticsearchmachine added the Team:Distributed and v8.12.0 labels Nov 22, 2023

original-brownbear commented Nov 22, 2023

original-brownbear requested a review from DaveCTurner November 22, 2023 18:20

DaveCTurner reviewed Nov 22, 2023

internal to the client

18c2b00

original-brownbear requested a review from DaveCTurner November 22, 2023 21:09

DaveCTurner reviewed Nov 22, 2023

server/src/main/java/org/elasticsearch/client/internal/support/AbstractClient.java (outdated, resolved)

DaveCTurner mentioned this pull request Nov 23, 2023

Introduce ResponseLifetime #102507

Closed

assert only get once and safer ref counting

094e854

original-brownbear requested a review from DaveCTurner November 23, 2023 08:44

tests

8a63972

DaveCTurner reviewed Nov 23, 2023

jee

cd4fdd5

original-brownbear requested a review from DaveCTurner November 23, 2023 09:40

DaveCTurner approved these changes Nov 23, 2023

DaveCTurner mentioned this pull request Nov 23, 2023

Replace RefCountedFuture with something more robust #102514

Open

tests

12fd464

original-brownbear added the auto-merge label Nov 23, 2023

elasticsearchmachine merged commit 659b236 into elastic:main Nov 23, 2023
13 checks passed

original-brownbear deleted the fix-client-ref-counting branch November 23, 2023 10:58

original-brownbear mentioned this pull request Nov 23, 2023

Rationalize ref-counting around ChannelActionListener #102551

Merged
