Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ref count search response bytes #103763

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Dec 29, 2023

Final step in #102030 ... actually makes SearchHit read a releasable bytes reference.
Does still fallback to copying to unrolled buffers here and there which can be removed in follow-ups where it's worth the effort (aggs being the most important one probably).

Hard to create very reliable benchmarks for this because all our macro-benchmarks are quite noisy. Running http logs and PMC though, there's a statistically significant reduction in GC and reduced tail latencies in most benchmarks.

The overhead for ref-counting these bytes isn't visible in profiling as far as I can tell and for large source values, no corresponding large byte[] are created any longer outside of the few remaining spots where we copy to pooled buffers.

closes #102657

original-brownbear and others added 30 commits November 30, 2023 17:05
It's in the title. Fix all the tests like we have everywhere else.
Fix the production use of SearchResponse in
BlobStoreCacheMaintenanceService by properly ref-counting the response
it holds on to.
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jan 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @original-brownbear, I've created a changelog YAML for you.

private final RefCounted refCounted = LeakTracker.wrap(AbstractRefCounted.of(this::doClose));
private final AtomicBoolean closed = new AtomicBoolean(false);

private final Releasable releasable = LeakTracker.wrap(() -> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed plans on this one. Ref-counting was not appropriate here. We want to have this tracked by the context to be release as a last-resort but we also clearly know when this can be released earlier. So I just made it closable with the double-close noop so we can close it wherever.

@elasticsearchmachine
Copy link
Collaborator

Hi @original-brownbear, I've updated the changelog YAML for you.

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew! what a read. This is awesome.

I eagerly await the performance numbers to see if there is a significant improvement.

It is frustrating to see how many non-test things require unPooled :(. But hopefully overtime those things can be updated to ensure we don't copy around bytes needlessly.

sortFields,
collapseField,
collapseValues,
hits.length == 0 ? ALWAYS_REFERENCED : LeakTracker.wrap(new AbstractRefCounted() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be ref counted if none of the SearchHit[] values are not. It seems weird to ref count for no reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, my reasoning for not adding that check would be that in production usage of this should effectively always work with ref-counted hirs and that condition would force another redundant iteration over the array on us.

@@ -112,7 +112,7 @@ public void testConsistentHitsWithSameSeed() throws Exception {
CoreMatchers.equalTo(0)
);
final int hitCount = response.getHits().getHits().length;
final SearchHit[] currentHits = response.getHits().getHits();
final SearchHit[] currentHits = response.getHits().asUnpooled().getHits();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like we need to asUnpooled() these hits unless we drop into the if (finalI == 0) block below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True but it's more code/change to make that conditional with how things work today. I'd deal with this in a follow-up where I was planning to restrict the explicit use of the array backing a SearchHits as much as possible.

resp.decRef();
}
context.addReleasable(resp::decRef);
fetchResults.close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird to me, we add fetchResults to context.addReleasable in the FetchSearchPhase ctor, but then we call close on it here. So, are we just closing it early here and relying on the context for releasing in a failure scenario?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly that was the plan.

@@ -448,6 +497,7 @@ public String getSourceAsString() {
* The source of the document as a map (can be {@code null}).
*/
public Map<String, Object> getSourceAsMap() {
assert hasReferences();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need references of sourceAsMap == null

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes, but I went for being strict here. We really shouldn't have code touching released instances ever. Much easier to trust tests if we verify that fact in more rather than fewer spots IMO

@@ -463,6 +513,7 @@ public Map<String, Object> getSourceAsMap() {
* The hit field matching the given field name.
*/
public DocumentField field(String fieldName) {
assert hasReferences();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need references here? DocumentField maps don't seem referenced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#103763 (comment) same reasoning here. I'd love to do a follow-up that expands ref counting to the fields in some cases as well.

@@ -653,13 +704,72 @@ public Map<String, Float> getMatchedQueriesAndScores() {
* @return Inner hits or <code>null</code> if there are none
*/
public Map<String, SearchHits> getInnerHits() {
assert hasReferences();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also assert that the inner hits objects have references?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not right? If we do that here, then we'd have to do it for the source itself before returning it and spots like that also? Seems like consistently asserting ref-counts on fields of ref-counted stuff is going to be a lot of code and doesn't add that much since whoever gets this map is probably going to do something with its contents anyway?

@@ -52,7 +52,7 @@ public InternalTopHits(
this.from = from;
this.size = size;
this.topDocs = topDocs;
this.searchHits = searchHits;
this.searchHits = searchHits.asUnpooled();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One day this shall be pooled as well! :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Comment on lines +73 to +77
for (SearchHit searchHit : searchHits) {
if (searchHit != null) {
searchHit.decRef();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@benwtrent
Copy link
Member

@original-brownbear could you post some numbers for a before/after of the benchmarks for this change?

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots to do yet. But this is awesome. I am mostly concerned about multi-node search testing, and CI should handle most of that. We just need to be vigilant.

…sponse-bytes

    # Please enter a commit message to explain why this merge is necessary,
    # especially if it merges an updated upstream into a topic branch.
    #
    # Lines starting with '#' will be ignored, and an empty message aborts
    # the commit.
@original-brownbear
Copy link
Member Author

Wee thanks so much Ben! Merging with Luca's blessing! I'll be on the lookout for leaks :)!

@original-brownbear original-brownbear merged commit 73a6840 into elastic:main Jan 17, 2024
14 of 15 checks passed
@original-brownbear original-brownbear deleted the ref-count-search-response-bytes branch January 17, 2024 15:16
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you, Armin. Wrong PR review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] BasicDistributedJobsIT testMaxConcurrentJobAllocations failing
5 participants