Enhancement on indices.fielddata.cache.size #59829

Closed
boicehuang opened this issue Jul 18, 2020 · 3 comments
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@boicehuang
Contributor

boicehuang commented Jul 18, 2020

Currently, indices.fielddata.cache.size limits the memory used by the field data cache by evicting the least recently used (LRU) entries; it defaults to unbounded. Separately, indices.breaker.fielddata.limit caps field data memory by rejecting queries once the active field data would exceed the limit; it defaults to 40% of the JVM heap.

When we repeatedly sort or aggregate on the _id field or a text field, field data cache memory eventually reaches the 40% limit, and indices.breaker.fielddata.limit starts rejecting every new query. The problem is that the unbounded cache never performs LRU eviction, so it cannot recover on its own until I call _cache/clear.

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [fielddata] Data too large, data for [_id] would be [579151518/552.3mb], which is larger than the limit of [513304166/489.5mb]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:98) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.limit(ChildMemoryCircuitBreaker.java:185) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:138) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData$PagedBytesEstimator.beforeLoad(PagedBytesIndexFieldData.java:231) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData.loadDirect(PagedBytesIndexFieldData.java:110) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData.loadDirect(PagedBytesIndexFieldData.java:52) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$0(IndicesFieldDataCache.java:145) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:142) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:68) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.mapper.IdFieldMapper$IdFieldType$1$1.load(IdFieldMapper.java:239) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource.getValues(BytesRefFieldComparatorSource.java:71) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource$2.getBinaryDocValues(BytesRefFieldComparatorSource.java:117) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.apache.lucene.search.FieldComparator$TermValComparator.getLeafComparator(FieldComparator.java:888) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:180) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:136) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.FilterCollector.getLeafCollector(FilterCollector.java:40) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.elasticsearch.search.query.CancellableCollector.getLeafCollector(CancellableCollector.java:51) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:276) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:394) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:438) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService.access$100(SearchService.java:128) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:403) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:399) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1151) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.2.jar:6.8.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.8.0_181]
        at java.lang.Thread.run(Unknown Source) ~[?:1.8.0_181]

However, if we set indices.fielddata.cache.size to 38%, then as the field data cache grows, new queries trigger LRU eviction: the least recently used entries are evicted and the new query succeeds. indices.breaker.fielddata.limit still protects against sudden growth of the field data cache.
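
To illustrate the difference between the two configurations, here is a minimal toy model in Python. It is only a sketch, not how Elasticsearch implements the cache or the breaker: the class names, the `run` helper, and the unit sizes (a breaker limit of 40, a cache size of 38, entries of size 10) are all assumptions made up for the example. The breaker check only rejects loads and never evicts, while LRU eviction only happens when a cache size is configured.

```python
from collections import OrderedDict


class CircuitBreakingError(Exception):
    """Hypothetical stand-in for Elasticsearch's CircuitBreakingException."""


class FieldDataCacheModel:
    """Toy model of an LRU cache guarded by a memory breaker.

    breaker_limit plays the role of indices.breaker.fielddata.limit and
    cache_size the role of indices.fielddata.cache.size (None = unbounded).
    Sizes are arbitrary units, not bytes or heap percentages.
    """

    def __init__(self, breaker_limit, cache_size=None):
        self.breaker_limit = breaker_limit
        self.cache_size = cache_size
        self.entries = OrderedDict()  # key -> size, least recently used first
        self.used = 0

    def load(self, key, size):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            return
        # The breaker only rejects the load; it never evicts anything.
        if self.used + size > self.breaker_limit:
            raise CircuitBreakingError(
                f"data for [{key}] would be [{self.used + size}], "
                f"which is larger than the limit of [{self.breaker_limit}]"
            )
        self.entries[key] = size
        self.used += size
        # LRU eviction runs only when a cache size is configured.
        while self.cache_size is not None and self.used > self.cache_size:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size


def run(cache_size, loads=10, breaker_limit=40, entry_size=10):
    """Load `loads` distinct entries and count how many are rejected."""
    cache = FieldDataCacheModel(breaker_limit, cache_size)
    rejected = 0
    for i in range(loads):
        try:
            cache.load(f"segment-{i}", entry_size)
        except CircuitBreakingError:
            rejected += 1
    return rejected


if __name__ == "__main__":
    print("unbounded cache, rejected loads:", run(cache_size=None))
    print("cache size 38, rejected loads:", run(cache_size=38))
```

In this model the unbounded cache fills up to the breaker limit and then rejects every later load, whereas the bounded cache keeps evicting its oldest entry and never trips the breaker.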

So I suggest that indices.fielddata.cache.size should be lower than indices.breaker.fielddata.limit by default, and that the documentation for both settings should include some guidance on how they interact. I would be happy to work on this.

@boicehuang boicehuang added >enhancement needs:triage Requires assignment of a team area label labels Jul 18, 2020
@cbuescher cbuescher added the :Search/Search Search-related issues that do not fall into other categories label Jul 22, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jul 22, 2020
@cbuescher cbuescher removed the needs:triage Requires assignment of a team area label label Jul 22, 2020
@jtibshirani
Contributor

So, I suppose indices.fielddata.cache.size should be less than indices.breaker.fielddata.limit by default and better to add some tips on both settings.

This makes sense to me, and would help avoid situations where the cache must be cleared manually. I'll discuss with the team to confirm we want to make a change.

@jtibshirani
Contributor

jtibshirani commented Nov 2, 2020

We discussed this as a team and decided not to adjust the default setting or recommendation for indices.fielddata.cache.size. The reasoning is that field data is very expensive to build -- a set-up where it's continuously being evicted and rebuilt would result in unacceptably slow searches. So in the default case, we'd rather alert users to the problem than perform LRU eviction.

However, we have some changes planned to improve the situation:

Thanks @boicehuang for filing this issue, it revived an important discussion around simplifying fielddata cache management.
