Enhancement on indices.fielddata.cache.size #59829

boicehuang · 2020-07-18T17:14:59Z

Currently, indices.fielddata.cache.size caps the memory used by the field data cache by evicting the least recently used (LRU) entries; it defaults to unbounded. Separately, indices.breaker.fielddata.limit bounds the field data cache by rejecting queries once the active field data memory reaches the limit; it defaults to 40%.

When we continuously send queries that sort or aggregate on the _id field or on text fields, field data cache memory eventually reaches 40%, and indices.breaker.fielddata.limit begins to reject every new query. The problem is that, because the cache size is unbounded, the field data cache never performs LRU eviction, so it cannot recover by itself until I call _cache/clear.

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [fielddata] Data too large, data for [_id] would be [579151518/552.3mb], which is larger than the limit of [513304166/489.5mb]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:98) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.limit(ChildMemoryCircuitBreaker.java:185) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:138) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData$PagedBytesEstimator.beforeLoad(PagedBytesIndexFieldData.java:231) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData.loadDirect(PagedBytesIndexFieldData.java:110) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData.loadDirect(PagedBytesIndexFieldData.java:52) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$0(IndicesFieldDataCache.java:145) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:142) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:68) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.mapper.IdFieldMapper$IdFieldType$1$1.load(IdFieldMapper.java:239) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource.getValues(BytesRefFieldComparatorSource.java:71) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource$2.getBinaryDocValues(BytesRefFieldComparatorSource.java:117) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.apache.lucene.search.FieldComparator$TermValComparator.getLeafComparator(FieldComparator.java:888) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:180) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:136) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.apache.lucene.search.FilterCollector.getLeafCollector(FilterCollector.java:40) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.elasticsearch.search.query.CancellableCollector.getLeafCollector(CancellableCollector.java:51) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471) ~[lucene-core-7.7.0.jar:7.7.0 2e5ba3199e7e69fd7af6759b52f3907771a2a467 - boicehuang - 2019-10-22 11:13:10]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:276) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:394) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:438) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService.access$100(SearchService.java:128) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:403) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:399) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1151) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-6.8.2.jar:6.8.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.2.jar:6.8.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.8.0_181]
        at java.lang.Thread.run(Unknown Source) ~[?:1.8.0_181]

However, if we set indices.fielddata.cache.size to 38%, new queries trigger LRU eviction as the field data cache grows: the least recently used entries are evicted and the new query succeeds. indices.breaker.fielddata.limit still guards against sudden growth of the field data cache.
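
The difference between the two configurations can be sketched with a toy model in Python. This is not Elasticsearch's implementation: the class FieldDataCacheModel, the helper count_rejections, and the unit-sized entries are hypothetical, and heap percentages are represented as plain integers. The sketch assumes the breaker is checked before an entry is loaded and that LRU eviction runs after the entry has been added.

```python
from collections import OrderedDict


class CircuitBreakingError(Exception):
    """Raised when a load would push usage past the breaker limit."""


class FieldDataCacheModel:
    """Toy model of a cache with a hard breaker limit and an optional LRU size cap."""

    def __init__(self, breaker_limit, cache_size=None):
        self.breaker_limit = breaker_limit  # stands in for indices.breaker.fielddata.limit
        self.cache_size = cache_size        # stands in for indices.fielddata.cache.size (None = unbounded)
        self.entries = OrderedDict()        # key -> size, least recently used first
        self.used = 0

    def load(self, key, size):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            return
        # Breaker check happens before loading (assumption of this sketch).
        if self.used + size > self.breaker_limit:
            raise CircuitBreakingError(f"data for [{key}] would exceed the limit")
        self.entries[key] = size
        self.used += size
        # LRU eviction only runs when a size cap is configured.
        if self.cache_size is not None:
            while self.used > self.cache_size:
                _, evicted_size = self.entries.popitem(last=False)
                self.used -= evicted_size


def count_rejections(cache, n_loads):
    """Load n_loads distinct unit-sized entries and count the rejected loads."""
    rejected = 0
    for i in range(n_loads):
        try:
            cache.load(f"segment-{i}", 1)
        except CircuitBreakingError:
            rejected += 1
    return rejected


# Unbounded cache: once usage reaches the breaker limit, every new load is rejected.
print(count_rejections(FieldDataCacheModel(breaker_limit=40), 100))                 # 60
# Cache capped below the breaker limit: old entries are evicted, nothing is rejected.
print(count_rejections(FieldDataCacheModel(breaker_limit=40, cache_size=38), 100))  # 0
```

In the unbounded case the model stays stuck at the limit until entries are removed from outside, which corresponds to the manual _cache/clear described above.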

So I suggest that indices.fielddata.cache.size should default to a value below indices.breaker.fielddata.limit, and that the documentation for both settings should explain how they interact. I would be happy to work on this.

elasticmachine · 2020-07-22T08:58:23Z

Pinging @elastic/es-search (:Search/Search)

jtibshirani · 2020-08-28T00:06:49Z

> So, I suppose indices.fielddata.cache.size should be less than indices.breaker.fielddata.limit by default and better to add some tips on both settings.

This makes sense to me, and would help avoid situations where the cache must be cleared manually. I'll discuss with the team to confirm we want to make a change.

jtibshirani · 2020-11-02T21:47:00Z

We discussed as a team and decided to not adjust the default setting or recommendation for indices.fielddata.cache.size. The reasoning is that field data is very expensive to build -- a set-up where it's continuously being evicted and rebuilt would result in unacceptably slow searches. So in the default case, we'd rather alert users to the problem instead of performing LRU eviction.
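
The cost of continuous eviction can be illustrated with a small Python sketch. It assumes equally sized entries and a cyclic access pattern, both simplifications, and count_rebuilds is a hypothetical helper rather than an Elasticsearch API. Once the working set is even one entry larger than an LRU cache, every access misses and the entry has to be rebuilt:

```python
from collections import OrderedDict


def count_rebuilds(capacity, working_set, rounds):
    """Cycle through `working_set` keys `rounds` times and count cache misses."""
    cache = OrderedDict()  # least recently used key first
    rebuilds = 0
    for _ in range(rounds):
        for key in range(working_set):
            if key in cache:
                cache.move_to_end(key)  # hit: refresh recency
            else:
                rebuilds += 1           # miss: the entry must be built again
                cache[key] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used key
    return rebuilds


# Working set fits: entries are built once, in the first round only.
print(count_rebuilds(capacity=10, working_set=10, rounds=5))  # 10
# Working set is one entry too large: every access is a rebuild.
print(count_rebuilds(capacity=10, working_set=11, rounds=5))  # 55
```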

However, we have some changes planned to improve the situation:

- Explore time-based eviction for field data, specifically targeting the time-series use case. (Time-based evictions for ordinal maps #59852)
- Reduce and eventually remove the need for field data entirely, including removing support for sorting + aggregating on _id. (Remove on-heap fielddata. #64612)

Thanks @boicehuang for filing this issue; it revived an important discussion around simplifying fielddata cache management.

boicehuang added the >enhancement and needs:triage labels Jul 18, 2020

cbuescher added the :Search/Search label Jul 22, 2020

elasticmachine added the Team:Search label Jul 22, 2020

cbuescher removed the needs:triage label Jul 22, 2020

jtibshirani added the team-discuss label Aug 28, 2020

wylieconlon mentioned this issue Oct 29, 2020 in [DOCS] Clarify field data cache behavior #64375 (merged)

jtibshirani removed the team-discuss label Nov 2, 2020

jtibshirani closed this as completed Nov 4, 2020
