Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better defaults for DocumentSubsetBitsetCache #49260

Closed
jpountz opened this issue Nov 18, 2019 · 3 comments · Fixed by #50535
Closed

Better defaults for DocumentSubsetBitsetCache #49260

jpountz opened this issue Nov 18, 2019 · 3 comments · Fixed by #50535
Assignees
Labels
>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC

Comments

@jpountz
Copy link
Contributor

jpountz commented Nov 18, 2019

Initially the DLS cache was unbounded. This was not due to us being careless, but rather because Lucene makes the assumption that accessing live docs is cheap, and so we didn't it to run in linear time on cache miss. Unfortunately the ability to use templated role queries means you can have an unbounded number of bit sets in the cache, so we had to rollback this decision and move to a size-bound cache in order to avoid out-of-memory errors. We got some feedback that this has considerably slowed down some clusters, and the default of 50MB that this cache may use is way too low.

We should probably move to a number that is more generous, and also depends on the available memory on the node, e.g. 10%?

I wonder whether we should also decrease the default TTL. I think the TTL is important on this cache because it helps avoid keeping that cache full all the time in case some role queries are rarely used, which is probably typical when templated role queries are used. But the current default of 1 week means that someone running only one query every week would always use memory for this cache that might be better spent elsewhere. What about decreasing this default to maybe a couple hours?

@jpountz jpountz added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC labels Nov 18, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (:Security/Authorization)

@tvernum tvernum self-assigned this Nov 19, 2019
@tvernum
Copy link
Contributor

tvernum commented Nov 19, 2019

Broadly there's 2 options for cache sizing

  1. make them relatively small, but with a long TTL so that it caters for objects that are used occasionally
  2. make them large, but with a shorter TTL so that it caters for objects that are used multiple times over a short period, but then quickly release that memory.

(Small with a short TTL is also an option, but is not particularly helpful, and Large with a long ttl just ends up holding too much memory for too long).

When adding this cache, I opted for approach (1) based on the premise that some users/roles would only be used a few times a week.
I think in retrospect that was the wrong choice. This cache is most helpful when indices are large and calculating live-docs is relatively slow. The current defaults are so small that it sometimes cannot hold a single bitset for a large index and is effectively useless.

I think it makes sense to switch the defaults to large-weight+short-ttl because that matches the use cases where the cache is most needed.

@jpountz
Copy link
Contributor Author

jpountz commented Nov 19, 2019

+1

tvernum added a commit that referenced this issue Jan 13, 2020
The Document Level Security BitSet Cache (see #43669) had a default
configuration of "small size, long lifetime". However, this is not
a very useful default as the cache is most valuable for BitSets that
take a long time to construct, which is (generally speaking) the same
ones that operate over a large number of documents and contain many
bytes.

This commit changes the cache to be "large size, short lifetime" so
that it can hold bitsets representing billions of documents, but
releases memory quickly.

The new defaults are 10% of heap, and 2 hours.

This also adds some logging when a single BitSet exceeds the size of
the cache and when the cache is full.

Resolves: #49260
tvernum added a commit to tvernum/elasticsearch that referenced this issue Jan 14, 2020
The Document Level Security BitSet Cache (see elastic#43669) had a default
configuration of "small size, long lifetime". However, this is not
a very useful default as the cache is most valuable for BitSets that
take a long time to construct, which is (generally speaking) the same
ones that operate over a large number of documents and contain many
bytes.

This commit changes the cache to be "large size, short lifetime" so
that it can hold bitsets representing billions of documents, but
releases memory quickly.

The new defaults are 10% of heap, and 2 hours.

This also adds some logging when a single BitSet exceeds the size of
the cache and when the cache is full.

Resolves: elastic#49260
Backport of: elastic#50535
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020
The Document Level Security BitSet Cache (see elastic#43669) had a default
configuration of "small size, long lifetime". However, this is not
a very useful default as the cache is most valuable for BitSets that
take a long time to construct, which is (generally speaking) the same
ones that operate over a large number of documents and contain many
bytes.

This commit changes the cache to be "large size, short lifetime" so
that it can hold bitsets representing billions of documents, but
releases memory quickly.

The new defaults are 10% of heap, and 2 hours.

This also adds some logging when a single BitSet exceeds the size of
the cache and when the cache is full.

Resolves: elastic#49260
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants