Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listing huge number of users floods the heap #1029

Closed
Tracked by #1045
espen42 opened this issue May 3, 2022 · 5 comments
Closed
Tracked by #1045

Listing huge number of users floods the heap #1029

espen42 opened this issue May 3, 2022 · 5 comments
Assignees
Labels
Bug Something isn't working

Comments

@espen42
Copy link

espen42 commented May 3, 2022

It seems that listing (or other kinds of aggregation?) huge numbers of users causes the heap to fill up, probably via an only semi-handled issue in nashorn.

When researching the multiple instances at support ticket 5445 and looking into the heap dump from a full heap (causing the garbage collector to choke the CPU), we found several calls to <domain>/admin/tool/com.enonic.xp.app.users/main/_/service/com.enonic.xp.app.users/graphql with queries like:

{"query":"query ($query: String, $itemIds: [String], $start: Int, $count: Int) {\n                    types (query: $query, itemIds: $itemIds, start: $start, count: $count) {\n                        totalCount\n                        aggregations {\n                            name,\n                            buckets {\n                                key,\n                                docCount\n                            }\n                        }\n                    }\n                }"}

This in turn made calls to java classes concerned with the nashorn issue above, for example WeakReference and SoftReference - in essence, nashorn handles JS objects with a large number (10000+) of fields badly and causes an unreasonably large strain on the heap. Multiple simultaneous calls/threads make this even worse.

The customer has over 75 000 users in the repo. When I log in and enter their user app, and try to expand the "Members" item, the same graphql endpoint is called...

{"query":"query($idprovider: String, $types: [PrincipalType], $start: Int, $count: Int, $sort: String) {\n                  principalsConnection (idprovider: $idprovider, types: $types, start: $start, count: $count, sort: $sort) {\n                        totalCount\n                    }\n                }","variables":{"types":["USER"],"idprovider":"members"}}

...and fails ("canceled" in the network tab of dev tools, probably a timeout?). Note how start and count aren't filled in, so default (or unbounded?) values are most likely used.

It should be possible to reproduce this by creating ~100 000 random users and repeating the call.

  • Will it help to paginate these calls? @rymsha doubts it.
  • Should it be the app-users handling this at all? The graphql endpoint is still exposed after admin-console login (should it be?), so it would still be "internally vulnerable" to custom calls.
@espen42 espen42 assigned espen42 and alansemenov and unassigned espen42 May 3, 2022
@espen42
Copy link
Author

espen42 commented May 3, 2022

Consequences of this:
#779
#822

@alansemenov alansemenov assigned reisfmb and unassigned alansemenov May 4, 2022
@alansemenov alansemenov added the Bug Something isn't working label May 4, 2022
@alansemenov
Copy link
Member

@reisfmb we need a new query that uses exists to check whether idProvider has users and groups. this query should return true/false and be used instead of totalCount here

@sigdestad
Copy link
Member

AFAIK: Using totalCount should not have any big impact on server as long as "count = 0" - i.e. not fetching any items, just the totalCount. Do we still have to rewrite this?

@rymsha
Copy link
Contributor

rymsha commented May 5, 2022

... but not when count (size) is -1 which is the case after #779

Elasticsearch does have terminate_after search parameter, but we don't expose it, so my hopes to use "exists" are ruined.

Solution is to show "expand" arrow based on elements prefetched into the folder.

@sigdestad
Copy link
Member

Why not change this to regular pagination in UI? List 100 at a time, with load more option?
Can you record a deviation that we somehow set this to -1

alansemenov added a commit that referenced this issue May 23, 2022
* Fixed ListUserItemsRequest performance (#1035)

* Initial fix for GetPrincipalsTotalRequest high number of users bug (#1029)

* Fixed ListTypesRequest performance (#1034)

* Fixed ListUserItemsRequest and PrincipalLoader performances (#1046)

* Fixed ListUserItemsRequest performance (#1035)

* Added possibility of setting count to zero (#1050)

- Set some count args to zero

* Fixed members listing in statistics (#1048)

Adjusted Report position in statistics

Hide main group if doesn't contain description

* Codacy (#1048)

* update ui-tests for role/group statistics panel

* Added members count and load icon (#1061)

* update failed ui-tests

* Adjustments after review (#1061)

* update failed ui-tests

* update failed ui-tests

* Created and used predata in PrincipalComboBox loader (#1062)

* Adjustments after review (#1065)

Co-authored-by: Bruno Reis <reisfmb@gmail.com>
Co-authored-by: Bruno Reis <67838246+reisfmb@users.noreply.github.com>
Co-authored-by: sgauruseu <sig@eninic.com>
alansemenov added a commit that referenced this issue May 23, 2022
* Fixed ListUserItemsRequest performance (#1035)

* Initial fix for GetPrincipalsTotalRequest high number of users bug (#1029)

* Fixed ListTypesRequest performance (#1034)

* Fixed ListUserItemsRequest and PrincipalLoader performances (#1046)

* Fixed ListUserItemsRequest performance (#1035)

* Added possibility of setting count to zero (#1050)

- Set some count args to zero

* Fixed members listing in statistics (#1048)

Adjusted Report position in statistics

Hide main group if doesn't contain description

* Codacy (#1048)

* update ui-tests for role/group statistics panel

* Added members count and load icon (#1061)

* update failed ui-tests

* Adjustments after review (#1061)

* update failed ui-tests

* update failed ui-tests

* Created and used predata in PrincipalComboBox loader (#1062)

* Adjustments after review (#1065)

Co-authored-by: Bruno Reis <reisfmb@gmail.com>
Co-authored-by: Bruno Reis <67838246+reisfmb@users.noreply.github.com>
Co-authored-by: sgauruseu <sig@eninic.com>
(cherry picked from commit e436fda)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants