[RFC] Paginating _cat APIs #15013

gargharsh3134 · 2024-07-29T22:24:43Z

Is your feature request related to a problem? Please describe

The _cat APIs in OpenSearch (such as _cat/indices, _cat/shards...), which are primarily used for monitoring and operational purposes, are both CPU and memory intensive and therefore consume significant resources. As the cluster size increases (number of nodes, shards and indices), use of the _cat APIs starts to adversely impact the cluster. For large clusters, these APIs not only put the cluster's availability at risk, but their non-paginated responses are also correspondingly larger and slower, which makes them difficult for clients to consume.

The proposal to overcome these issues is to paginate these APIs, which would help limit both the response size and the resource consumption (by not aggregating stats or information for all the queried elements at once).

Describe the solution you'd like

The proposal is to implement token-based pagination.

Requirements:

The paginated responses should provide nextToken and previousToken in the response body, to be used for querying the next and previous page respectively.
Users can pass nextToken (taken from the previous response) and maxPageSize as query parameters. The number of entries in a response must always be less than or equal to the user-provided maxPageSize.
The behaviour around the next and previous page should be clearly defined for APIs which do not snapshot/store the cluster's state at the time of the first paginated query. For instance, the _cat/indices API will always use point-in-time information from the current cluster state and build the response accordingly.
If (and only if) the entries in a paginated response are sorted in a particular order, the user should be able to define the sort order of those entries. For instance, if the paginated response of the _cat/indices API lists indices ordered by their creation time, users should be able to choose whether that order is ascending or descending.
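
To make the requirements above concrete, here is a minimal Python sketch of token-based pagination over a list of indices ordered by creation time. It is only an illustration under stated assumptions, not the proposed implementation: the function names, the token encoding (base64 of the last returned sort key) and the response shape are all hypothetical, and previousToken is left out for brevity. Each call re-reads the "current" list, mirroring the point-in-time behaviour described for _cat/indices.

```python
import base64
import json


def encode_token(creation_time, name):
    # Hypothetical token format: the sort key of the last entry on the page.
    raw = json.dumps([creation_time, name]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    creation_time, name = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return (creation_time, name)


def paginate_indices(indices, max_page_size, next_token=None, sort="asc"):
    """Return one page from `indices`, a list of (creation_time, name) tuples.

    `indices` stands in for whatever the current cluster state holds at the
    time of the call; nothing is snapshotted between pages.
    """
    if max_page_size < 1:
        raise ValueError("max_page_size must be at least 1")
    if sort not in ("asc", "desc"):
        raise ValueError("sort must be 'asc' or 'desc'")

    descending = sort == "desc"
    ordered = sorted(indices, reverse=descending)

    if next_token is not None:
        last_seen = decode_token(next_token)
        # Keep only entries strictly after the last one already returned.
        if descending:
            ordered = [entry for entry in ordered if entry < last_seen]
        else:
            ordered = [entry for entry in ordered if entry > last_seen]

    page = ordered[:max_page_size]
    has_more = len(ordered) > max_page_size
    return {
        "indices": [name for _, name in page],
        "nextToken": encode_token(*page[-1]) if has_more else None,
    }
```

Because the token carries a sort key rather than an offset, an index deleted between two calls does not shift the following pages; an index created in between shows up only if its key sorts after the token.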

Approach:

Introduce new V2 APIs whose default behaviour is paginated responses.
Since making pagination the default behaviour of the existing APIs would be a breaking change, the proposal is to introduce new V2 APIs instead. The existing APIs can then be deprecated (say in OpenSearch version 3.x).
For example: curl "localhost:9200/_cat/indices/V2"
Introduce a new feature flag (say largeClusterModeEnabled). If it is set to true, the existing APIs will fail fast with a validation error stating that non-paginated queries are not supported for large clusters.

Related component

Cluster Manager

Describe alternatives you've considered

No response

Additional context

No response


shwetathareja · 2024-08-01T05:43:29Z

curl "localhost:9200/_cat/indices/V2"

The URL path can be curl "localhost:9200/_cat/indices/paginate" so that the usage is explicit.

shwetathareja · 2024-08-01T06:40:13Z

Introducing new feature flag (say largeClusterModeEnabled) which if set to true, the existing APIs will fail fast with a
validation error that non-paginated queries are not supported for large clusters.

@gargharsh3134 Please expand on this with examples around _cat/segments, _cat/recovery, _cat/snapshots etc. Would filtering be mandatory? How are the thresholds set up?

gargharsh3134 · 2024-08-01T10:59:27Z

the url path can be curl "localhost:9200/_cat/indices/paginate" so that usage is explicit

@shwetathareja Since this new path will require a new RestAction anyway, I was thinking of keeping it more generic. As we are aligned on deprecating the existing APIs and making pagination the default behaviour, having pagination-related keywords in the URL seemed a bit restrictive. That said, I'm open to changing the path. Please let me know if you still feel a pagination-related keyword in the path is required.
Thanks!

sandeshkr419 · 2024-08-07T21:36:27Z

I don't like the idea of calling this a V2 API and deprecating the default V1 in the future.
Here is what I propose:

The url path can be: localhost:9200/_cat/indices/paginate=true to begin with. The default localhost:9200/_cat/indices will be paginate=false

A feature flag (largeClusterModeEnabled) or an equivalent cluster level setting can then dictate the default behavior of paginate=true/false, this can be discussed independent of this feature. In that way for a large cluster, user will not have to figure out whether to paginate or not.

The largeClusterModeEnabled flag can then also be used to dictate the default behavior of other resource-intensive APIs, for example deciding the verbosity of _nodes/stats.

In this way, a user will not have to worry about which variant of an API to use, and this should (ideally?) come with no learning curve for the user. The user will also have the option to force a particular behavior by setting pagination to true/false. My point is that a user should just set the large cluster behavior once and then let OpenSearch take care of the underlying logic for all APIs.

dblock · 2024-08-12T15:54:07Z

The problem with feature flags that toggle default behavior is that there's no way to know for an instance of OpenSearch which flavor will be enabled. Thus your client has to be aware of what options the server has, and building a single client that works against both flavors is now impossible.

I recommend adding the flag, defaulting it to false, then flipping it in the next major version.

dblock · 2024-08-12T15:54:59Z

Another thought, if pageSize is specified, paginate, otherwise don't (current behavior)?

gargharsh3134 · 2024-08-12T16:27:58Z

@dblock Thanks for taking a look. Please find my responses for the 2 queries below:

The problem with feature flags that toggle default behavior is that there's no way to know for an instance of OpenSearch which flavor will be enabled. Thus your client has to be aware of what options the server has, and building a single client that works against both flavors is now impossible. I recommend adding the flag, defaulting it to false, then flipping it in the next major version.

The proposal is to keep the flag disabled by default, so that existing queries work as is. Users can enable it if required; once enabled, each API can honour its pre-defined fail-fast mechanism (e.g., _cat/indices can define that querying for more than 100 indices is not supported). I have yet to come up with a complete design for a common strategy that the APIs onboarding to this fail-fast mechanism can share.
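
A rough Python sketch of what such a per-API fail-fast check could look like. It is illustrative only: the function, the exception class and the limits table are hypothetical names, and the limit of 100 for _cat/indices is just the example figure mentioned above. No common strategy has been designed yet.

```python
class ValidationError(Exception):
    """Raised up front instead of letting an oversized request run and time out."""


# Hypothetical per-API limits; 100 for _cat/indices is the example from this comment.
FAIL_FAST_LIMITS = {"_cat/indices": 100}


def check_fail_fast(api, requested_count, large_cluster_mode_enabled):
    # With the flag disabled (the proposed default), existing queries work as is.
    if not large_cluster_mode_enabled:
        return
    limit = FAIL_FAST_LIMITS.get(api)
    # APIs that have not defined a limit are left untouched.
    if limit is not None and requested_count > limit:
        raise ValidationError(
            f"{api}: querying more than {limit} entries is not supported "
            "when largeClusterModeEnabled is true"
        )
```

The check runs before any stats are collected, so a rejected request costs the cluster almost nothing.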

Another thought, if pageSize is specified, paginate, otherwise don't (current behavior)?

The scenarios around using query params as an identifier were evaluated (say, the user explicitly passing next_token as null to start a paginated query), but dropped because paginated APIs have a different response structure from what exists today for the _cat APIs. Since changing a query param should not lead to different response structures and formats, this points towards using a new API URL.
A more detailed view of the response and requests is captured as part of individual proposals (for _cat/indices -> #14258).
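
To illustrate the concern with switching on a query param, here is a small Python sketch in which a single handler returns two different shapes depending on whether page_size is passed. Both shapes are made up for illustration; the actual proposed request and response structure is in #14258.

```python
def cat_indices(index_names, page_size=None):
    """Hypothetical handler where one query param changes the response shape."""
    names = sorted(index_names)
    if page_size is None:
        # Current behaviour: a flat listing of every index.
        return names
    page = names[:page_size]
    # Paginated behaviour: entries wrapped together with a token.
    next_token = page[-1] if len(names) > page_size else None
    return {"next_token": next_token, "indices": page}
```

A client parsing the result has to branch on the type of the response, which is the coupling a separate URL avoids.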

dblock · 2024-08-12T20:58:49Z

The proposal is to keep the flag disabled by default, and the existing queries can work as is.

I understand, but please see my reasoning of having this flag at all. Clients will not be able to adapt easily to an API that behaves differently behind a feature-flag.

shwetathareja · 2024-08-19T05:04:50Z

I understand, but please see my reasoning of having this flag at all. Clients will not be able to adapt easily to an API that behaves differently behind a feature-flag.

@dblock In large cluster mode, when there are too many nodes and shards, the admin APIs may just continue to time out and stress the nodes in the cluster. The largeClusterModeEnabled flag helps to fail fast and throw an exception. The only difference is that this would be a validation exception rather than a timeout exception. Ideally, no change will be needed in the client.

dblock · 2024-08-19T23:11:10Z

@shwetathareja maybe we miscommunicated, I am referring to the following:

A feature flag (largeClusterModeEnabled) or an equivalent cluster level setting can then dictate the default behavior of paginate=true/false, this can be discussed independent of this feature. In that way for a large cluster, user will not have to figure out whether to paginate or not.

Here fail fast is not what's proposed, it's a different API behavior.

shwetathareja · 2024-08-20T14:29:22Z

Here fail fast is not what's proposed, it's a different API behavior.

Got it @dblock, yeah not going ahead with different API behavior based on feature flag.

gargharsh3134 added the enhancement and untriaged labels Jul 29, 2024

github-actions bot added the Cluster Manager label Jul 29, 2024

gargharsh3134 mentioned this issue Jul 29, 2024

[META] Paginating _cat APIs #15014

Open

3 tasks

rwali-aws added v2.17.0 and removed untriaged labels Aug 2, 2024

andrross added the Roadmap:Cost/Performance/Scale label Aug 6, 2024
