[FEATURE] Add Nodes Stats API to SDK #812

Open
dbwiddis opened this issue Jun 7, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

dbwiddis commented Jun 7, 2023

Is your feature request related to a problem?

For performance testing, it is important to understand the CPU/Memory/Disk/Network tradeoffs.

These metrics are currently available for nodes in the OpenSearch cluster via the Nodes Stats API.

However, these metrics are not available for extensions run on a remote node. While EC2 provides some metrics, they do not provide sufficient detail for performance measurement. In particular:

  • From an external OS perspective, the Java process appears to consume all the memory it has claimed for its heap. The JVM, however, keeps its own stats on how much of that heap is actually in use, along with GC stats and other useful details.
  • Whole-server CPU can be measured, and if the extension is the only significant process consuming CPU, this is an acceptable proxy. It won't work for extension nodes hosting multiple extensions.
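To make the gap concrete, here is a minimal sketch of the kind of in-process numbers the two points above refer to, read through the standard `java.lang.management` beans. The class and method names are hypothetical and not part of the SDK. The per-process CPU reading relies on the HotSpot-specific `com.sun.management.OperatingSystemMXBean`, which may not be present on every JVM, and it only separates extensions if each one runs in its own process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not an existing SDK class.
class JvmStatsSketch {

    /** Heap bytes actually in use, as reported by the JVM itself. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Heap bytes the JVM has committed, which is closer to what the OS sees. */
    static long heapCommittedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getCommitted();
    }

    /** Sum of collection counts over all collectors; undefined (-1) counts are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** CPU load of this JVM process in [0, 1], or a negative value if unavailable. */
    static double processCpuLoad() {
        java.lang.management.OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
        }
        return -1.0; // assumption: treat "not supported" the same as "not available"
    }

    public static void main(String[] args) {
        System.out.println("heap used:      " + heapUsedBytes());
        System.out.println("heap committed: " + heapCommittedBytes());
        System.out.println("gc count:       " + totalGcCount());
        System.out.println("process cpu:    " + processCpuLoad());
    }
}
```

The difference between `heapCommittedBytes()` and `heapUsedBytes()` is the part that external OS-level monitoring cannot see.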

What solution would you like?

Option 1: A whole new extension-specific stats API could be created, e.g., GET /_extensions/_uniqueId/stats. This may be preferable, particularly if the extension only returns a subset of the stats available on the OpenSearch API. One problem is that this restricts the APIs the extension itself may implement; we may want to reserve some prefix that implementers can avoid stepping on.

Option 2: The existing API GET /_nodes/<node_id>/stats could be modified to permit returning stats from an extension. If the extension has a node ID, using it would be preferable; otherwise we could use a specific string such as "extension:uniqueid".
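For comparison, the two options would look roughly like this from a client's point of view. This is only a sketch: neither extension route exists today, the class and method names are hypothetical, and the requests are built with the JDK's `java.net.http.HttpRequest` without being sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; the two extension routes below are proposals, not existing APIs.
class NodesStatsRequestSketch {

    /** Existing API: GET /_nodes/<node_id>/stats */
    static HttpRequest existingNodeStats(String baseUrl, String nodeId) {
        return get(baseUrl + "/_nodes/" + nodeId + "/stats");
    }

    /** Option 1 (proposed): a dedicated extension stats route. */
    static HttpRequest option1(String baseUrl, String uniqueId) {
        return get(baseUrl + "/_extensions/" + uniqueId + "/stats");
    }

    /** Option 2 (proposed): reuse the nodes stats route with an "extension:" prefixed id. */
    static HttpRequest option2(String baseUrl, String uniqueId) {
        return get(baseUrl + "/_nodes/extension:" + uniqueId + "/stats");
    }

    private static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    public static void main(String[] args) {
        String base = "http://localhost:9200"; // assumption: local cluster on the default port
        System.out.println(existingNodeStats(base, "_local").uri());
        System.out.println(option1(base, "my-extension").uri()); // "my-extension" is a made-up id
        System.out.println(option2(base, "my-extension").uri());
    }
}
```

Option 2 keeps a single route for callers that already poll node stats, while Option 1 keeps extension stats out of the node namespace.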

What alternatives have you considered?

The Java process for the extension could have JMX enabled. This opens a port that other processes can query for information. See, for example, this post, which runs a daemon on the server to query the JMX port and send the values to CloudWatch. Unfortunately, an open port on the extension creates potential security issues.
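For reference, this is roughly what a JMX reader ends up querying: the `HeapMemoryUsage` attribute of the `java.lang:type=Memory` MBean. The sketch below reads it in-process through the platform MBean server so that it runs without opening a port; a remote daemon would issue the same `getAttribute` call over an `MBeanServerConnection` obtained via `JMXConnectorFactory`. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper showing the MBean lookup a JMX client performs.
class JmxHeapQuerySketch {

    /** Reads the "used" field of java.lang:type=Memory / HeapMemoryUsage. */
    static long heapUsedViaMBean() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            // The attribute is exposed as open-type CompositeData, not as a MemoryUsage object.
            CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException e) {
            throw new IllegalStateException("Could not read heap usage MBean", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap used via MBean: " + heapUsedViaMBean());
    }
}
```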

The Java process could be modified internally to create a daemon thread that writes (configured) stats to a log file and/or inserts them into an OpenSearch index.
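A minimal sketch of that daemon-thread approach, assuming a fixed reporting period and a caller-supplied sink (a logger, or code that indexes the line into OpenSearch). All names here are hypothetical, and only heap stats are reported to keep it short.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical periodic stats reporter running on a daemon thread.
class StatsLoggerSketch {

    /** Formats one line of heap stats as key=value pairs. */
    static String formatHeapLine() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return "heap_used_bytes=" + heap.getUsed() + " heap_committed_bytes=" + heap.getCommitted();
    }

    /** Starts reporting to the sink every periodMillis; the thread will not block JVM exit. */
    static ScheduledExecutorService start(long periodMillis, Consumer<String> sink) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "extension-stats-logger");
            thread.setDaemon(true);
            return thread;
        });
        scheduler.scheduleAtFixedRate(
            () -> sink.accept(formatHeapLine()), 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = start(1000, System.out::println);
        Thread.sleep(3500); // let a few reports through before shutting down
        scheduler.shutdownNow();
    }
}
```

Passing the sink as a `Consumer<String>` keeps the choice between a log file and an index out of the scheduling code.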

Do you have any additional context?

I'm currently pursuing the alternative (JMX + collectd -> CloudWatch) to keep performance testing unblocked, but I don't think it's the best long-term solution, thus this issue.
