[Feature Request] Mapping and Analysis Insights for the OpenSearch cluster #15102

Open
mgodwan opened this issue Aug 5, 2024 · 0 comments
Labels: enhancement, Indexing:Performance


Is your feature request related to a problem? Please describe

  1. When users create indices, they may specify data types for fields through mappings (such as date or scaled_float), or OpenSearch may infer field types automatically from the ingested documents using the dynamic mappings feature.
  2. Users may associate analyzers and tokenizers with their fields; these are then applied to the field values during indexing.

These features help users tune the query support an index provides, but they may add extra processing to the cluster, which in turn can have an unintended impact on performance. It would help to get visibility into the execution of these flows, measure their overhead, and determine their performance impact. It would also help users decide whether they can optimize their mappings (e.g. text vs. match_only_text vs. keyword) or analyzers (e.g. regex simplification, n-gram reduction).

By adding insights around mappings, analyzers, and tokenizers, we should be able to get a granular view of the time taken by the various operations performed while indexing a document. Request tracing and the newly added metrics framework provide a good interface for this, and adding observing decorators on top of them should help solve this problem.
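The observing-decorator idea can be sketched as follows. This is a minimal, self-contained illustration assuming a hypothetical `FieldAnalyzer` interface and a hypothetical `LatencyRecorder` sink; neither is Lucene's `Analyzer` nor an OpenSearch telemetry API. `TimedAnalyzer` wraps any analyzer, delegates the call, and records the elapsed time under the analyzer's name. In a real integration the recorder would presumably be backed by a histogram from the metrics framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical stand-in for an analysis step that turns a field value into tokens.
// This is NOT Lucene's Analyzer or any OpenSearch interface.
interface FieldAnalyzer {
    String name();
    List<String> analyze(String fieldValue);
}

// Hypothetical stand-in for a metrics sink (e.g. a latency histogram).
interface LatencyRecorder {
    void record(String analyzerName, long elapsedNanos);
}

// In-memory recorder: accumulates call count and total nanoseconds per analyzer name.
final class InMemoryLatencyRecorder implements LatencyRecorder {
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    @Override
    public void record(String analyzerName, long elapsedNanos) {
        calls.computeIfAbsent(analyzerName, k -> new LongAdder()).increment();
        totalNanos.computeIfAbsent(analyzerName, k -> new LongAdder()).add(elapsedNanos);
    }

    long calls(String analyzerName) {
        LongAdder adder = calls.get(analyzerName);
        return adder == null ? 0 : adder.sum();
    }

    long totalNanos(String analyzerName) {
        LongAdder adder = totalNanos.get(analyzerName);
        return adder == null ? 0 : adder.sum();
    }
}

// Observing decorator: delegates to the wrapped analyzer and records how long the call took.
final class TimedAnalyzer implements FieldAnalyzer {
    private final FieldAnalyzer delegate;
    private final LatencyRecorder recorder;
    private final LongSupplier clock; // System::nanoTime in practice; injectable for tests

    TimedAnalyzer(FieldAnalyzer delegate, LatencyRecorder recorder, LongSupplier clock) {
        this.delegate = delegate;
        this.recorder = recorder;
        this.clock = clock;
    }

    @Override
    public String name() {
        return delegate.name();
    }

    @Override
    public List<String> analyze(String fieldValue) {
        long start = clock.getAsLong();
        try {
            return delegate.analyze(fieldValue);
        } finally {
            // Record even when the delegate throws, so failures still produce latency samples.
            recorder.record(delegate.name(), clock.getAsLong() - start);
        }
    }
}

// Toy analyzer for the demo: lowercases the value and splits it on whitespace.
final class WhitespaceLowercaseAnalyzer implements FieldAnalyzer {
    public String name() {
        return "whitespace_lowercase";
    }

    public List<String> analyze(String fieldValue) {
        return Arrays.asList(fieldValue.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }
}

class TimedAnalyzerDemo {
    public static void main(String[] args) {
        InMemoryLatencyRecorder recorder = new InMemoryLatencyRecorder();
        FieldAnalyzer analyzer =
            new TimedAnalyzer(new WhitespaceLowercaseAnalyzer(), recorder, System::nanoTime);
        List<String> tokens = analyzer.analyze("Quick Brown Fox");
        // prints "[quick, brown, fox] calls=1"
        System.out.println(tokens + " calls=" + recorder.calls("whitespace_lowercase"));
    }
}
```

Injecting the clock as a `LongSupplier` keeps the decorator testable with a fake clock, and because the decorator implements the same interface it can be applied transparently wherever an analyzer is resolved.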

Describe the solution you'd like

Use the Telemetry framework to emit metrics. A few of the proposed metrics (not exhaustive):

  1. Mapping Count by Data Type
  2. Analyzer Count
  3. Time taken by an analyzer implementation
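As an illustration of the first two proposed metrics, the following sketch tallies fields per data type and per configured analyzer from an index mapping. It assumes the mapping's `properties` object has already been parsed into nested `Map`s, treats a field with no `type` but with sub-`properties` as an `object` field, descends into multi-fields declared under `fields`, and counts only fields with an explicit `analyzer` parameter. `MappingStats` is a hypothetical helper, not an existing OpenSearch class; the demo uses `Map.of`, so it needs Java 9 or later.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: walks a mapping's "properties" (parsed from JSON into nested Maps)
// and counts fields per data type and per explicitly configured analyzer.
final class MappingStats {
    final Map<String, Integer> fieldsByType = new TreeMap<>();
    final Map<String, Integer> fieldsByAnalyzer = new TreeMap<>();

    static MappingStats of(Map<String, Object> properties) {
        MappingStats stats = new MappingStats();
        stats.visitProperties(properties);
        return stats;
    }

    @SuppressWarnings("unchecked")
    private void visitProperties(Map<String, Object> properties) {
        for (Object fieldDefinition : properties.values()) {
            visitField((Map<String, Object>) fieldDefinition);
        }
    }

    @SuppressWarnings("unchecked")
    private void visitField(Map<String, Object> field) {
        // Object fields may omit "type" and only declare sub-"properties".
        String defaultType = field.containsKey("properties") ? "object" : "unknown";
        String type = (String) field.getOrDefault("type", defaultType);
        fieldsByType.merge(type, 1, Integer::sum);

        Object analyzer = field.get("analyzer");
        if (analyzer != null) {
            fieldsByAnalyzer.merge((String) analyzer, 1, Integer::sum);
        }
        // Recurse into sub-fields of object/nested types and into multi-fields.
        if (field.get("properties") instanceof Map) {
            visitProperties((Map<String, Object>) field.get("properties"));
        }
        if (field.get("fields") instanceof Map) {
            visitProperties((Map<String, Object>) field.get("fields"));
        }
    }
}

class MappingStatsDemo {
    public static void main(String[] args) {
        Map<String, Object> properties = Map.of(
            "title", Map.of("type", "text", "analyzer", "english",
                "fields", Map.of("raw", Map.of("type", "keyword"))),
            "price", Map.of("type", "scaled_float", "scaling_factor", 100),
            "created", Map.of("type", "date"),
            "author", Map.of("properties", Map.of(
                "name", Map.of("type", "text", "analyzer", "english"))));
        MappingStats stats = MappingStats.of(properties);
        System.out.println(stats.fieldsByType);     // {date=1, keyword=1, object=1, scaled_float=1, text=2}
        System.out.println(stats.fieldsByAnalyzer); // {english=2}
    }
}
```

Emitting these counts as gauges per index, rather than per document, would presumably keep their overhead negligible compared with the per-analyzer timing metric.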

Related component

Indexing:Performance

Describe alternatives you've considered

No response

Additional context

No response

mgodwan added the enhancement and untriaged labels Aug 5, 2024
mgodwan self-assigned this Aug 5, 2024