
Search requests resource tracking framework #1179

Open
tushar-kharbanda72 opened this issue Aug 31, 2021 · 8 comments

Labels
enhancement Enhancement or improvement to existing feature or request

@tushar-kharbanda72
Contributor

tushar-kharbanda72 commented Aug 31, 2021

Is your feature request related to a problem? Please describe.

#1042 aims to build back-pressure support for Search requests. This framework will act as the basic building block for an effective search back-pressure mechanism.

Describe the solution you'd like

Build a resource tracking framework for search requests (queries) that tracks resource consumption on OpenSearch nodes for various search operations, at different levels of granularity:

i. Individual Search Request (Rest) - On the Coordinator Node across phases (such as the search and query phases), for end-to-end resource tracking from the coordinator's perspective.
ii. Shard Search Requests (Transport) - On the Data Node per phase, for discrete search task tracking.
iii. Shard level Aggregated View - Total resource consumption mapped to every shard for searches on the node.
iv. Node level Aggregated View - Total resource consumption for all search requests on the node.

Characteristics:

  • Resources to track for each of the proposed metrics above (a rough sketch of per-thread sampling follows this list):

    • Memory Consumption (JVM): along similar lines to #1009, we want to enable memory tracking.
    • CPU Consumption: TBD
  • Frequent checkpointing and footprint updates during the progress of a search phase: resource tracking should be done continuously as the search request progresses, and need not wait for a particular phase to complete. This is important because a single phase can itself consume a significant amount of resources, so tracking resources within a phase as it progresses is necessary for these metrics to represent the accurate state of the node.

  • Data Node feedback to build Coordinator state: provide a capability for the data node to piggyback its current shard utilisation state on the response to the coordinator. This can later feed into coordinator state to take adaptive and short-circuit routing decisions (covered as point 2 below).
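
As a rough illustration of what per-thread memory and CPU tracking could look like, here is a minimal sketch. It assumes a HotSpot JVM where the com.sun.management.ThreadMXBean extension is available; the ThreadResourceSnapshot class and its capture/captureFor methods are hypothetical names for illustration, not an existing OpenSearch API.

import java.lang.management.ManagementFactory;

// Minimal sketch (hypothetical names) of per-thread resource sampling on a HotSpot JVM.
public final class ThreadResourceSnapshot {

    // com.sun.management.ThreadMXBean exposes per-thread allocated bytes on HotSpot JVMs.
    private static final com.sun.management.ThreadMXBean THREAD_MX_BEAN =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    public final long threadId;
    public final long allocatedBytes; // cumulative heap bytes allocated by the thread
    public final long cpuTimeNanos;   // cumulative CPU time consumed by the thread

    private ThreadResourceSnapshot(long threadId, long allocatedBytes, long cpuTimeNanos) {
        this.threadId = threadId;
        this.allocatedBytes = allocatedBytes;
        this.cpuTimeNanos = cpuTimeNanos;
    }

    /** Captures the current counters for the calling thread. */
    public static ThreadResourceSnapshot capture() {
        return captureFor(Thread.currentThread().getId());
    }

    /** Captures the current counters for an arbitrary live thread (e.g. from a periodic watcher). */
    public static ThreadResourceSnapshot captureFor(long threadId) {
        return new ThreadResourceSnapshot(
            threadId,
            THREAD_MX_BEAN.getThreadAllocatedBytes(threadId),
            THREAD_MX_BEAN.getThreadCpuTime(threadId));
    }
}

getThreadAllocatedBytes and getThreadCpuTime return cumulative per-thread counters, so a phase's footprint would be the difference between a snapshot taken when the thread picks up the work and one taken later or at completion.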

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@tushar-kharbanda72
Contributor Author

Got time this week to think more on this. Sharing my high-level thoughts on the solution I have in mind:

If we want end-to-end (E2E) tracking of a search request's resource utilisation, we have two major things to handle:

  • Track all memory allocations while the search request is executing (work done on the Search ThreadPool across different threads). This should cover all shard search phases as well as the search request on the coordinator.
  • Track the response overhead (payload received from data nodes) while the request is in progress on the coordinator. This overhead of deserialising bytes and constructing the response object is incurred on the Transport Worker thread.

Tracking allocations on Search ThreadPool

  1. Make the TaskId of the request available in the ThreadContext so that even when the request is handled by different threads, they still have these details. This can be made part of the transient headers, and can be done from the TaskManager when a Task is created and registered.

  2. TaskResourceTracker: Create a TaskResourceTracker entity where every task that needs resource tracking support is registered. Optional additional metadata, such as action type, shard/index name/id, etc., can be attached to each task. Once the task is picked up by different threads, those threads are registered under the task for which they will be executing the work, and once the task completes it is unregistered from the TaskResourceTracker. It will hold a structure like Map<TaskInfo, List>, backed by a ConcurrentHashMap to support high throughput (see the sketch right after this list).

  3. ResourceTrackingRunnable: Decorate the runnable that gets executed on the Search ThreadPool (this can be added to other ThreadPools as well if required). The responsibility of this runnable is to register the thread in the TaskResourceTracker under a task id (available from the ThreadContext). While registering the thread, it captures the thread's heap allocations and CPU time at that point. Once the runnable completes or errors out, it captures the heap allocations and CPU time again and notifies the TaskResourceTracker that this thread has finished processing (see the sketch right after this list).

  4. ResourceWatcher: This will be a scheduled task responsible for taking snapshots at regular intervals (5 seconds). It updates the heap allocation and CPU time for each thread still registered in the TaskResourceTracker (not the ones that have finished), and then builds views from that data, e.g. memory allocations per node/shard/index/ThreadPool/actionType, whichever is required (a sketch of the watcher appears further below).
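
To make points 2 and 3 more concrete, below is a minimal sketch of what the tracker and the decorated runnable could look like. It builds on the ThreadResourceSnapshot sketch above; all class and method names (TaskResourceTracker, ThreadUsage, ResourceTrackingRunnable, registerThread, etc.) are hypothetical and simplified, e.g. tasks are keyed by a numeric id rather than a full TaskInfo.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of points 2 and 3 above, not the actual OpenSearch implementation.
final class TaskResourceTracker {

    /** Per-thread usage record for a thread that executed (or is executing) work for a task. */
    static final class ThreadUsage {
        final long threadId;
        final long startAllocatedBytes;
        final long startCpuNanos;
        volatile long allocatedBytes;  // bytes allocated so far on behalf of the task
        volatile long cpuTimeNanos;    // CPU time spent so far on behalf of the task
        volatile boolean active = true;

        ThreadUsage(ThreadResourceSnapshot start) {
            this.threadId = start.threadId;
            this.startAllocatedBytes = start.allocatedBytes;
            this.startCpuNanos = start.cpuTimeNanos;
        }
    }

    // Task id -> threads registered under that task (ConcurrentHashMap for high throughput).
    private final Map<Long, List<ThreadUsage>> tasks = new ConcurrentHashMap<>();

    void registerTask(long taskId) {
        tasks.putIfAbsent(taskId, new CopyOnWriteArrayList<>());
    }

    ThreadUsage registerThread(long taskId, ThreadResourceSnapshot start) {
        ThreadUsage usage = new ThreadUsage(start);
        tasks.computeIfAbsent(taskId, id -> new CopyOnWriteArrayList<>()).add(usage);
        return usage;
    }

    void unregisterTask(long taskId) {
        tasks.remove(taskId);
    }

    Map<Long, List<ThreadUsage>> snapshot() {
        return tasks;
    }
}

// Decorates the runnable submitted to the Search ThreadPool so that usage is recorded
// when the work starts and when it finishes (point 3).
final class ResourceTrackingRunnable implements Runnable {

    private final Runnable delegate;
    private final long taskId;                  // read from the thread context transient headers
    private final TaskResourceTracker tracker;

    ResourceTrackingRunnable(Runnable delegate, long taskId, TaskResourceTracker tracker) {
        this.delegate = delegate;
        this.taskId = taskId;
        this.tracker = tracker;
    }

    @Override
    public void run() {
        ThreadResourceSnapshot start = ThreadResourceSnapshot.capture();
        TaskResourceTracker.ThreadUsage usage = tracker.registerThread(taskId, start);
        try {
            delegate.run();
        } finally {
            // Record the final numbers for this thread whether the work succeeded or failed.
            ThreadResourceSnapshot end = ThreadResourceSnapshot.capture();
            usage.allocatedBytes = end.allocatedBytes - start.allocatedBytes;
            usage.cpuTimeNanos = end.cpuTimeNanos - start.cpuTimeNanos;
            usage.active = false;
        }
    }
}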

Without the resource watcher we would only get resource utilisation info once the task/thread completes, which would not be helpful because at that point the task is about to leave the node and no longer needs resources. Also, if there is a rogue query whose phase is consuming a lot of resources and taking longer, we should know that while the phase is still in progress so that, if required, a short-circuiting decision can be made.
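
Building on the sketches above, the watcher itself could be as simple as a scheduled job that re-reads the live counters for threads that are still active; again, the names here are illustrative only.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of point 4: periodically refresh usage for threads that are still
// working on a task so that long-running phases are visible before they complete.
final class ResourceWatcher {

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final TaskResourceTracker tracker;

    ResourceWatcher(TaskResourceTracker tracker) {
        this.tracker = tracker;
    }

    void start() {
        // Take a snapshot every 5 seconds, as proposed above.
        scheduler.scheduleAtFixedRate(this::refresh, 5, 5, TimeUnit.SECONDS);
    }

    private void refresh() {
        tracker.snapshot().forEach((taskId, threads) -> {
            for (TaskResourceTracker.ThreadUsage usage : threads) {
                if (usage.active) {
                    ThreadResourceSnapshot now = ThreadResourceSnapshot.captureFor(usage.threadId);
                    usage.allocatedBytes = now.allocatedBytes - usage.startAllocatedBytes;
                    usage.cpuTimeNanos = now.cpuTimeNanos - usage.startCpuNanos;
                }
            }
        });
        // Views (per node/shard/index/ThreadPool/actionType) would be rebuilt from the
        // refreshed numbers here.
    }
}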

Tracking response overhead on coordinator

In the InboundHandler, while we construct the Java object from bytes, we allocate on the heap, and we track the overhead of creating the response object. At that point we do not know the task id, so we map the overhead against the response object's identity; as soon as the thread context is restored and the task id is available, we move that response overhead to be tracked under that task. Any objects within the response that have delayed initialisation will be constructed while executing on the Search ThreadPool, so they will get tracked automatically.
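
Here is a minimal sketch of that hand-off, assuming the overhead number itself is obtained elsewhere (for example by diffing thread allocation counters around the deserialisation); ResponseOverheadTracker and its methods are hypothetical names, not an existing API.

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of re-attributing response deserialisation overhead to a task
// once the task id becomes known.
final class ResponseOverheadTracker {

    // Response object (by identity) -> heap bytes allocated while constructing it.
    private final Map<Object, Long> pendingOverhead =
        Collections.synchronizedMap(new IdentityHashMap<>());

    // Task id -> total response overhead attributed to that task so far.
    private final Map<Long, Long> taskOverhead = new ConcurrentHashMap<>();

    /** Called on the Transport Worker thread right after the response object is built. */
    void recordPendingOverhead(Object response, long allocatedBytes) {
        pendingOverhead.put(response, allocatedBytes);
    }

    /** Called once the thread context is restored and the owning task id is available. */
    void attributeToTask(Object response, long taskId) {
        Long overhead = pendingOverhead.remove(response);
        if (overhead != null) {
            taskOverhead.merge(taskId, overhead, Long::sum);
        }
    }
}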

These metrics can be exposed via the stats API. Full details can be discussed once there is alignment on this high-level proposal.

These two should give us much-needed visibility into resource utilisation due to search requests on a node.

@tushar-kharbanda72
Contributor Author

I need to run a performance benchmark to see if it hurts performance under normal/high load. I will create a patch and should be able to get some results on this by next week.

@asafm

asafm commented Mar 2, 2022

This is great work towards providing observability of search requests, specifically per shard (once done). I was wondering, what is the state of this issue?

@tushar-kharbanda72
Contributor Author

This is great works towards supplying observability of search requests, specifically per shard (once done). I was wondering what is the state of this issue?

Thanks for showing interest in this feature @asafm. We're working on a Task resource tracking framework where you can get the resource consumption of any task running on a cluster, which addresses not only search requests but all sorts of requests we want to track. We're code complete, currently in the review phase, and trying to get this feature out with the OpenSearch 2.0 release.

First PR for initial frame: #2089

@tushar-kharbanda72
Contributor Author

This PR completes the initial task resource tracking framework. Users can get insights into the resource consumption of tasks running on the cluster by using the List Tasks API. The List Tasks API refreshes the resource consumption info before returning the response so that it is accurate.

Users can further use X-Opaque-Id to get insights into how many resources their queries are using on different nodes.

curl --location --request GET 'http://127.0.0.1:9200/_tasks?actions=*'

{
    "nodes": {
        "JsXVdDkXRAOg3m3v6NNhrA": {
            "tasks": {
                "JsXVdDkXRAOg3m3v6NNhrA:74": {
                    "node": "JsXVdDkXRAOg3m3v6NNhrA",
                    "id": 74,
                    "type": "direct",
                    "action": "indices:data/read/search[phase/query]",
                    "description": "shardId[[test-index][0]]",
                    "start_time_in_millis": 1648563530779,
                    "running_time_in_nanos": 3777544037,
                    "cancellable": true,
                    "parent_task_id": "JsXVdDkXRAOg3m3v6NNhrA:73",
                    "headers": {},
                    "resource_stats": {
                        "total": {
                            "cpu_time_in_nanos": 6429000,
                            "memory_in_bytes": 307424
                        }
                    }
                }
            }
        }
    }
}

@tushar-kharbanda72
Contributor Author

This issue is now only for the Task resource tracking framework.

@dblock
Member

dblock commented Apr 25, 2022

The PR in #3046 was reverted.

@alexahorgan

Demo feedback (8/3/22):

Outcome:
Approved, ship it.

Action Items/Follow up:

  • Partner with documentation for use cases for release, consider creating a series of blog posts.
  • Identify current priorities for future query experiences with a PM.
