
[Search] Async search backoff strategy makes dashboards slow when searches take less than 1s to be served #157837

Closed
Tracked by #95776
dej611 opened this issue May 16, 2023 · 5 comments · Fixed by #164957
Assignees: dej611
Labels: enhancement, Feature:Search Sessions, Feature:Search, impact:medium, Team:DataDiscovery, Team:Visualizations

Comments

@dej611 (Contributor) commented May 16, 2023

Describe the bug:

While debugging a slow dashboard I realised that most of the waiting time was due to the async search backoff polling strategy.
Each ES search query completed within 1s, many around 500ms, but often the first request was served back within 150ms: that triggered the 1000ms delay before the next polling request, so the final time reported by Kibana can be more than 2x the ES response time.

In this example the final timing is the sum of the 2 requests plus the 1s delay, taking more than 1.5s while ES reports less than 300ms to serve the full response:

(Screenshot of network timings, 2023-05-16 10:01:25)
```
1684224058942 - 1684224058484 = 458ms (ES time 105ms) // initial async search
1684224060231 - 1684224059958 = 273ms (ES time 247ms) // second polling request

458 + 273 + 1000 = 1731ms // <= the final reported time
```
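To make the mechanics concrete, here's a minimal sketch of a backoff polling loop, with hypothetical names and delays (this is not Kibana's actual implementation), showing why a search that finishes just after the first response still pays the full first delay:

```ts
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical backoff schedule: wait 1s before the first poll, then longer.
const POLL_DELAYS_MS = [1000, 2000, 4000];

// `submit` starts the async search; `poll` checks whether it has finished.
// Returns the total wall-clock time as seen by the caller.
async function pollUntilComplete(
  submit: () => Promise<{ id: string; isRunning: boolean }>,
  poll: (id: string) => Promise<{ isRunning: boolean }>
): Promise<number> {
  const start = Date.now();
  let { id, isRunning } = await submit(); // e.g. returns in ~150ms, search still running
  let attempt = 0;
  while (isRunning) {
    // Even if ES finishes a few ms after the submit response returns,
    // we still sit out the whole delay before asking again.
    await sleep(POLL_DELAYS_MS[Math.min(attempt++, POLL_DELAYS_MS.length - 1)]);
    ({ isRunning } = await poll(id));
  }
  // e.g. 458ms (submit) + 1000ms (delay) + 273ms (poll) ≈ 1731ms reported,
  // even though ES itself was busy for well under a second.
  return Date.now() - start;
}
```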

Maybe we could improve this aspect of the polling strategy?
One idea would be to add an extra step to the schedule: an initial polling delay of 500ms for the first second, as sketched below.
On the other hand, discussing with @martijnvg offline, ES could take a little longer (from 100ms to ~200ms) to serve the first request in order to catch all "faster" searches and reduce the polling for them. Perhaps the two could be adopted together in some form?
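In terms of the hypothetical schedule above, the first idea would amount to inserting a shorter initial step:

```ts
// Hypothetical: poll once after 500ms before falling back to the 1s+ backoff,
// so searches completing within the first second are picked up sooner.
const PROPOSED_POLL_DELAYS_MS = [500, 1000, 2000, 4000];
```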

cc @ppisljar @lukasolson

@dej611 added the enhancement, Team:Visualizations, Team:DataDiscovery, Feature:Search Sessions, and :Search/Search labels on May 16, 2023
@elasticmachine (Contributor) commented:
Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

@elasticmachine (Contributor) commented:
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@lukasolson added the impact:low label on May 25, 2023
@stratoula added the impact:medium label and removed the impact:low label on Jun 19, 2023
@henrikno (Contributor) commented:

Related: #143277

@dej611 self-assigned this on Jul 20, 2023
@dej611 (Contributor, Author) commented Jul 28, 2023

I had a go at investigating this, but couldn't find the real offender behind the performance problem.
First of all, a diagram of the architecture in place:

```mermaid
flowchart TB
  SearchInterceptor[Search Interceptor w/ Polling] --"Network Request"--> bFetch.onBatchItem
  subgraph CLIENT
    subgraph DATA PLUGIN
      SearchInterceptor[Search Interceptor w/ Polling]
    end
    Expression --> SearchInterceptor
    subgraph DASHBOARD
      subgraph EMBEDDABLE
        subgraph LENS
            Chart --> Expression
        end
      end
    end
  end
  ESESearch --"Network Request"--> Elasticsearch
  subgraph KIBANA [SERVER KIBANA]
    subgraph DATA PLUGIN
      bFetch.onBatchItem --> ESESearch[ESE Search Strategy w/ Polling]
      ESESearch <--> UiSettings
    end
  end
  subgraph ES [SERVER ES]
    Elasticsearch
  end
```

While at the beginning I thought uiSettingsClient was a bottleneck, I found that it is not, as long as the get calls don't happen concurrently (i.e. not await Promise.all([ .... ])): there's a caching wrapper around the client which leverages the fact that the get calls happen in series.
The Observable structure makes it a bit hard to debug/profile the exact timings and order of each function, as there's a lot of noise in the profile stacktrace.
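To illustrate the serial-vs-concurrent point, here's a sketch of such a wrapper (hypothetical code, not the actual uiSettings client): it caches resolved values only, so it deduplicates serial calls, while concurrent calls all miss the cache because nothing is stored until the first call resolves:

```ts
// Hypothetical value-caching wrapper around an async settings getter.
function cacheGets(get: (key: string) => Promise<unknown>) {
  const cache = new Map<string, unknown>();
  return async (key: string): Promise<unknown> => {
    if (cache.has(key)) return cache.get(key);
    const value = await get(key);
    cache.set(key, value);
    return value;
  };
}

// In series: the second call hits the cache (one fetch total).
//   await cachedGet('search:timeout'); await cachedGet('search:timeout');
// Concurrent: both calls start before the cache is populated (two fetches).
//   await Promise.all([cachedGet('search:timeout'), cachedGet('search:timeout')]);
```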

It is still not clear to me where most of the time is lost. Measuring timings on both the Kibana client and server side, it looks like at least 1s of overhead is spent somewhere other than the actual request handling, and I assume the network delay to be minimal since both were running locally (the ES cluster was remote, to simulate a slow request).
So perhaps more investigation is due here to better understand where the time is actually spent.
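One way to narrow this down (a sketch; `runSearch` is a stand-in for the real search call, not an existing helper) is to bracket the call with timestamps and compare against the time ES itself reports:

```ts
// Hypothetical instrumentation: compare wall-clock time around a search call
// with the `took` value reported by ES, to isolate non-ES overhead.
async function timedSearch<T extends { took: number }>(runSearch: () => Promise<T>): Promise<T> {
  const start = performance.now();
  const response = await runSearch();
  const total = performance.now() - start;
  console.log(
    `total: ${total.toFixed(0)}ms, ES took: ${response.took}ms, ` +
      `overhead: ${(total - response.took).toFixed(0)}ms`
  );
  return response;
}
```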

@lukasolson (Member) commented:
Would this be solved by increasing the wait_for_completion_timeout to something like 1s?
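For context, `wait_for_completion_timeout` is the async search parameter controlling how long ES blocks before returning a (possibly partial) first response; searches that finish within that window never need a follow-up poll. A sketch with the Elasticsearch JS client (index and query are placeholders):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Block up to 1s before returning: sub-second searches come back complete
// in the first response, so the client-side polling backoff never kicks in.
const response = await client.asyncSearch.submit({
  index: 'my-index', // placeholder
  wait_for_completion_timeout: '1s',
  query: { match_all: {} },
});

console.log(response.is_running, response.response?.took);
```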
