Query timeout ignored #3627

rdeaton · 2013-09-05T14:09:16Z

Hi again,

I specify a timeout in all of my queries to keep slow queries from backing up the queue, but it doesn't seem to be working. My slow log has many entries, ranging from 2s to 22s, for queries that specify a 1500ms timeout. Here's a snippet of one:

[2013-09-05 13:35:32,967][WARN ][index.search.slowlog.query] [qdave] [quizlet][0] took[6.3s], took_millis[6311], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[10], source[{"fields":[],"from":0,"size":"50","timeout":"1500ms", ...

As always, is there anything else I can provide that would help debug this?
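The slow-log entry above can be checked programmatically. Below is a minimal Python sketch, standard library only, that builds a search body with the same `"timeout": "1500ms"` and measures how far a response's `took` value (reported in milliseconds) overshoots that timeout. The helper names and the placeholder `match_all` query are hypothetical. The 6311 ms figure comes from the log line above, and the sample response contains only that field.

```python
import json


def build_search_body(timeout_ms: int, size: int = 50) -> str:
    """Build a search request body with a per-request timeout (hypothetical helper)."""
    body = {
        "from": 0,
        "size": size,
        "timeout": f"{timeout_ms}ms",  # best-effort limit applied on each shard
        "query": {"match_all": {}},    # placeholder query
    }
    return json.dumps(body)


def timeout_overrun_ms(response: dict, timeout_ms: int) -> int:
    """Return how many ms the reported `took` exceeds the timeout (0 if it doesn't)."""
    return max(0, response["took"] - timeout_ms)


# Hypothetical response holding only the `took` value from the slow-log entry.
sample = {"took": 6311}
print(build_search_body(1500))
print(timeout_overrun_ms(sample, 1500))
```

A positive overrun alone does not prove the timeout was ignored; the response's `timed_out` flag is the field that reports whether the limit was hit and partial results were returned.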

rdeaton · 2013-09-05T14:10:14Z

Perhaps I am misunderstanding, and the timeout only bounds when the hits are returned without stopping execution of the query. In that case, my question is: how difficult would it be to add functionality that does the latter?

javanna · 2013-09-11T12:27:42Z

Indeed, the timeout doesn't kill the running query, but that doesn't mean it is ignored. Perhaps you were looking for something like #2929, which would add the ability to actually kill a query when the timeout expires? What do you think, @rdeaton?

javanna · 2013-10-05T14:57:50Z

Closing this one. The current timeout mechanism is best-effort and might not work with all queries. The idea behind #2929 is to improve this. I'd suggest watching that issue if you are interested, or reopening this one if you meant something else.

markharwood · 2014-08-01T08:32:17Z

See also #4586

aalexgabi · 2016-06-06T16:58:31Z

Does Elasticsearch kill a long running query if you set the timeout in the client? I'm not talking about killing the TCP connection here, but about actually stopping the long-running query before it brings down the node.

Is there a way to set the timeout globally for the cluster (for example in /etc/elasticsearch/elasticsearch.yml)?

If any of these features exist please specify the version in which they have been introduced.

jasontedor · 2016-06-06T17:15:53Z

Does Elasticsearch kill a long running query if you set the timeout in the client?

Timeouts are enforced on a best-effort basis. They are pushed down to the shards and affect the query phase; certain steps, such as highlighting and query rewriting, are not affected by the timeout.

Is there a way to set the timeout globally for the cluster (for example in /etc/elasticsearch/elasticsearch.yml)?

Yes. The search.default_search_timeout setting is available starting in 2.0.0. It carries the same caveat as above: it is best-effort.
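As a sketch of setting the cluster-wide default without editing elasticsearch.yml, the Python snippet below builds, but does not send, a `PUT _cluster/settings` request that sets `search.default_search_timeout` persistently. It assumes a node at `http://localhost:9200` and a version in which the setting is dynamic; in elasticsearch.yml the equivalent would be a `search.default_search_timeout: 1500ms` line. A `timeout` given in an individual search request should take precedence over this default.

```python
import json
import urllib.request

# Assumption: a node reachable at this address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def default_timeout_request(timeout: str) -> urllib.request.Request:
    """Build (but do not send) a cluster-settings update for the default search timeout."""
    body = {"persistent": {"search.default_search_timeout": timeout}}
    return urllib.request.Request(
        f"{ES_URL}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = default_timeout_request("1500ms")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it against a live cluster: urllib.request.urlopen(req)
```

The same best-effort caveat applies: this only changes the default limit, not how strictly it is enforced.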

dylanwenzlau · 2017-06-22T05:23:55Z

Having this feature work more accurately would be useful beyond #2929. We are trying to use it to stop a query after a fixed amount of time so that our search product always responds within a bounded time. We could build a hacky solution that runs the query asynchronously and stops waiting after a certain amount of time, but the query would then continue to use resources on the Elasticsearch server.
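The client-side workaround described above can be sketched with Python's `concurrent.futures`: the caller stops waiting once its budget is spent and falls back to an empty result, but the submitted work keeps running. `slow_query` is a hypothetical stand-in for a search call (it just sleeps), so this only models the client side of the problem.

```python
import concurrent.futures
import time

# Module-level pool, so returning early does not block on executor shutdown.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def slow_query() -> list:
    """Hypothetical stand-in for a search call that overruns the budget."""
    time.sleep(0.5)
    return ["hit-1", "hit-2"]


def search_with_deadline(fn, budget_s: float, fallback):
    """Run fn in the pool; return (result or fallback, future).

    On timeout the caller gets `fallback`, but fn keeps running in its worker:
    the client stopped waiting, nothing was actually cancelled.
    """
    future = _pool.submit(fn)
    try:
        return future.result(timeout=budget_s), future
    except concurrent.futures.TimeoutError:
        # Future.cancel() cannot stop a call that has already started running.
        return fallback, future


start = time.monotonic()
result, fut = search_with_deadline(slow_query, budget_s=0.05, fallback=[])
elapsed = time.monotonic() - start
still_running = not fut.done()
print(result, f"{elapsed:.2f}s", "still running:", still_running)
```

The response time is bounded, but the abandoned work still completes and consumes resources, which is exactly why a server-side limit is being requested.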

Our feature request is basically an equivalent of MySQL's MAX_EXECUTION_TIME. Unfortunately, right now Elasticsearch's timeout parameter can be inaccurate by several orders of magnitude, which makes it useless for our purpose. Our tests were run against Elasticsearch 5.3.

markharwood · 2017-06-23T16:24:39Z

Elasticsearch's timeout parameter can be inaccurate by several orders of magnitude

Ensuring timely timeouts is a matter of weaving timer checks into all the "hot loops". We've done this for many of them, e.g. the loop that collects the next doc in a result set. Can you give an example of a query that overruns, so we can see which unchecked loop might be involved?
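A minimal, generic sketch of what weaving timer checks into a hot loop means, written in Python for illustration; it is not Elasticsearch's or Lucene's code. The collecting loop checks a deadline as it iterates, stops early once the deadline passes, and reports `timed_out` alongside the partial results. `check_every` is a hypothetical knob showing the trade-off between the cost of the check and how far the loop can overrun.

```python
import time
from typing import Callable, Iterable, List, Tuple


def collect_with_deadline(
    doc_ids: Iterable[int],
    matches: Callable[[int], bool],
    timeout_s: float,
    check_every: int = 1,
) -> Tuple[List[int], bool]:
    """Collect matching doc ids, checking a deadline inside the hot loop.

    Returns (collected ids, timed_out). Docs collected before the deadline
    fired are kept, mirroring how a timed-out search returns partial hits.
    """
    deadline = time.monotonic() + timeout_s
    collected: List[int] = []
    for i, doc in enumerate(doc_ids):
        # The timer check lives inside the loop. Checking less often makes each
        # iteration cheaper but lets the loop run roughly `check_every` docs past the deadline.
        if i % check_every == 0 and time.monotonic() >= deadline:
            return collected, True
        if matches(doc):
            collected.append(doc)
    return collected, False


def slow_match(doc: int) -> bool:
    """Hypothetical per-document matcher that costs about 1 ms."""
    time.sleep(0.001)
    return doc % 2 == 0


hits, timed_out = collect_with_deadline(range(10_000), slow_match, timeout_s=0.05)
print(len(hits), timed_out)
```

Any loop that never reaches such a check runs to completion regardless of the timeout, which is how an un-checked loop can make a query overrun by far more than the requested limit.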

dylanwenzlau · 2017-06-23T19:25:50Z

Makes sense.

Here is an example where I specify a 10ms timeout, and the query finishes in 1361ms.

markharwood · 2017-06-23T22:12:58Z

Note that there is a node-level setting, thread_pool.estimated_time_interval, that dictates the resolution of the timer used to estimate the current time for these checks (this keeps timer checks cheap in hot loops). Its default resolution is 200ms, so you can expect overruns of that magnitude, although your example obviously exceeds even that.
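The effect of the timer resolution can be sketched with a cached clock: a background thread refreshes a timestamp every `interval_s`, and the hot loop reads that cached value instead of calling the system clock. This Python model is an assumption based on the description of `thread_pool.estimated_time_interval` above, not Elasticsearch's implementation; `CachedClock` and `busy_until_deadline` are hypothetical names.

```python
import threading
import time


class CachedClock:
    """Clock whose reading is refreshed by a background thread every `interval_s`.

    Reading it is cheap (an attribute lookup), but the value can be stale by up
    to about `interval_s`, so deadline checks based on it can fire that late.
    """

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self._now = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Event.wait() returns False on timeout (refresh) and True once stop() is called.
        while not self._stop.wait(self.interval_s):
            self._now = time.monotonic()

    def now(self) -> float:
        return self._now

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def busy_until_deadline(clock: CachedClock, timeout_s: float) -> float:
    """Spin like a hot loop until the cached clock passes the deadline.

    Returns the real elapsed time in seconds, measured with the system clock.
    """
    start = time.monotonic()
    deadline = clock.now() + timeout_s
    while clock.now() < deadline:
        pass  # stand-in for per-document work
    return time.monotonic() - start


clock = CachedClock(interval_s=0.2)  # 200 ms, the default resolution mentioned above
elapsed = busy_until_deadline(clock, timeout_s=0.01)  # a 10 ms "timeout"
clock.stop()
print(f"asked for 10 ms, loop ran for {elapsed * 1000:.0f} ms")
```

Because the deadline is compared against a value refreshed only every 200 ms, a 10 ms timeout can fire up to roughly one interval late (or early, if the cached reading was already stale when the deadline was computed). The 1361 ms reported for a 10 ms timeout is well beyond that, so the clock resolution alone would not explain it.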

vipul-mykaarma · 2020-11-23T18:18:57Z

Any updates on this?

ghost assigned javanna Sep 6, 2013

javanna closed this as completed Oct 5, 2013
