
[ML] Add a timeout option to file structure finder #34117

Merged: 3 commits merged into elastic:master on Sep 28, 2018

Conversation

droberts195 (Contributor):

This can be used to restrict the amount of CPU a single
structure finder request can use.

The timeout is not implemented precisely, so requests
may run for slightly longer than the timeout before
aborting.

The default is 25 seconds, which is a little below
Kibana's default timeout of 30 seconds for calls to
Elasticsearch APIs.

@elasticmachine (Collaborator):

Pinging @elastic/ml-core

davidkyle (Member) left a comment:

It would be nice to avoid the volatile reads in TimeoutChecker.check. Have you evaluated whether there is a performance impact?

I don't see anything wrong with this change; however, I wonder whether so many checks need to be made. The description mentions that the default timeout is 25s, so it should reasonably time out under the 30s default, which leaves 5s to play with.
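
For orientation, here is a minimal sketch of the kind of checker under discussion. It is not the PR's actual TimeoutChecker; the class name, fields and constructor are illustrative. A timer flips a volatile flag when the timeout elapses, so the hot-path check() costs only the volatile read being discussed plus a branch.

import org.elasticsearch.ElasticsearchTimeoutException;
import org.elasticsearch.common.unit.TimeValue;

import java.io.Closeable;
import java.util.Timer;
import java.util.TimerTask;

// Minimal sketch, assuming a timer plus a volatile flag; not the PR's implementation.
public class SketchTimeoutChecker implements Closeable {

    private final String operation;
    private final TimeValue timeout;
    private final Timer timer = new Timer("sketch-timeout-checker", true);
    private volatile boolean timeoutExceeded;

    public SketchTimeoutChecker(String operation, TimeValue timeout) {
        this.operation = operation;
        this.timeout = timeout;
        if (timeout != null) {
            // A background thread marks the deadline; the analysis thread never calls System.nanoTime().
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    timeoutExceeded = true;
                }
            }, timeout.millis());
        }
    }

    /** Called from hot loops; before the timeout fires this is just a volatile read and a branch. */
    public void check(String where) {
        if (timeoutExceeded) {
            throw new ElasticsearchTimeoutException("Aborting " + operation + " during [" + where
                + "] as it has taken longer than the timeout of [" + timeout + "]");
        }
    }

    /**
     * Stops the timer if running.
     */
    @Override
    public void close() {
        timer.cancel();
    }
}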

"type" : "timeout_exception",
"reason" : "Aborting structure analysis during [delimited record parsing] as it has taken longer than the timeout of [1s]"
},
"status" : 500
davidkyle (Member):

I don't think a timeout should be a 500 error. Do you know what ES returns for timeouts?

droberts195 (Contributor, Author):

TransportCloseJobAction throws IllegalStateException on timeout and TransportOpenJobAction throws ElasticsearchException. Both of these map to status 500.

For an example outside of X-Pack, TransportRestoreSnapshotAction throws ProcessClusterEventTimeoutException, which maps to 503. But the description of 503 says it's for temporary conditions that will be alleviated after some delay, so I'm not sure that's right for the file structure finder: if an analysis is too time-consuming the first time, it's likely to stay that way (at least on the same hardware).

Also, whoever originally wrote the ElasticsearchTimeoutException class had the opportunity to specify the most appropriate status for it and chose 500.

So although 500 seems suboptimal, it's probably the best there is and most consistent with the rest of ES.
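
As a small, purely illustrative check of that mapping (TimeoutStatusDemo is a made-up class; the expected output just restates the 500 status described above):

import org.elasticsearch.ElasticsearchTimeoutException;
import org.elasticsearch.rest.RestStatus;

public class TimeoutStatusDemo {
    public static void main(String[] args) {
        ElasticsearchTimeoutException e = new ElasticsearchTimeoutException("took too long");
        // Per the discussion above, this timeout exception maps to HTTP 500.
        System.out.println(e.status() == RestStatus.INTERNAL_SERVER_ERROR); // expected: true
    }
}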

/**
 * Stops the timer if running.
 */
public void close() {
davidkyle (Member):

Nit: missing @Override

@@ -283,6 +308,7 @@ FileStructureFinder makeBestStructureFinder(List<String> explanation, String sam
String line;
while ((line = bufferedReader.readLine()) != null && ++lineCount <= maxLines) {
    sample.append(line).append('\n');
    timeoutChecker.check("sample line splitting");
davidkyle (Member):

This is a tight loop to be doing the check in every time, and there is a similar situation in DelimitedFileStructureFinder. I don't want to get into premature optimisations, but it would be nice if the check wasn't run on every iteration. Possibly the overhead of the logic to check that would be heavier than the fast call to timeoutChecker.check(...).

droberts195 (Contributor, Author) on Sep 28, 2018:

Possibly the overhead of the logic to check that would be heavier

That's my assumption - see #34117 (comment)

I'll profile it to confirm.
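
For illustration, here is a hypothetical way of thinning out the checks along the lines suggested above. CHECK_INTERVAL and readSample are made-up names, SketchTimeoutChecker is the hypothetical helper sketched earlier, and the loop body mirrors the quoted diff (imports needed: java.io.BufferedReader, java.io.IOException):

// Hypothetical variant of the quoted loop: the timeout check only runs every CHECK_INTERVAL lines.
// Whether the counter arithmetic and extra branch actually beat a plain volatile read is exactly
// what the profiling mentioned above would need to show.
private static final int CHECK_INTERVAL = 1000; // illustrative value, not from the PR

private String readSample(BufferedReader bufferedReader, int maxLines,
                          SketchTimeoutChecker timeoutChecker) throws IOException {
    StringBuilder sample = new StringBuilder();
    String line;
    int lineCount = 0;
    while ((line = bufferedReader.readLine()) != null && ++lineCount <= maxLines) {
        sample.append(line).append('\n');
        if (lineCount % CHECK_INTERVAL == 0) {
            timeoutChecker.check("sample line splitting");
        }
    }
    return sample.toString();
}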

if (grok.match(sampleMessage) == false) {
    return false;
}
timeoutChecker.check("full message Grok pattern matching");
davidkyle (Member):

Can this be moved outside the loop? How many sample messages does the call expect?

droberts195 (Contributor, Author):

Grok pattern matching is one of the most expensive operations. In the ingest pipeline, a watchdog that's a little bit similar to this PR's TimeoutChecker is installed to potentially interrupt every single call to match():

With the structure finder's use of Grok that inner watchdog is the no-op watchdog, so we're not double-checking for slow Grok patterns. But it still seems reasonable to check once per Grok pattern match, given that it's such an expensive operation.
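
To make that trade-off concrete, here is a loose sketch of the pattern described above, reusing the hypothetical SketchTimeoutChecker from earlier. allMessagesMatch, expensiveMatch and sampleMessages are illustrative names, with expensiveMatch standing in for grok::match rather than the real Grok API:

// Sketch only: one timeout check per expensive per-message operation (Grok matching in the PR),
// so the checking overhead stays proportional to the work actually done.
static boolean allMessagesMatch(java.util.function.Predicate<String> expensiveMatch,
                                java.util.List<String> sampleMessages,
                                SketchTimeoutChecker timeoutChecker) {
    for (String sampleMessage : sampleMessages) {
        if (expensiveMatch.test(sampleMessage) == false) {
            return false;
        }
        timeoutChecker.check("full message Grok pattern matching");
    }
    return true;
}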

davidkyle (Member) left a comment:

LGTM

droberts195 merged commit f709c2f into elastic:master on Sep 28, 2018
droberts195 deleted the structure_finder_timeout branch on September 28, 2018 at 16:32
droberts195 added a commit that referenced this pull request on Sep 28, 2018
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request on Sep 28, 2018
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request on Oct 1, 2018
This change fixes a potential deadlock problem in the unit
test introduced in elastic#34117.

It also removes a piece of debug code and corrects a docs
formatting problem that were both added in that same PR.
droberts195 added a commit that referenced this pull request on Oct 1, 2018
droberts195 added a commit that referenced this pull request on Oct 1, 2018
kcm pushed a commit that referenced this pull request on Oct 30, 2018
kcm pushed a commit that referenced this pull request on Oct 30, 2018