[CI] XPackRestIT timed out after 30 mins #52428

astefan · 2020-02-17T16:24:29Z

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_RUNTIME_JAVA=java11,nodes=general-purpose/516/console
https://gradle-enterprise.elastic.co/s/frzrlcgeo45j4

But this occurred in a scenario where one of tests it seems it left a task behind - p0=transform/transforms_stats/Test get continuous transform stats - and all the subsequent tests to spend 10 seconds in assertBusy and, thus, to add up and go over the 30 minutes timeout. Still, opening this issue for a second pair of eyes over this and given the history with yaml tests timing out lately.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-02-17T16:24:31Z

Pinging @elastic/es-core-infra (:Core/Infra/Build)

droberts195 · 2020-02-17T16:49:33Z

Many tests wait for pending tasks to complete during their cleanup phase so that tasks started by one test do not continue to run during the next test and cause hard-to-reproduce failures in that subsequent test.

As you say, this means that if one test leaves a rogue task behind that never completes then it fails every subsequent test that waits for pending tasks and also makes the test suite take an eternity to complete.

So maybe we should change the "wait for pending tasks" test cleanup functionality such that if the wait times out it instantly fails the entire test suite. It's just a question of how to do this in such a way that:

The test for which the "wait for pending tasks" timed out is reported as a failure
Subsequent tests are ideally not reported as failures, but as skipped

hendrikmuhs · 2020-02-18T16:50:03Z

The problem seems to be the test that failed before, which is the problem described in #52429

The task leakage happens because after the failure, the test does not continue execution and therefore does not stop the task.

Independent of #52429 it raises the question why execution continues after the failure. Because after the root cause, task cleanup does not happen, therefore all subsequent tests have the leaked task. I see 2 solutions:

fail fast: do not continue further test execution
cleanup all transform tasks as part of @After, e.g. rollup does this

Conceptually I like "fail fast", however there is a reason why 2 has been implemented at least for rollup. I like to learn about the reason.

@astefan I am not understanding the links below the first failure. 7.6 should not be affected by this issue. If I follow the links the builds look fine to me.

astefan · 2020-02-18T16:53:17Z

@hendrikmuhs you are right, those links are not needed. Will remove them from the initial description.

hendrikmuhs · 2020-02-19T09:59:22Z

created a PR for #52429, this should fix the failure.

I keep this issue open to clarify the desired behavior when a test fails and whether it should stop immediately or not, see #52428 (comment)

rjernst · 2020-02-25T05:26:49Z

@hendrikmuhs I think we generally agree we should fail fast. IMO the problem is the cleanup is being done as normal test actions. Instead, could it be done as actual cleanup? Currently each yml file has a single setup/cleanup that is run before/after each test section. If the cleanup of these can't go into that cleanup section, maybe we need to consider a dedication teardown within each test section.

hendrikmuhs · 2020-02-28T13:56:28Z

IMO the problem is the cleanup is being done as normal test actions. Instead, could it be done as actual cleanup? Currently each yml file has a single setup/cleanup that is run before/after each test section.

@rjernst Yes, that's a good point and should be possible, I have to move the test into a separate file but this also in the interest of cleaner test separation. I will implement your suggestion.

relates elastic#52428

restructure transform yaml tests to run cleanup in teardown phase relates #52428

rjernst · 2020-06-24T16:37:02Z

While XPackRestIT still suffers from timeouts on MacOS, this failure was broader and fixed a while ago.

astefan added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI labels Feb 17, 2020

astefan mentioned this issue Feb 17, 2020

[CI] XPackRestIT "transform/transforms_stats/Test get continuous transform stats" failure #52429

Closed

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Feb 28, 2020

restructure transform yaml tests to run cleanup in teardown phase

9832f05

relates elastic#52428

hendrikmuhs mentioned this issue Feb 28, 2020

[Transform] restructure transform yaml tests #52956

Merged

rjernst added the team-discuss label Mar 2, 2020

hendrikmuhs pushed a commit that referenced this issue Mar 3, 2020

[Transform] restructure transform yaml tests (#52956)

98b4e8c

restructure transform yaml tests to run cleanup in teardown phase relates #52428

hendrikmuhs pushed a commit that referenced this issue Mar 3, 2020

[Transform] restructure transform yaml tests (#52956)

844f350

restructure transform yaml tests to run cleanup in teardown phase relates #52428

rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020

rjernst removed the team-discuss label Jun 24, 2020

rjernst closed this as completed Jun 24, 2020

mark-vieira added Team:Delivery Meta label for Delivery team and removed Team:Core/Infra Meta label for core/infra team labels Nov 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] XPackRestIT timed out after 30 mins #52428

[CI] XPackRestIT timed out after 30 mins #52428

astefan commented Feb 17, 2020 •

edited

Loading

elasticmachine commented Feb 17, 2020

droberts195 commented Feb 17, 2020

hendrikmuhs commented Feb 18, 2020 •

edited

Loading

astefan commented Feb 18, 2020

hendrikmuhs commented Feb 19, 2020

rjernst commented Feb 25, 2020

hendrikmuhs commented Feb 28, 2020

rjernst commented Jun 24, 2020

[CI] XPackRestIT timed out after 30 mins #52428

[CI] XPackRestIT timed out after 30 mins #52428

Comments

astefan commented Feb 17, 2020 • edited Loading

elasticmachine commented Feb 17, 2020

droberts195 commented Feb 17, 2020

hendrikmuhs commented Feb 18, 2020 • edited Loading

astefan commented Feb 18, 2020

hendrikmuhs commented Feb 19, 2020

rjernst commented Feb 25, 2020

hendrikmuhs commented Feb 28, 2020

rjernst commented Jun 24, 2020

astefan commented Feb 17, 2020 •

edited

Loading

hendrikmuhs commented Feb 18, 2020 •

edited

Loading