Various Test Target issues in extended.perf openj9 test run on aix #2514

Open
adamfarley opened this issue Apr 15, 2021 · 0 comments

Describe the bug
This is a brain-dump bug covering several issues seen in a single AIX extended.perf run over the weekend.

The issues are listed below in order of occurrence; each is being raised separately in the appropriate repository.

  1. There was a stack overflow during execution of the renaissance-als_0 test target, which looks the same as this one. (Will reopen and annotate.)
    Note: the job then correctly moved on to the next test target, renaissance-chi-square_0.
  2. Later in the job, a different test target (renaissance-dec-tree_0) failed with another stack overflow. However, at the end of that test target's section, the output declared that renaissance-als_0 had failed.
  3. The job then executed renaissance-chi-square_0 again, even though it had already run. This implies that, besides reporting the failure against the wrong test target, some internal error had reset the progression through the test targets to the renaissance-als_0 position. Issue raised.
    Note: This pattern continued for the rest of the run. A seemingly random renaissance test target would hit the intermittent stack overflow, the failure would be incorrectly attributed to renaissance-als_0, and the test targets executed next showed that the progression had been reset to the start of renaissance-chi-square_0 again.
  4. TRSS correctly identified that there were many instances of renaissance-chi-square_0, but incorrectly reported all of them as failed. Also, the output shown for each of these failures came from the single instance of renaissance-chi-square_0 that had actually failed. Issue raised.
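
For anyone triaging a similar run, the repeated-execution pattern described above can be spotted mechanically once the order of executed test targets is known. The following is only a minimal sketch: the function name `find_repeated_targets` and the hard-coded list are hypothetical, and it assumes the execution order has already been extracted from the job output by some other means (no particular log format is assumed).

```python
from collections import defaultdict


def find_repeated_targets(executed):
    """Return {target: [positions]} for every target that ran more than once.

    `executed` is a list of test target names in the order they ran.
    """
    positions = defaultdict(list)
    for index, name in enumerate(executed):
        positions[name].append(index)
    # Keep only the targets that appear at more than one position.
    return {name: seen for name, seen in positions.items() if len(seen) > 1}


# Hypothetical execution order mirroring the pattern in this report:
# chi-square runs, a later target fails, then chi-square runs again.
executed = [
    "renaissance-als_0",
    "renaissance-chi-square_0",
    "renaissance-dec-tree_0",
    "renaissance-chi-square_0",
]
repeats = find_repeated_targets(executed)
```

With the sample list, only renaissance-chi-square_0 is flagged, which matches the reset described in item 3.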

To Reproduce
Rerun in Grinder link

Expected behavior
I expect each test target to execute exactly once in such a scenario, rather than executing multiple times because the progression through the targets is reset every time a failure is encountered.

Screenshots
https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_extended.perf_ppc64_aix/15/

Additional context
While the TRSS failure is perhaps more excusable (garbage in, garbage out), I'm still raising a TRSS issue to see if the TRSS crew feel that it should be more resilient against this particular problem.
