Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WithContainerStepTest.stop and WithContainerStepTest.death #257

Merged
merged 2 commits into from
Jun 22, 2022

Conversation

dwnusbaum
Copy link
Member

@dwnusbaum dwnusbaum commented Jun 20, 2022

I saw these flake in #256 and didn't think much of it, but then I saw them also fail in #253, and took a look. There seem to be a lot of problems with our use of ps.

The real head scratcher for me though is that the tests pass locally with httpd:2.4.12. They only fail if I use httpd:2.4.35 or newer (which I believe is due to the switch to a -slim version of Debian in docker-library/httpd#113 which removes ps from the container entirely, but it is hard to track things down because that repo does not have tags/releases). If I do update to 2.4.35 or newer then I get the exact error messages seen in CI. Is something in the CI environment forcing newer images to be used? Is that even possible?

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@dwnusbaum
Copy link
Member Author

Let's rebuild and hope that WithContainerStepTest.stop fails this time.

@dwnusbaum dwnusbaum closed this Jun 20, 2022
@dwnusbaum dwnusbaum reopened this Jun 20, 2022
@dwnusbaum dwnusbaum closed this Jun 21, 2022
@dwnusbaum dwnusbaum reopened this Jun 21, 2022
@dwnusbaum
Copy link
Member Author

dwnusbaum commented Jun 21, 2022

It looks like the build might fail to record Linux test results, but WithContainerStep.test failed this time, and the error message in kill is java.io.IOException: failed to run ps. stderr: qemu: uncaught target signal 11 (Segmentation fault) - core dumped

@dwnusbaum dwnusbaum changed the title Attempt to diagnose/fix WithContainerStepTest.stop and .death Fix WithContainerStepTest.stop and WithContainerStepTest.death Jun 21, 2022
@dwnusbaum dwnusbaum requested a review from car-roll June 21, 2022 21:36
@dwnusbaum dwnusbaum marked this pull request as ready for review June 21, 2022 21:36
" withDockerContainer('httpd:2.4.12') {\n" +
" sh \"sleep 5; ps -e -o pid,command | egrep '${pwd tmp: true}/durable-.+/script.sh' | fgrep -v grep | sort -n | tr -s ' ' | cut -d ' ' -f2 | xargs kill -9\"\n" +
" withDockerContainer('httpd:2.4.54-alpine') {\n" +
" sh \"set -e; sleep 5; ps -e -o pid,args | egrep '${pwd tmp: true}/durable-.+/script.sh' | fgrep -v grep | sort -n | tr -s ' ' | cut -d ' ' -f2 | xargs kill -9\"\n" +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing this whole parsing looks so painful to me 😢
Would it be easier to test if the heartbeat file still existed/was being written to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is trying to check whether sh steps inside of withContainer respond correctly if the remote script just suddenly dies, so it is using internal knowledge of durable-task to kill all of the processes related to the sh step from within, which is pretty awkward. It used to be simpler before jenkinsci/durable-task-plugin#49, but it had to be updated in #121 since the pid file didn't exist anymore.

I think that if we deleted the heartbeat file here it wouldn't do anything, since the control script would just touch the file 3 seconds later, recreating it. Not 100% sure though. Maybe we could chmod it to be unwritable or something, but IDK if that would be much easier to maintain.

@dwnusbaum dwnusbaum merged commit ea5e5e8 into jenkinsci:master Jun 22, 2022
@dwnusbaum dwnusbaum deleted the test-fixes branch June 22, 2022 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants