[CI Problem] CI node out of memory #9705

Closed
manupak opened this issue Dec 10, 2021 · 4 comments
Labels
priority: medium type:ci Relates to TVM CI infrastructure

Comments

@manupak
Contributor

manupak commented Dec 10, 2021

Examples:

This started with a timeout on https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/173/pipeline.

It seems the CI jobs start with images left over from past runs (maybe others are aware of this). Should we clean them up?

Possible resolutions, for both TVM's Jenkinsfile and the daily-docker-image-rebuild Jenkinsfile:

  • Run cleanup_docker_image("ci_cpu") before the build starts
  • Do a full prune and pull the Docker images from the hub
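
For illustration, the second option could look roughly like the sketch below. It is not what TVM's Jenkinsfile does: `build_cleanup_commands` and `run_commands` are hypothetical helpers invented for this example, and the image name in the usage note is a placeholder. Only the Docker CLI commands themselves (`docker system prune`, `docker pull`) are standard.

```python
import subprocess
from typing import Callable, List, Sequence


def build_cleanup_commands(
    images: Sequence[str], prune_volumes: bool = True
) -> List[List[str]]:
    """Return docker CLI invocations: one full prune, then one pull per image."""
    prune = ["docker", "system", "prune", "--all", "--force"]
    if prune_volumes:
        # `docker system prune` does not remove volumes unless asked to.
        prune.append("--volumes")
    pulls = [["docker", "pull", image] for image in images]
    return [prune] + pulls


def run_commands(
    commands: Sequence[List[str]],
    runner: Callable = subprocess.run,
    dry_run: bool = False,
) -> None:
    """Run the commands in order; with dry_run=True, only print them."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True makes subprocess.run raise on a non-zero exit code,
            # so a failed prune stops the sequence before any pull.
            runner(cmd, check=True)
```

Calling `run_commands(build_cleanup_commands(["example/ci-image:latest"]), dry_run=True)` prints the commands without touching the node, which is a cheap way to review them before wiring anything into a pipeline stage.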

Any other ideas to proceed are also welcome!

cc: @leandron @tqchen @areusch @u99127

@leandron
Contributor

I temporarily disabled this machine on Jenkins, due to storage issues, FYI.
https://ci.tlcpack.ai/computer/octo.aws.c4.44.242.35.246/

@manupak
Contributor Author

manupak commented Dec 13, 2021

Thanks! @leandron

@areusch
Contributor

areusch commented Dec 14, 2021

from Noah:

  • docker prune script on static nodes (which I have running as a cron job) doesn't pass the 'volumes' flag
  • manually fixed 5 nodes which were full (think 246 was one of them)
  • permanent fix requires a rolling update of the fleet to bring in a new AMI. no time for this right now.
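
As a rough sketch of the kind of check a cron job on a static node might do (this is not Noah's actual script, which isn't shown in this thread): measure how full the disk is and, above a threshold, prune volumes as well. The function names and the 0.8 threshold are assumptions for illustration; `docker volume prune --force` is the standard CLI command, and which volumes it removes depends on the Docker version.

```python
import shutil
import subprocess
from typing import List


def usage_fraction(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_prune(fraction_used: float, threshold: float = 0.8) -> bool:
    """True when disk usage is at or above the (assumed) threshold."""
    return fraction_used >= threshold


def volume_prune_command() -> List[str]:
    """The volume cleanup step that the thread says the prune script was missing."""
    return ["docker", "volume", "prune", "--force"]


def prune_if_full(path: str = "/", threshold: float = 0.8) -> bool:
    """Prune volumes when the disk is too full; return whether a prune ran."""
    if not should_prune(usage_fraction(path), threshold):
        return False
    subprocess.run(volume_prune_command(), check=True)
    return True
```

Keeping the threshold check separate from the prune call makes it easy to log or alert on nearly full nodes without deleting anything.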

let's leave this bug open til we do the rolling update.

@leandron
Contributor

leandron commented Sep 8, 2022

Cleaning up old tickets that are solved, so I'm closing this one.

@leandron leandron closed this as completed Sep 8, 2022