Upgrade ai ml tests to successfully run for long duration on another VM #1147

Merged
sethiay merged 70 commits into master from ai_ml_tests on Jun 22, 2023

Conversation

sethiay
Collaborator

@sethiay sethiay commented May 29, 2023

Description

Upgrade the AI/ML tests so that they run successfully for long durations on a VM other than the one on which the Kokoro tests are triggered. This is required because intermittent issues occur when long-duration tests run directly on the Kokoro VM.

Notes for reviewer:

  1. I have used sudo gcloud instead of gcloud because gcloud gives errors when connecting over SSH from the Kokoro VM, and I couldn't find a solution.
  2. I have changed the base image for the PyTorch DINO model because the NVIDIA image doesn't have gsutil preinstalled.
  3. I will take care of the to-dos before merging.
  4. To-do: Delete logs of older builds in the GCS buckets, similar to the periodic perf tests.
  5. The existing build.sh files for both TF and PyTorch are moved under scripts/ml_tests/tf/resnet/ and scripts/ml_tests/pytorch/dino, respectively, and renamed setup_host_and_run_model.sh.
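
For reviewers who want to picture the flow in note 1, here is a minimal sketch of how a command could be run on a separate VM through `sudo gcloud compute ssh`. It is not the code in this PR: the helper names, the instance name `ml-test-vm`, and the zone are hypothetical placeholders. Only the `gcloud compute ssh` command with its `--zone` and `--command` flags is taken from the gcloud CLI.

```python
import shlex
import subprocess
from typing import List


def build_remote_command(instance: str, zone: str, remote_cmd: str,
                         use_sudo: bool = True) -> List[str]:
    """Build the argv for running remote_cmd on a VM via gcloud compute ssh.

    use_sudo mirrors note 1: plain gcloud reportedly fails when connecting
    over SSH from the Kokoro VM, so the call is prefixed with sudo.
    """
    argv = ["gcloud", "compute", "ssh", instance,
            "--zone", zone, "--command", remote_cmd]
    return ["sudo"] + argv if use_sudo else argv


def run_on_vm(instance: str, zone: str, remote_cmd: str) -> int:
    """Run remote_cmd on the VM and return the exit code of the gcloud call."""
    argv = build_remote_command(instance, zone, remote_cmd)
    # Log the exact command line so failures are easy to reproduce by hand.
    print("Running:", shlex.join(argv))
    return subprocess.run(argv, check=False).returncode
```

A caller on the Kokoro VM might use `run_on_vm("ml-test-vm", "us-west1-b", "bash setup_host_and_run_model.sh")`, where the instance name, zone, and working directory are assumptions made for illustration.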

Link to the issue in case of a bug fix.

NA

Testing details

  1. Manual - Tested the setup for the TF model with a small number of epochs. To-do/In progress: Test the setup for PyTorch and with a relatively higher number of epochs.
  2. Unit tests - NA
  3. Integration tests - NA

@sethiay sethiay merged commit ce0a93d into master Jun 22, 2023
3 checks passed
@sethiay sethiay deleted the ai_ml_tests branch May 14, 2024 13:07