[WIP] Add e2e test for tune api with LLM hyperparameter optimization #2420

Open
wants to merge 43 commits into
base: master
from

Commits on Sep 3, 2024

  1. add e2e test for tune api

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 3, 2024
    6be7f29
  2. upgrade training-operator sdk

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 3, 2024
    1a1f119
  3. specify the version of training operator sdk

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 3, 2024
    8461a49
  4. fix num_labels error and update the version of training operator controller
    
    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 3, 2024
    c860238
  5. check the version of training operator

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 3, 2024
    216ebd9
  6. debug

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 3, 2024
    f6b96f5
  7. check import path of HuggingFaceModelParams

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 3, 2024
    c636493

Commits on Sep 5, 2024

  1. update the version of training operator sdk

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    8180422
  2. update the name of experiment

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    6101489
  3. add step of checking pod

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    d67a1b8
  4. check the logs of pod

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    295abb6
  5. add check

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    e0a1b6d
  6. check reason for imagepullbackoff

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    1df7df9
  7. revert timeout limit

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    d1e1311
  8. fix format

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 5, 2024
    0cc319f

Commits on Sep 13, 2024

  1. extend timeout limit

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 13, 2024
    0383932
  2. update training operator sdk version

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 13, 2024
    08c8634
  3. check the logs of pod

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 13, 2024
    7a98a00
  4. rerun tests

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 13, 2024
    8862d79

Commits on Sep 14, 2024

  1. update the function of getting logs

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 14, 2024
    e4f614d
  2. add the step of describing pod

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 14, 2024
    0385eea
  3. check disk space

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 14, 2024
    e0c5170

Commits on Sep 17, 2024

  1. change work directory

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 17, 2024
    0286f70
  2. change work directory

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 17, 2024
    f6e5ed5
  3. increase timeout limit

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 17, 2024
    7ea7e43
  4. check the logs of controller and events

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 17, 2024
    25d99b1

Commits on Sep 18, 2024

  1. change work directory

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 18, 2024
    fcd64fa
  2. change work directory

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 18, 2024
    122c611
  3. change work directory

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 18, 2024
    c1fde09
  4. check the logs of kubelet

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 18, 2024
    8ff6864
  5. check the logs of kubelet

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 18, 2024
    da3c298

Commits on Sep 19, 2024

  1. increase cpu

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 19, 2024
    a1bff26
  2. check the logs of training operator

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 19, 2024
    bbae57b
  3. check the use of resources

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 19, 2024
    e45ceac

Commits on Sep 20, 2024

  1. check the logs of container 'pytorch' and 'storage_initializer'

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 20, 2024
    4ae11ed
  2. fix error of checking use of resources

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 20, 2024
    bedab36
  3. add other checks to find the error reason

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 20, 2024
    7bfb3cc

Commits on Sep 21, 2024

  1. set 'storage_config'

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 21, 2024
    efffdc2

Commits on Sep 22, 2024

  1. reduce the number of tests

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 22, 2024
    2a18b17
  2. Check container runtime logs

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 22, 2024
    c6c964b
  3. set the driver of minikube as docker

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 22, 2024
    28ffb96
  4. set the driver of minikube to none

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 22, 2024
    dc684e3

Commits on Sep 24, 2024

  1. check logs of pod

    Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
    helenxie-bit committed Sep 24, 2024
    a12034c