How to use --bucket parameter in colab? #4634

Closed
kuonumber opened this issue Sep 1, 2021 · 9 comments · Fixed by #4802
Labels
question Further information is requested

Comments

@kuonumber

❔Question

I tried to save evolve.csv to GCS, and I thought I had set up the environment correctly.
Could you give me some tips to get it done?
Thank you.

from google.colab import auth
auth.authenticate_user()
project_id = 'my-project'
!gcloud config set project {project_id}

Additional context

I got this error:
BadRequestException: 400 Invalid bucket name: 'gs:'

@kuonumber kuonumber added the question Further information is requested label Sep 1, 2021
@glenn-jocher
Member

@kuonumber you can evolve from multiple machines in parallel using the --bucket argument:

python train.py --bucket BUCKET --evolve

BUCKET should be an open (public read/write permissions) bucket/directory string locatable on GCP, e.g. gs://bucket/dir/subdir
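
The `Invalid bucket name: 'gs:'` error in the question is what one would expect if the `gs://` scheme ends up on the path twice, and a later command in this thread passes the bucket as a bare `bucket/dir` path without the scheme. The following is a minimal sketch only, assuming (this thread does not confirm it) that `train.py` adds `gs://` itself. `normalize_bucket` and `gcs_uri` are hypothetical helpers, not part of YOLOv5:

```python
def normalize_bucket(bucket: str) -> str:
    """Return a bare 'bucket/dir' path.

    Strips any leading 'gs://' scheme (even if repeated) and trailing slashes.
    Hypothetical helper for illustration; not a YOLOv5 function.
    """
    prefix = "gs://"
    bucket = bucket.strip()
    while bucket.startswith(prefix):
        bucket = bucket[len(prefix):]
    return bucket.rstrip("/")


def gcs_uri(bucket: str, filename: str) -> str:
    """Build a single, well-formed gs:// URI for a file inside the bucket path."""
    return f"gs://{normalize_bucket(bucket)}/{filename}"
```

If the assumption holds, the value returned by `normalize_bucket` is what would be passed to `--bucket`.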

@kuonumber
Author

@glenn-jocher
thanks a lot

@kuonumber
Author

@glenn-jocher
After evolving, there was an empty weights folder on my local machine.
Is that normal, or should it contain best.pt?

@glenn-jocher
Member

@kuonumber evolution output is hyperparameters, not weights.

@rhysdg

rhysdg commented Sep 15, 2021

Hey there! Loving this functionality. I have a quick question, however: is it possible to resume after an evo run? I've managed to successfully store evolve.csv and hyp_evolve.yaml given the above command. However, after trying to continue I receive a "not a directory" error regarding the empty weights folder:

NotADirectoryError: [Errno 20] Not a directory: 'runs/evolve/exp4/weights'

For your reference I'm using the following to kick off hyperparameter evolution:

python train.py --img 640 --batch 64 --epochs 10 --data traffic.yaml --weights yolov5s.pt --evolve 10 --bucket yolo-evo/evolve/lisa/v5s

Is it just a matter of emulating the original directory and creating an empty weights placeholder?

Cheers!

@glenn-jocher
Member

glenn-jocher commented Sep 15, 2021

@rhysdg 👋 Hello! Thanks for asking about resuming evolution.

Resuming YOLOv5 🚀 evolution is a bit different from resuming a normal training run with python train.py --resume. If you started an evolution run that was interrupted or finished normally, and you would like to continue for additional generations from where you left off, pass --resume and specify the --name of the evolution you want to resume. The steps are:

Start Evolution

Assume you evolve YOLOv5s on COCO128 for 10 epochs for 3 generations:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 3

If this is your first evolution a new directory runs/evolve/exp will be created to save your results.

# ├── yolov5
#     └── runs
#         └── evolve
#             └── exp  ← evolution saved here

Start a Second Evolution

Now assume you want to start a completely separate evolution: YOLOv5s on VOC for 5 epochs for 3 generations. You simply start evolving, and your new evolution will again be logged to a new directory runs/evolve/exp2:

python train.py --epochs 5 --data VOC.yaml --weights yolov5s.pt --evolve 3

You will now have two evolution runs saved:

# ├── yolov5
#     └── runs
#         └── evolve
#             ├── exp  ← first evolution (COCO128)
#             └── exp2  ← second evolution (VOC)
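
The `exp`, `exp2` naming above can be sketched as follows. This is a simplified stand-in written for illustration, not the function YOLOv5 itself uses; `next_run_dir` is a hypothetical name, and the only behavior assumed is the one described in this post (first run in `exp`, the next in `exp2`):

```python
from pathlib import Path


def next_run_dir(root: Path, name: str = "exp") -> Path:
    """Return root/name if it is unused, otherwise root/name2, root/name3, ...

    Hypothetical helper that mimics the directory naming described above.
    It only picks the path; the caller is responsible for creating it.
    """
    candidate = root / name
    n = 2
    while candidate.exists():
        candidate = root / f"{name}{n}"
        n += 1
    return candidate
```

Calling `next_run_dir(Path("runs/evolve"))` before each new evolution would select `runs/evolve/exp` first and `runs/evolve/exp2` once `exp` exists.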


Resume an Evolution

If you want to resume the first evolution (COCO128, saved to runs/evolve/exp), use the exact same command you started with plus --resume --name exp, passing the number of additional generations you want, e.g. --evolve 30 for 30 more generations:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 30 --resume --name exp

Evolution will run for an additional 30 generations and all new results will be added to the existing runs/evolve/exp/evolve.csv.
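
To check that a resumed run really appended to the existing file, one option is to count the rows in evolve.csv before and after. This is a minimal sketch, assuming the file has one header row followed by one row per generation (that layout is an assumption, not something stated in this thread); `count_generations` is a hypothetical helper:

```python
import csv
from pathlib import Path


def count_generations(evolve_csv: Path) -> int:
    """Count data rows in evolve.csv, assuming one header row plus one row per generation.

    Returns 0 if the file does not exist yet. Blank lines are ignored.
    """
    if not evolve_csv.exists():
        return 0
    with evolve_csv.open(newline="") as f:
        rows = [row for row in csv.reader(f) if any(cell.strip() for cell in row)]
    # Subtract the assumed header row; never report a negative count.
    return max(len(rows) - 1, 0)
```

Under that assumption, running `count_generations(Path("runs/evolve/exp/evolve.csv"))` before and after the `--evolve 30 --resume --name exp` command should show the count growing by 30.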

Good luck and let us know if you have any other questions!

@glenn-jocher glenn-jocher linked a pull request Sep 15, 2021 that will close this issue
@glenn-jocher
Member

@rhysdg good news 😃! We fixed a small bug ✅ in resuming evolution in PR #4802. Following this PR, resuming evolution should work correctly per the instructions in my previous post.

To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View the updated notebooks on Colab or Kaggle
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@rhysdg

rhysdg commented Sep 22, 2021

Ah great @glenn-jocher, thanks so much for the swift response! I can confirm that everything works well as per your instructions, except for a minor bug when resuming within a new instance, which throws a "file exists" error. It's easily solved, however, by creating an empty directory corresponding to the previous session's name. For --name v5x, as an example, !mkdir -p runs/evolve/v5x allows the session to continue without hassle after copying evolve.csv from a shared bucket. Cheers for everything and once again, absolutely loving working with YOLOv5!
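
The workaround described above could be scripted roughly like this. It is a sketch under stated assumptions: `prepare_resume_dir` is a hypothetical helper, the bucket is given as a bare `bucket/dir` path, and evolve.csv is assumed to sit directly under that path (the thread does not say where in the bucket it lives). The function only builds the `gsutil cp` command and leaves running it to the caller:

```python
from pathlib import Path


def prepare_resume_dir(name: str, bucket: str, root: str = "runs/evolve"):
    """Recreate the run directory on a fresh instance and build the copy command.

    Hypothetical helper. Returns (run_dir, copy_cmd); copy_cmd is not executed here.
    """
    run_dir = Path(root) / name
    # Same effect as `mkdir -p runs/evolve/<name>`: no error if it already exists.
    run_dir.mkdir(parents=True, exist_ok=True)
    # Assumed bucket layout: gs://<bucket>/evolve.csv
    copy_cmd = ["gsutil", "cp", f"gs://{bucket}/evolve.csv", str(run_dir)]
    return run_dir, copy_cmd
```

For example, `run_dir, cmd = prepare_resume_dir("v5x", "my-bucket/evolve")` followed by `subprocess.run(cmd, check=True)` would create runs/evolve/v5x and fetch the file, after which the resume command with `--name v5x` can be started. The bucket name here is a placeholder.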

@glenn-jocher
Member

@rhysdg hmm interesting. If it seems like a reproducible bug you might want to consider submitting a PR with a proposed fix to help others in the future.
