How to use --bucket parameter in colab? #4634
Comments
@kuonumber you can evolve from multiple machines in parallel using the --bucket argument:
BUCKET should be an open (public read/write permissions) bucket/directory string locatable on GCP, e.g. gs://bucket/dir/subdir
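As a rough illustration of the string format described above, the following sketch checks a bucket string before it is passed on the command line. It is not part of YOLOv5: the function name `parse_gcs_url` is hypothetical, and the bucket-name pattern is a simplified assumption (lowercase letters, digits, dots, dashes, and underscores, 3 to 63 characters) rather than the full GCS naming rules. It also rejects a doubled prefix such as `gs://gs://...`, where the bucket name would parse as `gs:`.

```python
import re
from typing import Tuple

# Simplified bucket-name rule (assumption): 3-63 characters, lowercase letters,
# digits, dots, dashes, and underscores, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def parse_gcs_url(url: str) -> Tuple[str, str]:
    """Split 'gs://bucket/dir/subdir' into ('bucket', 'dir/subdir').

    Raises ValueError if the scheme is missing or the bucket name is invalid.
    """
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"expected a string starting with {prefix!r}, got {url!r}")
    # Everything up to the first '/' after the scheme is the bucket name.
    bucket, _, path = url[len(prefix):].partition("/")
    if not _BUCKET_NAME.match(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return bucket, path.strip("/")


if __name__ == "__main__":
    print(parse_gcs_url("gs://bucket/dir/subdir"))
```

Running the check once before launching a long evolution can surface a malformed string early instead of partway through the run.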
@kuonumber the output of evolution is hyperparameters, not weights.
Hey there! Loving this functionality. I have a quick question, though: is it possible to resume after an evo run? I've managed to successfully store evolve.csv and hyp_evolve.yaml given the above command. However, after trying to continue I receive a "not a directory" error regarding the empty weights folder:
For your reference I'm using the following to kick off hyperparameter evolution:
Is it just a matter of emulating the original directory and creating an empty weights placeholder? Cheers!
@rhysdg 👋 Hello! Thanks for asking about resuming evolution. Resuming YOLOv5 🚀 evolution is a bit different from resuming a normal training run.

Start Evolution
Assume you evolve YOLOv5s on COCO128 for 10 epochs for 3 generations:

    python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 3

If this is your first evolution, a new directory is created to hold the results:

    # ├── yolov5
    #     └── runs
    #         └── evolve
    #             └── exp  ← evolution saved here

Start a Second Evolution
Now assume you want to start a completely separate evolution: YOLOv5s on VOC for 5 epochs for 3 generations. You simply start evolving, and your new evolution will again be logged to a new directory:

    python train.py --epochs 5 --data VOC.yaml --weights yolov5s.pt --evolve 3

You will now have two evolution runs saved:

    # ├── yolov5
    #     └── runs
    #         └── evolve
    #             ├── exp  ← first evolution (COCO128)
    #             └── exp2  ← second evolution (VOC)

Resume an Evolution
If you want to resume the first evolution (COCO128, saved to exp in the tree above), pass --resume together with the name of that run:

    python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 30 --resume --name exp

Evolution will run for an additional 30 generations, and all new results will be added to the existing results of that run.

Good luck and let us know if you have any other questions!
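For anyone scripting these runs (for example from a Colab cell), the commands above can be assembled programmatically. The sketch below is only an illustration: `build_evolve_command` is a hypothetical helper, not part of YOLOv5, and it uses only the flags that appear in this thread (`--epochs`, `--data`, `--weights`, `--evolve`, `--resume`, `--name`).

```python
from typing import List, Optional


def build_evolve_command(
    epochs: int,
    data: str,
    weights: str,
    generations: int,
    resume_name: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a new or resumed evolution run.

    When resume_name is given, '--resume --name <resume_name>' is appended so
    that the run continues in the existing directory of that name.
    """
    cmd = [
        "python", "train.py",
        "--epochs", str(epochs),
        "--data", data,
        "--weights", weights,
        "--evolve", str(generations),
    ]
    if resume_name is not None:
        cmd += ["--resume", "--name", resume_name]
    return cmd


if __name__ == "__main__":
    # Resume the first evolution (saved as 'exp') for 30 more generations.
    print(" ".join(build_evolve_command(10, "coco128.yaml", "yolov5s.pt", 30, resume_name="exp")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from inside the yolov5 directory; keeping the arguments as a list avoids shell-quoting problems.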
@rhysdg good news 😃! We fixed a small bug ✅ in resuming evolution in PR #4802. Following this PR resuming evolution should work correctly per the instructions in my previous post. To receive this update:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy training with YOLOv5 🚀!
Ah great @glenn-jocher, thanks so much for the swift response! I can confirm that everything works well as per your instructions, except for a minor bug when resuming within a new instance that throws a "file exists" error. It's easily solved, however, by creating an empty directory corresponding to the previous session's name. So for
@rhysdg hmm, interesting. If it seems like a reproducible bug, you might want to consider submitting a PR with a proposed fix to help others in the future.
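The workaround mentioned above (creating an empty directory named after the previous session before resuming in a fresh instance) could be sketched roughly as follows. The exact directory the commenter created is not stated in the thread, so this assumes the `runs/evolve/<name>` layout shown earlier; `ensure_session_dir` is a hypothetical helper, not a YOLOv5 function.

```python
from pathlib import Path
from typing import Union


def ensure_session_dir(
    project_root: Union[str, Path],
    name: str,
    runs_subdir: str = "runs/evolve",  # assumption: layout from the tree shown earlier
) -> Path:
    """Create <project_root>/<runs_subdir>/<name> if it is missing and return it.

    Existing directories are left untouched, so calling this twice is safe.
    """
    session_dir = Path(project_root) / runs_subdir / name
    session_dir.mkdir(parents=True, exist_ok=True)
    return session_dir


if __name__ == "__main__":
    # Recreate the placeholder for the previous session 'exp' before resuming.
    print(ensure_session_dir("yolov5", "exp"))
```

Whether an empty placeholder is sufficient may depend on the YOLOv5 version in use, so treat this as a stopgap rather than a fix.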
❔Question
I tried to save evolve.csv to GCS, and I thought I had set up the environment correctly.
Could you give me some tips to get it done?
Thank you.
Additional context
I got this error:
BadRequestException: 400 Invalid bucket name: 'gs:'