adjust to msrun command
WongGawa committed Sep 13, 2024
1 parent 935ccba commit 07fe3fe
Showing 29 changed files with 85 additions and 90 deletions.
19 changes: 12 additions & 7 deletions GETTING_STARTED.md
@@ -48,27 +48,32 @@ to understand their behavior. Some common arguments are:
```
</details>

* To train a model on 8 NPUs/GPUs:
```
mpirun --allow-run-as-root -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```

* To train a model on 1 NPU/GPU/CPU:
```
python train.py --config ./configs/yolov7/yolov7.yaml
```

* To train a model on 8 NPUs/GPUs:
```
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
* To evaluate a model's performance on 1 NPU/GPU/CPU:
```
python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt
```
* To evaluate a model's performance on 8 NPUs/GPUs:
```
mpirun --allow-run-as-root -n 8 python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
```
*Notes: (1) The default hyper-parameters are tuned for 8-card training; some of them need to be adjusted when training on a single card. (2) The default device is Ascend, and you can change it by setting 'device_target' to Ascend, GPU, or CPU, which are the currently supported targets. An annotated sketch of the distributed launch is given at the end of this section.*
* For more options, see `train/test.py -h`.

* Note that if you launch with `msrun` on 2 devices, add `--bind_core=True` to improve performance. For example:
```
msrun --bind_core=True --worker_num=2 --local_worker_num=2 --master_port=8118 \
--log_dir=msrun_log --join=True --cluster_time_out=300 \
python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
> For more information, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/startup_method.html).
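Putting the pieces together, the sketch below annotates the 8-card launch shown at the top of this section. The flag descriptions follow the MindSpore `msrun` launcher documentation, and the log file name at the end is illustrative only (exact names may vary across MindSpore versions).
```shell
# Annotated 8-card launch (a sketch; same command as above):
#   --worker_num        total number of worker processes across all nodes
#   --local_worker_num  number of worker processes started on this node
#   --bind_core         bind each worker to CPU cores to improve performance
#   --log_dir           directory that collects the per-worker log files
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log \
    python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True

# Training output goes to the per-worker logs under --log_dir, e.g.:
tail -f ./yolov7_log/worker_0.log
```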
### Deployment

24 changes: 13 additions & 11 deletions GETTING_STARTED_CN.md
@@ -45,33 +45,35 @@ python demo/predict.py --config ./configs/yolov7/yolov7.yaml --weight=/path_to_c
```
</details>

* Distributed training on multiple NPUs/GPUs, taking 8 devices as an example:

```shell
mpirun --allow-run-as-root -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```

* To train a model on a single NPU/GPU/CPU:

```shell
python train.py --config ./configs/yolov7/yolov7.yaml
```

* Distributed training on multiple NPUs/GPUs, taking 8 devices as an example:
```shell
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
* To evaluate a model's accuracy on a single NPU/GPU/CPU:

```shell
python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt
```
* Distributed evaluation of a model's accuracy on multiple NPUs/GPUs:

```shell
mpirun --allow-run-as-root -n 8 python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
```

*Note: the default hyper-parameters are for 8-card training; some parameters need to be adjusted for single-card runs. The default device is Ascend, and you can set 'device_target' to Ascend/GPU/CPU (a single-card sketch is given at the end of this section).*
* For more options, see `train/test.py -h`.
* For training in the cloud, see [here](./tutorials/cloud/modelarts_CN.md).

* Note: if you launch with `msrun` on 2 devices, add `--bind_core=True` to improve performance. For example:
```
msrun --bind_core=True --worker_num=2 --local_worker_num=2 --master_port=8118 \
--log_dir=msrun_log --join=True --cluster_time_out=300 \
python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
> For more information, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/startup_method.html).
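If you are not running on the default Ascend device, the note above says the device can be switched with 'device_target'. A minimal single-card sketch (GPU is chosen purely as an example; remember to adjust the hyper-parameters for single-card training as noted above):
```shell
# Single-card training on GPU instead of the default Ascend (a sketch)
python train.py --config ./configs/yolov7/yolov7.yaml --device_target GPU
```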
### Deployment
See [here](./deploy/README.md).
6 changes: 3 additions & 3 deletions configs/yolov3/README.md
@@ -56,11 +56,11 @@ python mindyolo/utils/convert_weight_darknet53.py
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov3/yolov3.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov3_log python train.py --config ./configs/yolov3/yolov3.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).
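For example, a GPU launch could look like the sketch below (this assumes the `--device_target` flag is accepted here in the same way as in the Ascend command above; the log directory name is arbitrary):
```shell
# Distributed training on 8 GPU devices (a sketch)
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov3_log python train.py --config ./configs/yolov3/yolov3.yaml --device_target GPU --is_parallel True
```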

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov4/README.md
@@ -70,11 +70,11 @@ python mindyolo/utils/convert_weight_cspdarknet53.py
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov4_log python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov5/README.md
@@ -50,11 +50,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov5/yolov5n.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov5_log python train.py --config ./configs/yolov5/yolov5n.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov7/README.md
@@ -51,11 +51,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov8/README.md
@@ -60,11 +60,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov8/yolov8n.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov8_log python train.py --config ./configs/yolov8/yolov8n.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

8 changes: 3 additions & 5 deletions configs/yolox/README.md
@@ -50,13 +50,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolox/yolox-s.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolox_log python train.py --config ./configs/yolox/yolox-s.yaml --device_target Ascend --is_parallel True
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions docs/en/modelzoo/yolov3.md
@@ -60,11 +60,11 @@ python mindyolo/utils/convert_weight_darknet53.py
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov3/yolov3.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov3_log python train.py --config ./configs/yolov3/yolov3.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions docs/en/modelzoo/yolov4.md
@@ -74,11 +74,11 @@ python mindyolo/utils/convert_weight_cspdarknet53.py
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov4_log python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions docs/en/modelzoo/yolov5.md
@@ -54,11 +54,11 @@ Please refer to the [QUICK START](../tutorials/quick_start.md) in MindYOLO for d
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov5/yolov5n.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov5_log python train.py --config ./configs/yolov5/yolov5n.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions docs/en/modelzoo/yolov7.md
@@ -54,11 +54,11 @@ Please refer to the [QUICK START](../tutorials/quick_start.md) in MindYOLO for d
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions docs/en/modelzoo/yolov8.md
@@ -64,11 +64,11 @@ Please refer to the [QUICK START](../tutorials/quick_start.md) in MindYOLO for d
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov8/yolov8n.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov8_log python train.py --config ./configs/yolov8/yolov8n.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

8 changes: 3 additions & 5 deletions docs/en/modelzoo/yolox.md
@@ -53,13 +53,11 @@ Please refer to the [QUICK START](../tutorials/quick_start.md) in MindYOLO for d
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolox/yolox-s.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolox_log python train.py --config ./configs/yolox/yolox-s.yaml --device_target Ascend --is_parallel True
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

2 changes: 1 addition & 1 deletion docs/en/tutorials/configuration.md
@@ -42,7 +42,7 @@ __BASE__: [
This part of the parameters is usually passed in from the command line. Examples are as follows:

```shell
mpirun --allow-run-as-root -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True --log_interval 50
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True --log_interval 50
```
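The same override works for a single-card run; a minimal sketch (this assumes, as in mindyolo's standard argument parsing, that values passed on the command line take precedence over the corresponding keys in the YAML config):
```shell
python train.py --config ./configs/yolov7/yolov7.yaml --log_interval 50
```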

## Dataset
2 changes: 1 addition & 1 deletion docs/en/tutorials/finetune.md
@@ -117,7 +117,7 @@ Since the SHWD training set only has about 6,000 images, the yolov7-tiny model w
* Distributed model training on multi-card NPU/GPU, taking 8 cards as an example:
```shell
mpirun --allow-run-as-root -n 8 python train.py --config ./examples/finetune_SHWD/yolov7-tiny_shwd.yaml --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7-tiny_log python train.py --config ./examples/finetune_SHWD/yolov7-tiny_shwd.yaml --is_parallel True
```

* Train the model on a single card NPU/GPU/CPU: