Pretraining error reporting on Apple M3 Pro device #446

Open
zh-zhang1984 opened this issue Aug 14, 2024 · 0 comments
@zh-zhang1984

When I follow the instructions to pretrain the model, I get the error report below.
I am using an Apple M3 Pro device; how can I solve this issue?

ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory
E0814 10:20:55.577000 8228223680 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 6809) of binary: /opt/anaconda3/bin/python3
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 198, in <module>
    main()
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 194, in main
    launch(args)
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 179, in launch
    run(args)
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
../../train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-08-14_10:20:55
  host      : 1.0.0.127.in-addr.arpa
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 6809)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
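
For context, here is a minimal sketch (with a hypothetical `seed` field; the real definitions live in fairseq.dataclass.configs) of the dataclass pattern that Python 3.11 rejects, next to the `default_factory` form that the error message suggests:

```python
from dataclasses import dataclass, field

@dataclass
class CommonConfig:
    seed: int = 1  # hypothetical field, for illustration only

# Python <= 3.10 accepted a plain dataclass instance as a default;
# Python 3.11 rejects any unhashable default with
# "ValueError: mutable default ... use default_factory":
#
# @dataclass
# class FairseqConfig:
#     common: CommonConfig = CommonConfig()

# Accepted on all versions: build a fresh CommonConfig per instance.
@dataclass
class FairseqConfig:
    common: CommonConfig = field(default_factory=CommonConfig)

print(FairseqConfig())  # FairseqConfig(common=CommonConfig(seed=1))
```

The traceback shows Python 3.11 (/opt/anaconda3/lib/python3.11), so the usual workarounds appear to be either running fairseq under Python 3.10, or patching the affected dataclass fields in fairseq/dataclass/configs.py to use `field(default_factory=...)` as sketched above.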