Cross validation fails with error during training #97

Open
vineet-joshi opened this issue Jun 1, 2024 · 0 comments

Hello, this implementation does (should do) exactly what I need for a project I am working on.

However, I could not get the older versions of the torch+cuda and numpy modules to work on the NVIDIA L4 GPU I am using for the project. I upgraded torch to 1.13.1; the GPU has CUDA 12.4 installed. I also had to upgrade numpy to 1.21.6, because without that upgrade I get the following error:

  File "train.py", line 120, in main
    _main(args)
  File "train.py", line 114, in _main
    run(args)
  File "train.py", line 32, in run
    from svoice.solver import Solver
  File "/home/vineet/svoice/svoice/solver.py", line 23, in <module>
    from .evaluate import evaluate
  File "/home/vineet/svoice/svoice/evaluate.py", line 16, in <module>
    from pesq import pesq
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/pesq/__init__.py", line 6, in <module>
    from .cypesq import cypesq
  File "pesq/cypesq.pyx", line 1, in init cypesq
ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).
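
An ImportError of this kind usually points at a mismatch between the installed numpy and the numpy that a compiled extension (here pesq) was built against, so it helps to record the exact versions in the environment. A minimal sketch, assuming Python 3.8 or newer for `importlib.metadata` (on 3.7 the `importlib_metadata` backport offers the same calls); the helper name `installed_versions` is hypothetical:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages named in this report.
    for name, version in installed_versions(["torch", "numpy", "pesq"]).items():
        print(f"{name}: {version}")
```

Pasting this output alongside a traceback makes it easier to see which package versions were actually in use.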

After updating torch and numpy, I was able to get the training script, train.py, to start without interpreter errors, but the script fails during the cross validation step with the following error:

[2024-06-01 16:03:45,776][__main__][INFO] - For logs, checkpoints and samples check /home/vineet/svoice/outputs/exp_
[2024-06-01 16:03:56,183][__main__][INFO] - Running on host training-l4-2-vcpus-24-ram-96-ubuntu
[2024-06-01 16:03:58,471][svoice.solver][DEBUG] - Checkpoint will be saved to /home/vineet/svoice/outputs/debug/model.th
[2024-06-01 16:03:58,472][svoice.solver][INFO] - ----------------------------------------------------------------------
[2024-06-01 16:03:58,472][svoice.solver][INFO] - Training...
[2024-06-01 16:03:59,818][svoice.solver][INFO] - Train | Epoch 1 | 3/15 | 3.5 it/sec | Loss 21.13142
[2024-06-01 16:04:00,384][svoice.solver][INFO] - Train | Epoch 1 | 6/15 | 4.1 it/sec | Loss 21.46726
[2024-06-01 16:04:00,954][svoice.solver][INFO] - Train | Epoch 1 | 9/15 | 4.4 it/sec | Loss 21.30898
[2024-06-01 16:04:01,521][svoice.solver][INFO] - Train | Epoch 1 | 12/15 | 4.6 it/sec | Loss 21.40352
[2024-06-01 16:04:02,067][svoice.solver][INFO] - Train | Epoch 1 | 15/15 | 4.7 it/sec | Loss 21.39990
[2024-06-01 16:04:02,070][svoice.solver][INFO] - Train Summary | End of Epoch 1 | Time 3.60s | Train Loss 21.39990
[2024-06-01 16:04:02,070][svoice.solver][INFO] - ----------------------------------------------------------------------
[2024-06-01 16:04:02,070][svoice.solver][INFO] - Cross validation...
[2024-06-01 16:04:02,330][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 120, in main
    _main(args)
  File "train.py", line 114, in _main
    run(args)
  File "train.py", line 95, in run
    solver.train()
  File "/home/vineet/svoice/svoice/solver.py", line 133, in train
    valid_loss = self._run_one_epoch(epoch, cross_valid=True)
  File "/home/vineet/svoice/svoice/solver.py", line 213, in _run_one_epoch
    estimate_source = self.dmodel(mixture)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/svoice/svoice/models/swave.py", line 256, in forward
    mixture_w = self.encoder(mixture)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/svoice/svoice/models/swave.py", line 284, in forward
    mixture_w = F.relu(self.conv(mixture))
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/vineet/svoice/.testing/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 310, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Calculated padded input size per channel: (0). Kernel size: (8). Kernel size can't be greater than actual input size
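
The message says the padded input length along the time axis is 0 while the kernel needs 8 samples, which suggests the tensor reaching the encoder was empty. The arithmetic behind that check can be sketched in plain Python using the output-length formula from the PyTorch Conv1d documentation. The function name and the stride of 4 in the usage are illustrative assumptions, not values read from the repo:

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution, per the formula in the Conv1d docs."""
    padded = length + 2 * padding
    # Span covered by one kernel application once dilation is accounted for.
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padded < effective_kernel:
        raise ValueError(
            f"padded input size ({padded}) is smaller than "
            f"kernel size ({effective_kernel})"
        )
    return (padded - effective_kernel) // stride + 1


if __name__ == "__main__":
    # One second of 16 kHz audio, kernel 8, assumed stride 4.
    print(conv1d_output_length(16000, 8, stride=4))
    try:
        # Mirrors the failing case: zero samples, kernel 8.
        conv1d_output_length(0, 8)
    except ValueError as err:
        print("error:", err)
```

Any input shorter than the kernel along the time axis fails the same way, so the shape of `mixture` just before the encoder call is the first thing worth printing.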

After some searching, it appears that this could be caused by the training input .wav files. However, I am using the training dataset provided with the repo, so I would have expected it to work out of the box.
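
One way to test the .wav hypothesis is to scan the dataset directory for files with fewer samples than the kernel size. A minimal sketch using only the standard library; the helper name `find_short_wavs` and the `dataset/` path are hypothetical, and the `wave` module only reads PCM files, so other encodings would need a different reader:

```python
import wave
from pathlib import Path


def find_short_wavs(root, min_frames):
    """Return (path, frame_count) for every .wav under root shorter than min_frames."""
    short = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            frames = wav.getnframes()
        if frames < min_frames:
            short.append((str(path), frames))
    return short


if __name__ == "__main__":
    # "dataset" is a placeholder; point it at the validation data directory.
    for path, frames in find_short_wavs("dataset", min_frames=8):
        print(f"{path}: {frames} frames")
```

If this finds nothing, the files themselves are probably fine and the empty tensor is more likely produced somewhere in the data loading or segmenting code.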

If I skip the cross validation step by setting the cross_valid parameter to False in solver.py, training progresses, but I then hit errors in the forward() method of the SWave model's Encoder, where the Conv1d() call fails. I also tried upgrading to Python 3.12, with corresponding updates to the dependencies, but ran into the same issues.

When I skip steps such as cross validation, or get around the Conv1d() issues by providing default or empty tensors, I am able to get training and evaluation to run, but the output speaker files have a monotone, continuous beeping sound overlaid on the speaker's voice. I assume this is a result of skipping cross validation or the convolution calls.

Any help in this regard is much appreciated. If I can get this implementation working, it is an ideal fit for a social project I am working on. Please let me know if you need additional information. Thanks.
