[Bug]: Using invalid pretrained weights file path from EfficientAD #1965

Closed
seyeon923 opened this issue Apr 9, 2024 · 3 comments · Fixed by #1966

Comments

@seyeon923
Contributor

Describe the bug

I tried to train the EfficientAd model on my custom dataset through the Python API, using the code below.

import anomalib

from anomalib.data import Folder
from anomalib.models import EfficientAd
from anomalib.engine import Engine
from anomalib.data.utils.split import TestSplitMode, ValSplitMode

if __name__ == "__main__":
    data_name = "MW24"
    normal_dir = "datasets/MW24_insp_img"
    abnormal_dir = "datasets/MW24_NG_insp_img"
    image_size = (256, 256)
    seed = 950923

    datamodule = Folder(
        name=data_name,
        normal_dir=normal_dir,
        abnormal_dir=abnormal_dir,
        image_size=image_size,
        task=anomalib.TaskType.CLASSIFICATION,
        test_split_mode=TestSplitMode.FROM_DIR,
        val_split_mode=ValSplitMode.SAME_AS_TEST,
        num_workers=0,
        seed=seed
    )
    model = EfficientAd()
    engine = Engine(task=anomalib.TaskType.CLASSIFICATION,
                    image_metrics=["F1Score", "AUROC", "Precision", "Recall"])

    engine.fit(datamodule=datamodule, model=model)

When I ran the code above, I got the following error: FileNotFoundError: [Errno 2] No such file or directory: 'pre_trained\\efficientad_pretrained_weights\\pretrained_teacher_EfficientAdModelSize.S.pth'

Looking at the anomalib code, the file name pretrained_teacher_EfficientAdModelSize.S.pth appears to be built in the anomalib.models.image.efficient_ad.lightning_model.EfficientAd.prepare_pretrained_model method.

    def prepare_pretrained_model(self) -> None:
        """Prepare the pretrained teacher model."""
        pretrained_models_dir = Path("./pre_trained/")
        if not (pretrained_models_dir / "efficientad_pretrained_weights").is_dir():
            download_and_extract(pretrained_models_dir, WEIGHTS_DOWNLOAD_INFO)
        teacher_path = (
            pretrained_models_dir / "efficientad_pretrained_weights" / f"pretrained_teacher_{self.model_size}.pth"
        )
        logger.info(f"Load pretrained teacher model from {teacher_path}")
        self.model.teacher.load_state_dict(torch.load(teacher_path, map_location=torch.device(self.device)))

The missing path is exactly the value of the teacher_path variable in the method above.
Because the model size was EfficientAdModelSize.S, self.model_size was the EfficientAdModelSize.S enum member, whose value is "small". Interpolating the member itself into the f-string produced pretrained_teacher_EfficientAdModelSize.S.pth.
However, in my local repo directory the pretrained weights had already been downloaded successfully under the file name "pretrained_teacher_small.pth".
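
The mismatch can be reproduced outside anomalib. The sketch below uses a hypothetical stand-in enum, ModelSize, rather than anomalib's own EfficientAdModelSize class; it only assumes that the member S carries the value "small", as described above. A plain Enum is used because how a member of a mixed-in enum formats has varied across Python versions.

```python
from enum import Enum


class ModelSize(Enum):
    """Hypothetical stand-in for anomalib's EfficientAdModelSize."""

    S = "small"


size = ModelSize.S

# Interpolating the member itself formats it as "ClassName.MEMBER".
by_member = f"pretrained_teacher_{size}.pth"

# Interpolating the member's value gives the plain string.
by_value = f"pretrained_teacher_{size.value}.pth"

print(by_member)
print(by_value)
```

Only the second form matches the file name that the download step actually leaves on disk.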

So I modified the teacher_path part of the prepare_pretrained_model method as shown below (self.model_size -> self.model_size.value).

        teacher_path = (
            pretrained_models_dir / "efficientad_pretrained_weights" / f"pretrained_teacher_{self.model_size.value}.pth"
        )

With this change, training starts without errors.

I think the code should be fixed so that the path the pretrained weights are downloaded to and the path they are loaded from are identical.
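
As a rough illustration of keeping the two paths in sync, the sketch below builds the file name from the enum's value and fails early with the list of files that are actually present. The names ModelSize and resolve_teacher_path are hypothetical and not part of anomalib; the temporary directory and the empty placeholder file only simulate the extracted archive.

```python
import tempfile
from enum import Enum
from pathlib import Path


class ModelSize(Enum):
    """Hypothetical stand-in for anomalib's EfficientAdModelSize."""

    S = "small"


def resolve_teacher_path(weights_dir: Path, model_size: ModelSize) -> Path:
    """Return the teacher weights path, or fail listing the files that exist."""
    path = weights_dir / f"pretrained_teacher_{model_size.value}.pth"
    if not path.is_file():
        found = sorted(p.name for p in weights_dir.glob("*.pth"))
        raise FileNotFoundError(f"{path} not found; available weights: {found}")
    return path


# Simulate the extracted archive with an empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    weights_dir = Path(tmp) / "efficientad_pretrained_weights"
    weights_dir.mkdir()
    (weights_dir / "pretrained_teacher_small.pth").touch()
    resolved_name = resolve_teacher_path(weights_dir, ModelSize.S).name

print(resolved_name)
```

Listing the available files in the error message would have made the naming mismatch visible directly from the traceback.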

Thank you

Dataset

Other (please specify in the text field below)

Model

Other (please specify in the field below)

Steps to reproduce the behavior

  1. Install anomalib with pip.
  2. Install the full set of anomalib packages with the command anomalib install.
  3. Run the code below with any images (located at normal_dir and abnormal_dir).
import anomalib

from anomalib.data import Folder
from anomalib.models import EfficientAd
from anomalib.engine import Engine
from anomalib.data.utils.split import TestSplitMode, ValSplitMode

if __name__ == "__main__":
    data_name = "MW24"
    normal_dir = "datasets/MW24_insp_img"
    abnormal_dir = "datasets/MW24_NG_insp_img"
    image_size = (256, 256)
    seed = 950923

    datamodule = Folder(
        name=data_name,
        normal_dir=normal_dir,
        abnormal_dir=abnormal_dir,
        image_size=image_size,
        task=anomalib.TaskType.CLASSIFICATION,
        test_split_mode=TestSplitMode.FROM_DIR,
        val_split_mode=ValSplitMode.SAME_AS_TEST,
        num_workers=0,
        seed=seed
    )
    model = EfficientAd()
    engine = Engine(task=anomalib.TaskType.CLASSIFICATION,
                    image_metrics=["F1Score", "AUROC", "Precision", "Recall"])

    engine.fit(datamodule=datamodule, model=model)
  4. The error occurs with the messages below.
Traceback (most recent call last):
  File "D:\repos\LGIT\ircf_pr_epoxy_test\scripts\train_ad_model.py", line 34, in <module>
    engine.fit(datamodule=datamodule, model=model)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\engine\engine.py", line 518, in fit
    self.trainer.fit(model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 989, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 198, in run
    self.on_run_start()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 324, in on_run_start
    call._call_lightning_module_hook(trainer, "on_train_start")
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 245, in on_train_start
    self.prepare_pretrained_model()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 99, in prepare_pretrained_model
    self.model.teacher.load_state_dict(torch.load(teacher_path, map_location=torch.device(self.device)))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
                     ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'pre_trained\\efficientad_pretrained_weights\\pretrained_teacher_EfficientAdModelSize.S.pth'

OS information

OS information:

  • OS: Windows10 22H2(19045.4170)
  • Python version: 3.11.8
  • Anomalib version: 1.0.1
  • PyTorch version: 2.1.2+cu121
  • CUDA/cuDNN version: 12.4
  • GPU models and configuration: 1x NVIDIA GeForce RTX 4070 Laptop
  • Any other relevant information: I'm using a custom dataset, but the dataset seems to be irrelevant.

Expected behavior

After modifying the code as described above, training proceeds normally.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

I've used API without configuration file.

Logs

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\utilities.py:73: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
You are using a CUDA device ('NVIDIA GeForce RTX 4070 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
F1Score class exists for backwards compatibility. It will be removed in v1.1. Please use BinaryF1Score from torchmetrics instead
Incorrect constructor arguments for Precision metric from TorchMetrics package.
Incorrect constructor arguments for Recall metric from TorchMetrics package.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                  | Type                     | Params
-------------------------------------------------------------------
0 | model                 | EfficientAdModel         | 8.1 M
1 | _transform            | Compose                  | 0
2 | normalization_metrics | MinMax                   | 0
3 | image_threshold       | F1AdaptiveThreshold      | 0
4 | pixel_threshold       | F1AdaptiveThreshold      | 0
5 | image_metrics         | AnomalibMetricCollection | 0
6 | pixel_metrics         | AnomalibMetricCollection | 0
-------------------------------------------------------------------
8.1 M     Trainable params
0         Non-trainable params
8.1 M     Total params
32.235    Total estimated model params size (MB)
D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.
D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.
efficientad_pretrained_weights.zip: 40.0MB [00:08, 4.76MB/s]
Traceback (most recent call last):
  File "D:\repos\LGIT\ircf_pr_epoxy_test\scripts\train_ad_model.py", line 34, in <module>
    engine.fit(datamodule=datamodule, model=model)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\engine\engine.py", line 518, in fit
    self.trainer.fit(model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 989, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 198, in run
    self.on_run_start()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 324, in on_run_start
    call._call_lightning_module_hook(trainer, "on_train_start")
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 245, in on_train_start
    self.prepare_pretrained_model()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 99, in prepare_pretrained_model
    self.model.teacher.load_state_dict(torch.load(teacher_path, map_location=torch.device(self.device)))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
                     ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'pre_trained\\efficientad_pretrained_weights\\pretrained_teacher_EfficientAdModelSize.S.pth'


Code of Conduct

[X] I agree to follow this project's Code of Conduct

@samet-akcay
Contributor

@seyeon923 thanks for reporting this and for suggesting a fix. Would you like to create a PR and become a contributor, or would you prefer that we fix it?

@seyeon923
Contributor Author

OK, I'll create a PR and let you know once it's ready. Thank you.

@seyeon923
Contributor Author

@samet-akcay I've created a PR for this. Could you check it?
Thank you.
