[Bug]: Using invalid pretrained weigths file path from EfficientAD #1965

seyeon923 · 2024-04-09T02:50:39Z

Describe the bug

I tried to run the EfficientAD model on my custom dataset through the Python API, using the code below.

import anomalib

from anomalib.data import Folder
from anomalib.models import EfficientAd
from anomalib.engine import Engine
from anomalib.data.utils.split import TestSplitMode, ValSplitMode

if __name__ == "__main__":
    data_name = "MW24"
    normal_dir = "datasets/MW24_insp_img"
    abnormal_dir = "datasets/MW24_NG_insp_img"
    image_size = (256, 256)
    seed = 950923

    datamodule = Folder(
        name=data_name,
        normal_dir=normal_dir,
        abnormal_dir=abnormal_dir,
        image_size=image_size,
        task=anomalib.TaskType.CLASSIFICATION,
        test_split_mode=TestSplitMode.FROM_DIR,
        val_split_mode=ValSplitMode.SAME_AS_TEST,
        num_workers=0,
        seed=seed
    )
    model = EfficientAd()
    engine = Engine(task=anomalib.TaskType.CLASSIFICATION,
                    image_metrics=["F1Score", "AUROC", "Precision", "Recall"])

    engine.fit(datamodule=datamodule, model=model)

Running the above code raises FileNotFoundError: [Errno 2] No such file or directory: 'pre_trained\\efficientad_pretrained_weights\\pretrained_teacher_EfficientAdModelSize.S.pth'

Looking at the anomalib code, the file name pretrained_teacher_EfficientAdModelSize.S.pth appears to be built in the anomalib.models.image.efficient_ad.lightning_model.EfficientAd.prepare_pretrained_model method.

    def prepare_pretrained_model(self) -> None:
        """Prepare the pretrained teacher model."""
        pretrained_models_dir = Path("./pre_trained/")
        if not (pretrained_models_dir / "efficientad_pretrained_weights").is_dir():
            download_and_extract(pretrained_models_dir, WEIGHTS_DOWNLOAD_INFO)
        teacher_path = (
            pretrained_models_dir / "efficientad_pretrained_weights" / f"pretrained_teacher_{self.model_size}.pth"
        )
        logger.info(f"Load pretrained teacher model from {teacher_path}")
        self.model.teacher.load_state_dict(torch.load(teacher_path, map_location=torch.device(self.device)))

The missing path is the value of the teacher_path variable in the method above.
Because I used the model size EfficientAdModelSize.S, self.model_size was the EfficientAdModelSize.S enum member, whose associated value is "small", and the interpolated file name became pretrained_teacher_EfficientAdModelSize.S.pth.
However, in my local repo directory the pretrained weights had already been downloaded successfully under the file name "pretrained_teacher_small.pth".
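The mismatch described above can be reproduced without anomalib. The sketch below uses a hypothetical stand-in enum, ModelSize, rather than the real EfficientAdModelSize, and only assumes that the member S carries the value "small" as stated in this report. If the real enum also subclasses str, which this sketch does not assume, the formatted result can additionally depend on the Python version, since Python 3.11 changed how mixed-in enum members are formatted.

```python
from enum import Enum


class ModelSize(Enum):
    """Hypothetical stand-in for anomalib's EfficientAdModelSize."""

    S = "small"


size = ModelSize.S

# Formatting the member itself yields the qualified member name.
name_from_member = f"pretrained_teacher_{size}.pth"
# Formatting its value yields the string used in the downloaded file name.
name_from_value = f"pretrained_teacher_{size.value}.pth"

print(name_from_member)  # pretrained_teacher_ModelSize.S.pth
print(name_from_value)   # pretrained_teacher_small.pth
```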

So I modified the teacher_path part of the prepare_pretrained_model method as below (self.model_size -> self.model_size.value).

teacher_path = (
    pretrained_models_dir / "efficientad_pretrained_weights" / f"pretrained_teacher_{self.model_size.value}.pth"
)

With this change, training starts without error.
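As an isolated illustration of the same idea, here is a minimal sketch of building the weights path from the enum's value. The helper teacher_weights_path and the ModelSize enum are hypothetical names introduced for this example; only the directory layout and file name pattern are taken from the method quoted in this report.

```python
from enum import Enum
from pathlib import Path


class ModelSize(Enum):
    """Hypothetical stand-in for anomalib's EfficientAdModelSize."""

    S = "small"


def teacher_weights_path(models_dir: Path, model_size: ModelSize) -> Path:
    """Build the teacher weights path from the enum's value, not the member itself."""
    return models_dir / "efficientad_pretrained_weights" / f"pretrained_teacher_{model_size.value}.pth"


path = teacher_weights_path(Path("./pre_trained/"), ModelSize.S)
print(path.name)  # pretrained_teacher_small.pth

# Checking for the file up front gives a clearer message than a failure inside torch.load.
if not path.is_file():
    print(f"Teacher weights not found at {path}")
```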

I think the code should be fixed so that the path used to load the pretrained weights matches the path of the downloaded file.

Thank you

Dataset

Other (please specify in the text field below)

Model

Other (please specify in the field below)

Steps to reproduce the behavior

Install anomalib with pip.
Install the full set of anomalib dependencies with the command anomalib install.
Run the code below with any images placed in normal_dir and abnormal_dir.

import anomalib

from anomalib.data import Folder
from anomalib.models import EfficientAd
from anomalib.engine import Engine
from anomalib.data.utils.split import TestSplitMode, ValSplitMode

if __name__ == "__main__":
    data_name = "MW24"
    normal_dir = "datasets/MW24_insp_img"
    abnormal_dir = "datasets/MW24_NG_insp_img"
    image_size = (256, 256)
    seed = 950923

    datamodule = Folder(
        name=data_name,
        normal_dir=normal_dir,
        abnormal_dir=abnormal_dir,
        image_size=image_size,
        task=anomalib.TaskType.CLASSIFICATION,
        test_split_mode=TestSplitMode.FROM_DIR,
        val_split_mode=ValSplitMode.SAME_AS_TEST,
        num_workers=0,
        seed=seed
    )
    model = EfficientAd()
    engine = Engine(task=anomalib.TaskType.CLASSIFICATION,
                    image_metrics=["F1Score", "AUROC", "Precision", "Recall"])

    engine.fit(datamodule=datamodule, model=model)

The run fails with the following error:

Traceback (most recent call last)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\scripts\train_ad_model.py", line 34, in <module>
    engine.fit(datamodule=datamodule, model=model)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\engine\engine.py", line 518, in fit
    self.trainer.fit(model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 989, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 198, in run
    self.on_run_start()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 324, in on_run_start
    call._call_lightning_module_hook(trainer, "on_train_start")
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 245, in on_train_start
    self.prepare_pretrained_model()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 99, in prepare_pretrained_model
    self.model.teacher.load_state_dict(torch.load(teacher_path, map_location=torch.device(self.device)))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
                     ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'pre_trained\\efficientad_pretrained_weights\\pretrained_teacher_EfficientAdModelSize.S.pth'
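One quick way to confirm that the failure is a naming mismatch rather than a failed download is to list what is actually present in the weights directory and compare it with the name from the traceback. The helper list_teacher_weights below is a hypothetical name for this sketch; the directory and the expected file name are copied from the error message above.

```python
from pathlib import Path


def list_teacher_weights(weights_dir: Path) -> list[str]:
    """Return the sorted names of teacher weight files found in weights_dir."""
    if not weights_dir.is_dir():
        return []
    return sorted(p.name for p in weights_dir.glob("pretrained_teacher_*.pth"))


weights_dir = Path("pre_trained") / "efficientad_pretrained_weights"
expected = "pretrained_teacher_EfficientAdModelSize.S.pth"  # name taken from the traceback

found = list_teacher_weights(weights_dir)
print("found:", found)
print("expected file present:", expected in found)
```

If the directory contains a file such as pretrained_teacher_small.pth while the expected name is absent, the download succeeded and only the constructed file name is wrong.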

OS information

OS information:

OS: Windows10 22H2(19045.4170)
Python version: 3.11.8
Anomalib version: 1.0.1
PyTorch version: 2.1.2+cu121
CUDA/cuDNN version: 12.4
GPU models and configuration: 1x NVIDIA GeForce RTX 4070 Laptop
Any other relevant information: I'm using a custom dataset, but the dataset appears to be irrelevant to this bug.

Expected behavior

Training should start without error, as it does after the modification described above.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

I used the Python API without a configuration file.

Logs

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\utilities.py:73: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
You are using a CUDA device ('NVIDIA GeForce RTX 4070 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
F1Score class exists for backwards compatibility. It will be removed in v1.1. Please use BinaryF1Score from torchmetrics instead
Incorrect constructor arguments for Precision metric from TorchMetrics package.
Incorrect constructor arguments for Recall metric from TorchMetrics package.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                  | Type                     | Params
-------------------------------------------------------------------
0 | model                 | EfficientAdModel         | 8.1 M
1 | _transform            | Compose                  | 0
2 | normalization_metrics | MinMax                   | 0
3 | image_threshold       | F1AdaptiveThreshold      | 0
4 | pixel_threshold       | F1AdaptiveThreshold      | 0
5 | image_metrics         | AnomalibMetricCollection | 0
6 | pixel_metrics         | AnomalibMetricCollection | 0
-------------------------------------------------------------------
8.1 M     Trainable params
0         Non-trainable params
8.1 M     Total params
32.235    Total estimated model params size (MB)
D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.
D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.
efficientad_pretrained_weights.zip: 40.0MB [00:08, 4.76MB/s]
 99%|█████████▌| 39.7M/40.0M [00:08<00:00, 4.29MB/s]
Traceback (most recent call last):
  File "D:\repos\LGIT\ircf_pr_epoxy_test\scripts\train_ad_model.py", line 34, in <module>
    engine.fit(datamodule=datamodule, model=model)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\engine\engine.py", line 518, in fit
    self.trainer.fit(model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 989, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 198, in run
    self.on_run_start()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py", line 324, in on_run_start
    call._call_lightning_module_hook(trainer, "on_train_start")
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\lightning\pytorch\trainer\call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 245, in on_train_start
    self.prepare_pretrained_model()
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\anomalib\models\image\efficient_ad\lightning_model.py", line 99, in prepare_pretrained_model
    self.model.teacher.load_state_dict(torch.load(teacher_path, map_location=torch.device(self.device)))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos\LGIT\ircf_pr_epoxy_test\.venv\Lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
                     ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'pre_trained\\efficientad_pretrained_weights\\pretrained_teacher_EfficientAdModelSize.S.pth'

Code of Conduct

- [X] I agree to follow this project's Code of Conduct

samet-akcay · 2024-04-09T03:00:24Z

@seyeon923 thanks for reporting this and your suggestion to fix the issue. Would you like to create a PR to become a contributor or would you prefer us to fix this?

seyeon923 · 2024-04-09T03:17:13Z

OK, I'll create a PR and let you know once it's ready. Thank you.

seyeon923 · 2024-04-09T05:30:41Z

@samet-akcay I've created a PR for this. Could you check it?
Thank you.

seyeon923 mentioned this issue Apr 9, 2024

🐞 Fix EfficientAD's pretrained weigths load path #1966

Merged

9 tasks

samet-akcay closed this as completed in #1966 Apr 9, 2024
