
Update EfficientAD #1143

Merged: 5 commits merged into openvinotoolkit:main on Jun 30, 2023

Conversation

@nelson1425 (Contributor)

Description

I have run EfficientAD-S on the current main branch. It reached 0.972 Image AUROC on average on MVTec AD. The paper's result is 0.988. I have looked at the paper hyperparameters and have updated the settings in anomalib.

Now EfficientAD-S scores 0.982 Image AUROC on average on MVTec AD. These are the changes in anomalib:

  • Fix a bug in the quantile computation that occurred on MVTec screw and hazelnut because the input to torch.quantile was too large (see the sketch after this list).
  • Compute anomaly map quantiles on the validation loader instead of on the training loader.
  • Use Adam instead of AdamW. The authors do not cite AdamW, so I think they used Adam.
  • Disable padding, as mentioned by the authors in the appendix.
  • Add max_steps = 70000. The authors say in the appendix that they always train for 70000 iterations. In anomalib, we have 200 epochs. I have added max_steps = 70000 to prevent overfitting and to avoid very long training runs on large datasets (those with more than 1000 training images, for example).
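For illustration, the quantile fix boils down to something like this minimal sketch (the variable and function names here are illustrative, not the exact anomalib code):

import torch

def reduced_quantiles(maps_flat: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # torch.quantile fails for inputs with more than 16777216 elements, which
    # happens for the flattened anomaly maps of MVTec screw and hazelnut.
    # Sampling random elements keeps the 90% and 99.5% quantile estimates
    # valid while staying under the limit.
    max_input_size = 16777216
    if maps_flat.numel() > max_input_size:
        idx = torch.randperm(maps_flat.numel(), device=maps_flat.device)[:max_input_size]
        maps_flat = maps_flat[idx]
    return torch.quantile(maps_flat, 0.9), torch.quantile(maps_flat, 0.995)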

I changed the default to EfficientAD-S, because it is faster to train and performs a little bit better than EfficientAD-M in anomalib.

The diff of the markdown files is caused by the pre-commit markdown formatter.

Changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change which refactors the code base)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist

  • My code follows the pre-commit style and check guidelines of this project.
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works (see results on MVTec AD)
  • New and existing tests pass locally with my changes
  • I have added a summary of my changes to the CHANGELOG (not for minor changes, docs and tests).

@blaz-r (Contributor) commented Jun 18, 2023

Hello @nelson1425 and thanks for this.
I'm not familiar enough with the code to tell with certainty if everything is as it should be, but it's good to see this improvement.
I did notice two things though:

  • You are using the validation dataloader to compute quantiles, but the config file specifies that the val dataloader is the same as the test set, since we have val_split_mode: same_as_test in config.yaml. This might produce skewed results, so I think this should be changed to val_split_mode: from_test. @samet-akcay can you confirm this?
  • You are correct about the Adam optimizer; in the appendix of EfficientAD they do specify that they use Adam, not AdamW.

I've seen that you also have your own implementation, which works a bit better, especially the M-sized model. Do you have any more ideas where this discrepancy comes from in Anomalib?

Thanks again for all this, and I hope that the above can be resolved as well as other changes verified by other contributors.

@nelson1425 (Contributor Author) commented Jun 18, 2023

@blaz-r Hello and thank you for your comments.

Regarding the validation set: yes, this is an interesting question. Most anomalib methods have val_split_mode: same_as_test. This is not perfectly clean because it allows tuning on the test set. For example, CS-Flow does early stopping on the validation set and thus on the test set, which means it tunes the number of epochs on the test set.

For the map normalization of EfficientAD, only non-defect images are required. In my own implementation, I used non-defect images from a validation set that was split off from the training set. Here in anomalib, the non-defect images from the test set are used for simplicity, because this option is available via val_split_mode: same_as_test. In practice, the difference is small, because only four numbers are computed on the validation set of non-defect images. You mentioned val_split_mode: from_test. This cannot be used because it would reduce the test set, which means that scores computed on MVTec AD would not be based on the official test set but on a subset of it.

@alexriedel1 (Contributor)

> Hello @nelson1425 and thanks for this. I'm not familiar enough with the code to tell with certainty if everything is as it should be, but it's good to see this improvement. I did notice two things though:
>
>   • You are using the validation dataloader to compute quantiles, but the config file specifies that the val dataloader is the same as the test set, since we have val_split_mode: same_as_test in config.yaml. This might produce skewed results, so I think this should be changed to val_split_mode: from_test. @samet-akcay can you confirm this?
>   • You are correct about the Adam optimizer; in the appendix of EfficientAD they do specify that they use Adam, not AdamW.
>
> I've seen that you also have your own implementation, which works a bit better, especially the M-sized model. Do you have any more ideas where this discrepancy comes from in Anomalib?
>
> Thanks again for all this, and I hope that the above can be resolved as well as other changes verified by other contributors.

The reason why the other GitHub implementations of EfficientAD cannot be used as they are in anomalib is that anomalib uses albumentations for all the image transformations, such as resizing, cropping, and grayscale conversion. The GitHub implementations, however, use torchvision. The two libraries use slightly different algorithms in their image transformation functions, which reduces accuracy if you train a teacher model on torchvision transforms but then use albumentations transforms. Also, the pre-trained ResNet101 that is used to teach the teacher is trained with torchvision transforms, which causes additional differences.

I have trained the teacher model using albumentations but still cannot reach the paper accuracy in some classes.
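A quick way to see this kind of transform discrepancy is to resize the same image with both libraries and compare the outputs (a minimal sketch with arbitrary sizes, not anomalib code):

import numpy as np
from PIL import Image
import albumentations as A
import torchvision.transforms.functional as TF

img = np.random.randint(0, 256, (900, 900, 3), dtype=np.uint8)

# albumentations resizes numpy arrays with OpenCV's bilinear filter
resized_albu = A.Resize(256, 256)(image=img)["image"]

# torchvision resizes PIL images with its own bilinear filter
resized_tv = np.array(TF.resize(Image.fromarray(img), [256, 256]))

# The two results are close but usually not identical
print(np.abs(resized_albu.astype(int) - resized_tv.astype(int)).max())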

# we do this by sampling random elements of maps_flat because then
# the locations of the quantiles (90% and 99.5%) will still be
# valid even though they might not be the exact quantiles.
max_input_size = 16777216
Contributor:

How did you get that number? Is it an arbitrary choice?

Contributor Author:

It is 16 * 1024 * 1024 and is the actual limit of what torch.quantile can handle.

Contributor:

Ok, maybe you can use 2**24 here and reference the line in pytorch that causes the limit: https://github.com/pytorch/pytorch/blob/b9f81a483a7879cd3709fd26bcec5f1ee33577e6/aten/src/ATen/native/Sorting.cpp#L291

Contributor:

Is there a reason you do this check for the max input size for quantile and the random sampling here, and not in the forward method of torch_model.py?

Contributor:

The normalization parameters are calculated before the actual training.

Contributor Author:

> Ok, maybe you can use 2**24 here and reference the line in pytorch that causes the limit

Done
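(For reference, the adjusted constant presumably ends up looking roughly like this; the exact comment wording in the merged code may differ:)

# torch.quantile only supports inputs with up to 2**24 elements, see
# https://github.com/pytorch/pytorch/blob/b9f81a483a7879cd3709fd26bcec5f1ee33577e6/aten/src/ATen/native/Sorting.cpp#L291
max_input_size = 2**24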

@@ -308,6 +308,8 @@ def forward(self, batch: Tensor, batch_imagenet: Tensor = None) -> Tensor | dict
(ae_output - student_output[:, self.teacher_out_channels :]) ** 2, dim=1, keepdim=True
)

map_st = F.pad(map_st, (4, 4, 4, 4))
Contributor:

I don't find this line in the paper...

Contributor Author:

Without this padding, the segmentation metric decreases a lot because the segmentation pixels are no longer aligned after the next line (map_st = F.interpolate(map_st, size=(self.input_size[0], self.input_size[1]), mode="bilinear")).

It is similar to applying an 11x11 edge filter to a 100x100 image. The result is a 90x90 image with detected edges. Resizing it to 100x100 would distort the detected edges. Instead, you have to pad the 90x90 image with 5 pixels on each side.

In the case of EfficientAD, the patch size is 33, so 16 pixels of padding on each border. The teacher has two layers with stride 2, so we divide the 16 pixels of padding by 2 and again by 2, which gives 4. So we add 4 pixels of padding on each border of the output feature maps.
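A minimal sketch of the pad-then-upsample step described above (the 256x256 input size and the 56x56 map size are assumptions based on the default settings, not taken from the merged code):

import torch
import torch.nn.functional as F

# A padding-free PDN with a 33x33 receptive field and two stride-2 layers
# turns a 256x256 input into a 56x56 map that is missing a border.
map_st = torch.rand(1, 1, 56, 56)

# Re-add the border (16 input pixels / 2 / 2 = 4 feature pixels per side)
# so the map stays aligned with the image when upsampled.
map_st = F.pad(map_st, (4, 4, 4, 4))                                # 1 x 1 x 64 x 64
map_st = F.interpolate(map_st, size=(256, 256), mode="bilinear")    # 1 x 1 x 256 x 256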

@alexriedel1 (Contributor)

At least, your new results on the bottle class do not match my results on that class with the medium model and the previous version: https://api.wandb.ai/links/alexried/mnft4f0b

Regarding (convolutional) padding, that's what the authors say:

> In latency-critical applications, padding in the PDN architecture of EfficientAD can be disabled. This speeds up the forward pass of the PDN architecture by 80 µs without impairing the detection of anomalies. We time EfficientAD without padding and therefore report the anomaly detection results for this setting in the experimental results of this paper. We perform 1000 forward passes as warm up and report the mean runtime of the following 1000 forward passes.

I guess they use padding for their performance measures but remove it for measuring timing.

@nelson1425 (Contributor Author)

@alexriedel1

> I guess they use padding for their performance measures but remove it for measuring timing.

The authors say: "We time EfficientAD without padding and therefore report the anomaly detection results for this setting in the experimental results of this paper."

So I think they use the "without padding" setting for the experimental results of the paper. In my experiments, without padding also works slightly better than with padding.

@nelson1425 (Contributor Author)

@samet-akcay @ashwinvaidya17 @djdameln can you please review the changes?

@alexriedel1 (Contributor)

> @alexriedel1
>
> > I guess they use padding for their performance measures but remove it for measuring timing.
>
> The authors say: "We time EfficientAD without padding and therefore report the anomaly detection results for this setting in the experimental results of this paper."
>
> So I think they use the "without padding" setting for the experimental results of the paper. In my experiments, without padding also works slightly better than with padding.

Ok alright! Did you train your teacher models with or without padding?

@alexriedel1 (Contributor)

@nelson1425 I have one more question, do you report your accuracy results for the last training epoch or for the best training epoch?

@alexriedel1 (Contributor)

I have realized another thing regarding the pretrained teachers. Unfortunately I cannot test on all the MVTec categories, but in some of them, the pretrained models by rximg (https://github.com/rximg/EfficientAD/tree/main) perform better than my models that are currently provided.

Maybe you can get results using these models? You just have to replace the models in anomalib\pre_trained\efficientad_pretrained_weights with these models and change the names accordingly: https://drive.google.com/drive/folders/1pYHUaOHdw0Q00rYWbVD4e8aEEb07yoZg?usp=sharing

@djdameln (Contributor)

@blaz-r @nelson1425

I fully agree that using same_as_test for the validation split mode is not the best way to obtain the validation set. The reason why we use this as the default setting is reproducibility. We aim to reproduce the performance numbers reported in the original publications of the anomalib models, and these are all based on the full test set. As @nelson1425 pointed out, using the from_test option to obtain the validation set would reduce the test set, and we would not be able to compare the model's performance to other implementations and other models.

Ultimately, the problem here is that the MVTec dataset does not have a pre-supplied validation set. For the current problem, the best solution would be to sub-sample the normal images from the training set and use those to compute the quantiles. However, there are some technical issues that prevent us from adding a from_train option to the val_split_mode parameter. Most importantly, this would lead to other problems downstream, because the validation set is also used to compute the adaptive threshold value, and for this it is required to contain anomalous samples.

For now, I suggest we go with @nelson1425's proposed solution.

@djdameln (Contributor)

> The reason why the other GitHub implementations of EfficientAD cannot be used as they are in anomalib is that anomalib uses albumentations for all the image transformations, such as resizing, cropping, and grayscale conversion. The GitHub implementations, however, use torchvision. The two libraries use slightly different algorithms in their image transformation functions, which reduces accuracy if you train a teacher model on torchvision transforms but then use albumentations transforms. Also, the pre-trained ResNet101 that is used to teach the teacher is trained with torchvision transforms, which causes additional differences.

@alexriedel1 Do you think the performance of the anomalib model would improve if we switched to torchvision transforms? We are considering replacing albumentations with torchvision, mainly for convenience reasons. Improved model results would be another reason to prioritize this effort.

@alexriedel1 (Contributor)

> The reason why the other GitHub implementations of EfficientAD cannot be used as they are in anomalib is that anomalib uses albumentations for all the image transformations, such as resizing, cropping, and grayscale conversion. The GitHub implementations, however, use torchvision. The two libraries use slightly different algorithms in their image transformation functions, which reduces accuracy if you train a teacher model on torchvision transforms but then use albumentations transforms. Also, the pre-trained ResNet101 that is used to teach the teacher is trained with torchvision transforms, which causes additional differences.
>
> @alexriedel1 Do you think the performance of the anomalib model would improve if we switched to torchvision transforms? We are considering replacing albumentations with torchvision, mainly for convenience reasons. Improved model results would be another reason to prioritize this effort.

I guess that would make sense, as both timm and torchvision pre-trained models are trained using torchvision transforms. For feature-extracting anomaly detection methods, using torchvision transforms would bring a performance boost. (Even for the ImageNet normalization, albumentations and torchvision show slightly different results...)

@djdameln (Contributor) left a comment

Thanks, I'm happy with the changes, though I'm interested in this as well:

> @nelson1425 I have one more question, do you report your accuracy results for the last training epoch or for the best training epoch?

@ashwinvaidya17 (Collaborator) left a comment

Thanks for the efforts! Looks good to me

@samet-akcay (Contributor)

> @alexriedel1 Do you think the performance of the anomalib model would improve if we switched to torchvision transforms? We are considering replacing albumentations with torchvision, mainly for convenience reasons. Improved model results would be another reason to prioritize this effort.
>
> I guess that would make sense, as both timm and torchvision pre-trained models are trained using torchvision transforms. For feature-extracting anomaly detection methods, using torchvision transforms would bring a performance boost. (Even for the ImageNet normalization, albumentations and torchvision show slightly different results...)

@alexriedel1, so you are in favour of using torchvision transforms as well? As @djdameln pointed out, we would like to switch to torchvision since it now has the functionality that we want. I just want to hear what people think about this change.

@alexriedel1 (Contributor)

> @alexriedel1 Do you think the performance of the anomalib model would improve if we switched to torchvision transforms? We are considering replacing albumentations with torchvision, mainly for convenience reasons. Improved model results would be another reason to prioritize this effort.
>
> I guess that would make sense, as both timm and torchvision pre-trained models are trained using torchvision transforms. For feature-extracting anomaly detection methods, using torchvision transforms would bring a performance boost. (Even for the ImageNet normalization, albumentations and torchvision show slightly different results...)
>
> @alexriedel1, so you are in favour of using torchvision transforms as well? As @djdameln pointed out, we would like to switch to torchvision since it now has the functionality that we want. I just want to hear what people think about this change.

Yes, right. With their support for bounding boxes and segmentation masks, I would also be happy to see the switch to torchvision :)

@nelson1425 (Contributor Author)

> @nelson1425 I have one more question, do you report your accuracy results for the last training epoch or for the best training epoch?

I report them for the last training epoch.

@nelson1425 (Contributor Author)

@djdameln

> Thanks, I'm happy with the changes, though I'm interested in this as well: do you report your accuracy results for the last training epoch or for the best training epoch?

I report them for the last training epoch, so after the training and the map normalization are finished.

I have added one commit. From my side, the pull request can be merged now.

@alexriedel1 (Contributor)

It would be great if you could add an option to turn anomaly map padding on and off in the config (turned on by default, if you want). For objects that reach to the border, I get this padding effect on the anomaly maps when padding is enabled:

[screenshot: anomaly map with visible artifacts along the padded borders]
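(For illustration only, such a toggle could wrap the padding step roughly like this; pad_maps is an assumed option name, and this sketch is not the merged anomalib code:)

import torch
import torch.nn.functional as F

def upscale_map(map_st: torch.Tensor, input_size: tuple[int, int], pad_maps: bool = True) -> torch.Tensor:
    # Only re-add the 4-pixel border when the config option is enabled,
    # so users can avoid the border effect shown in the screenshot above.
    if pad_maps:
        map_st = F.pad(map_st, (4, 4, 4, 4))
    return F.interpolate(map_st, size=input_size, mode="bilinear")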

@nelson1425 (Contributor Author)

> It would be great if you could add an option to turn anomaly map padding on and off in the config (turned on by default, if you want)

Done. Sorry for the delay!

@samet-akcay I think your approval is required before the pull request can be merged, correct?

@samet-akcay (Contributor)

@ashwinvaidya17 and @djdameln already approved it, so we could merge it after the tests.

@samet-akcay (Contributor)

@d3tk, there are torch and lightning model implementations. EfficientADModel is the torch implementation, which is for those who would like to use a torch-only version during inference.

__init__.py only stores the lightning imports. Do you think the torch implementations should be there as well?

@d3tk (Contributor) commented Jun 28, 2023

@samet-akcay I believe I made a mistake applying the newest changes, which is why I got import errors. It works for me now that I have corrected that. Apologies!

@nelson1425 (Contributor Author)

@samet-akcay The check failed with "FAIL Required test coverage of 70% not reached. Total coverage: 61.03%"

@nelson1425 (Contributor Author)

@samet-akcay how should we proceed?

@samet-akcay (Contributor) commented Jun 30, 2023

@ashwinvaidya17, thoughts about the coverage? Should we reduce the coverage threshold to 60% for now and address it later?

@blaz-r (Contributor) commented Jun 30, 2023

I think that EfficientAD could also be added to the pre_merge tests for models. I'm not that familiar with the entire test infrastructure, but I do believe that would increase coverage?

@samet-akcay (Contributor)

Oh wait, I think the failure was due to some other CI problem. The coverage is 73% now, and the tests are passing. I'm happy to merge the PR.

@samet-akcay (Contributor) commented Jun 30, 2023

> I think that EfficientAD could also be added to the pre_merge tests for models. I'm not that familiar with the entire test infrastructure, but I do believe that would increase coverage?

@blaz-r, yeah, you are right. It's just a one-liner. My concern, though, is that we need to make sure it runs only a single forward pass/iteration or epoch as a quick integration test. Otherwise, the model will increase the run time.

We are working on changing the entire test infrastructure as well to address these :)

@samet-akcay (Contributor)

@nelson1425, thanks a lot for your effort! Much appreciated!

samet-akcay merged commit 22ab4e1 into openvinotoolkit:main on Jun 30, 2023 (2 checks passed).

ashwinvaidya17 mentioned this pull request on Aug 28, 2023.