Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for receiving folder dataset paths as a list #1265

Merged

Conversation

harimkang
Copy link
Collaborator

@harimkang harimkang commented Aug 14, 2023

Description

# configs.yaml
dataset:
    name: model
    format: folder
    path: /app/data/trainData
    normal_dir: [folder1, folder2]
    normal_test_dir: [testfolder1, testfolder2]

# or we can also get the list format in the form below.
    normal_dir:
        - folder1
        - folder2

Changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change which refactors the code base)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist

  • My code follows the pre-commit style and check guidelines of this project.
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • I have added a summary of my changes to the CHANGELOG (not for minor changes, docs and tests).

@harimkang
Copy link
Collaborator Author

Hi @djdameln, I want to write a unit test (and update documentation) for this, but I don't know what form it should take, can you give me some advice?

@github-actions github-actions bot added the Docs label Aug 14, 2023
@harimkang harimkang changed the title Support list config as path for folder dataset format Add support for receiving folder dataset paths as a list Aug 14, 2023
Copy link
Contributor

@samet-akcay samet-akcay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @harimkang. I only have a single comment regarding the type of the multiple paths.

src/anomalib/data/folder.py Outdated Show resolved Hide resolved
@djdameln
Copy link
Contributor

Hi @djdameln, I want to write a unit test (and update documentation) for this, but I don't know what form it should take, can you give me some advice?

For the datamodule tests you could have a look at this file: https://github.com/openvinotoolkit/anomalib/blob/main/tests/pre_merge/datasets/test_datamodule.py, where we test the full functionality of the different datamodules using several fixtures. You could use the fixtures to write a test where you instantiate the Folder datamodule using a list of paths. The test would look something like this:

def test_folder_list_inputs(make_data_module):
    data_module = make_data_module("folder", abnormal_dir=["broken_large", "broken_small"])
    assert data_module

This would test if we can create a Folder datamodule using multiple abnormal directories instead of just one. Admittedly, creating the entire datamodule might add some unnecessary complexity, because we actually only need to test the make_folder_dataset function. We currently don't have unit tests specific for the make_dataset functions, but we could consider adding those, as it would allow us to test those functions directly and write much more specific tests.

So, alternatively you could add a file called test_make_dataset_functions under tests/pre_merge/datasets and add a unit test for your newly added functionality there. We could then later append that file with other specific tests for the make_dataset functions.

For the documentation, you could add a short paragraph to this guide: https://github.com/openvinotoolkit/anomalib/blob/main/docs/source/how_to_guides/train_custom_data.rst, explaining that the abnormal_dir parameter also takes lists of directories, and provide an example with the hazelnut toy dataset (e.g. abnormal_dir: [colour, crack]).

Returns:
ListConfig: The result of path replaced by ListConfig.
"""
if isinstance(path, ListConfig):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow other data types here as well, since only supporting ListConfig is not very API-friendly. The following should probably work:

>>> from anomalib.data import Folder
>>> data_module = Folder(
...     root="datasets/hazelnut_toy/",
...     normal_dir="good",
...     abnormal_dir=["colour", "crack"],
...     image_size=256,
... )
>>> data_module.setup()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. I have fixed this. (to Sequence)

@github-actions github-actions bot added the Tests label Aug 16, 2023
@harimkang harimkang marked this pull request as ready for review August 16, 2023 05:36
Copy link
Contributor

@samet-akcay samet-akcay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @harimkang! Much better! Only a single comment left from my side.

src/anomalib/data/folder.py Outdated Show resolved Hide resolved
src/anomalib/data/folder.py Outdated Show resolved Hide resolved
src/anomalib/data/folder.py Outdated Show resolved Hide resolved
Copy link
Contributor

@samet-akcay samet-akcay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @harimkang!

@ashwinvaidya17 ashwinvaidya17 merged commit 74b1c6a into openvinotoolkit:main Aug 17, 2023
3 of 4 checks passed
@ashwinvaidya17 ashwinvaidya17 mentioned this pull request Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants