Add Dataset.example_media and cache Datacards #84

Merged: 14 commits merged on May 8, 2024

@hagenw (Member) commented Apr 25, 2024

Add caching for results (e.g. PNG files) generated by audbcards.Datacard.

To achieve this, I did the following:

  • Moved audbcards.Datacard.example_media to audbcards.Dataset.example_media, as it requires access to the dependency table (audbcards.Dataset.deps) and is hence better suited to be cached as part of audbcards.Dataset.
  • Added a cache_root argument to audbcards.Datacard. By default, it uses the same folder as audbcards.Dataset.
  • Cached the resulting plot of audbcards.Datacard.file_duration_distribution() in the cache folder.
  • Cached the media file and the resulting plot from audbcards.Datacard.player() in the cache folder.
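The file caching described in the list follows a cache-or-compute pattern: build the target path, and run the expensive step only if the file is missing. A minimal sketch, assuming a hypothetical helper `cached_file()` that mirrors the `<name>/<version>/<name>-<version>-<suffix>` naming used in the cache folder; this is an illustration, not the actual audbcards implementation:

```python
import os
import tempfile
from typing import Callable


def cached_file(
    cache_root: str,
    name: str,
    version: str,
    suffix: str,
    create: Callable[[str], None],
) -> str:
    """Return the path to a cached file, creating it only if it is missing.

    Hypothetical helper: files are stored as
    <cache_root>/<name>/<version>/<name>-<version>-<suffix>.
    """
    cache_dir = os.path.join(cache_root, name, version)
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{name}-{version}-{suffix}")
    if not os.path.exists(path):
        # Cache miss: generate the file (e.g. render and save a plot)
        create(path)
    return path


calls = []


def render_plot(path: str) -> None:
    # Stand-in for an expensive plotting step that writes a PNG file
    calls.append(path)
    with open(path, "wb") as fp:
        fp.write(b"placeholder")


root = tempfile.mkdtemp()
suffix = "file-duration-distribution.png"
first = cached_file(root, "emodb", "1.4.1", suffix, render_plot)
# Second call finds the file and skips render_plot()
second = cached_file(root, "emodb", "1.4.1", suffix, render_plot)
```

Passing the generator as a callable keeps the caching logic independent of what is cached, so the same helper could serve both the duration plot and the player waveform.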

The resulting structure of the cache folder, shown for emodb, is:

$ tree ~/.cache/audbcards/emodb
.../.cache/audbcards/emodb
└── 1.4.1
    ├── emodb-1.4.1-file-duration-distribution.png
    ├── emodb-1.4.1-player-media
    │   └── wav
    │       └── 13b09La.wav
    ├── emodb-1.4.1-player-waveform.png
    └── emodb-1.4.1.pkl
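The `.pkl` file presumably stores the pickled audbcards.Dataset information, so that later builds can skip gathering it again. A minimal sketch of such pickle-based caching, assuming a hypothetical `load_or_build()` helper and a plain dict standing in for the dataset object (the real class and its attributes may differ):

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_build(
    cache_root: str,
    name: str,
    version: str,
    build: Callable[[], Any],
) -> Any:
    """Load <cache_root>/<name>/<version>/<name>-<version>.pkl or build it."""
    cache_dir = os.path.join(cache_root, name, version)
    path = os.path.join(cache_dir, f"{name}-{version}.pkl")
    if os.path.exists(path):
        # Cache hit: skip the expensive build step
        with open(path, "rb") as fp:
            return pickle.load(fp)
    # Cache miss: build the object and store it for the next run
    obj = build()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fp:
        pickle.dump(obj, fp)
    return obj


builds = []


def build_info() -> dict:
    # Stand-in for gathering dataset information (header, dependencies, ...)
    builds.append(1)
    return {"name": "emodb", "version": "1.4.1"}


root = tempfile.mkdtemp()
info1 = load_or_build(root, "emodb", "1.4.1", build_info)
info2 = load_or_build(root, "emodb", "1.4.1", build_info)
```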

I again tested building the pages for all our datasets and got the following build times:

| Branch | Fresh build | Build from cache |
|---|---|---|
| main | 15 minutes | 2 minutes |
| this branch | 15 minutes | 30 seconds |

With caching, most of the time is now spent compiling the HTML pages rather than gathering information about the datasets.


Updated docstrings: [4 screenshots]

@hagenw marked this pull request as draft on April 25, 2024, 12:53
@hagenw changed the title from "Add Dataset.example_media and caching for Datacard" to "Add Dataset.example_media and cache Datacards" on April 30, 2024
@hagenw marked this pull request as ready for review on April 30, 2024, 13:21
@hagenw mentioned this pull request on May 2, 2024

@ChristianGeng (Member) left a comment

Review

The merge request is definitely useful.
I have added some comments on code organisation that could improve readability.

One open topic I see is versioning:

Python Versions

Currently, only Python 3.9 and 3.10 are supported.
I have run the new test tests/test_dataset.py::test_dataset_example_media in fresh venvs with Python 3.9, 3.11, and 3.12 (3.10 is currently broken on my system). The test passes under 3.9 and 3.11, but under 3.12 I get this error:

E   AttributeError: module 'pathlib' has no attribute '_Flavour'

Probably something to do with audeer?

I have also run all tests under 3.11. Is there anything against adding it to pyproject.toml?

audb and audbackend versions

I assume the forthcoming upgrades to 2.x.x will be handled in a follow-up issue, but I could not find one in the backlog.

@hagenw (Member Author) commented May 8, 2024

The error you see with Python 3.12 is caused by devopshq/artifactory#430. Until that is solved, all our packages using Artifactory backends are blocked on Python 3.12.
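The traceback points to `pathlib._Flavour`, a private class that Python 3.12 removed from the standard library. Assuming the Artifactory backend subclasses this class (as the error suggests), a quick check shows which interpreters are affected. The helper name `has_pathlib_flavour()` is hypothetical:

```python
import pathlib
import sys


def has_pathlib_flavour() -> bool:
    """Return True if the private pathlib._Flavour class is available."""
    # Private API: present up to Python 3.11, removed in Python 3.12
    return hasattr(pathlib, "_Flavour")


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}: pathlib._Flavour available = {has_pathlib_flavour()}")
```

Relying on private standard-library classes is exactly what makes such packages break on a new Python release, which is why the fix has to come from the upstream package.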

@hagenw (Member Author) commented May 8, 2024

For audbackend 2.0.0, I have already prepared #90, but I wanted to get this one merged first, so I can rebase my changes for audbackend 2.0.0 before assigning it for review.

@hagenw mentioned this pull request on May 8, 2024

@ChristianGeng (Member)

From my side, everything is addressed. I do not know how to approve afterwards on GitHub; I would if I did. Anyway, I am happy now, and in my opinion this can be merged.

@hagenw (Member Author) commented May 8, 2024

Thanks. I also don't know how to approve afterwards. I guess approval is implicitly assumed once all discussions are resolved. Otherwise, I would have to re-request a review, and then you could approve.

@hagenw merged commit 6904bb8 into main on May 8, 2024
6 checks passed
@hagenw deleted the datacard-caching branch on May 8, 2024, 11:34