Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change semantics of PathBasedItem names #583

Merged
merged 1 commit into from
Jan 8, 2024
Merged

Change semantics of PathBasedItem names #583

merged 1 commit into from
Jan 8, 2024

Conversation

mih
Copy link
Member

@mih mih commented Jan 5, 2024

Previously, the name attribute was some kind of PurePath instance. The precise nature varied across item types and corresponding iterators.

This led to two problems:

  • Performance (see Performance impact of PathBasedItem #581): the creation of a Path instance is not particularly cheap. If a consumer does not need such an instance, and unconditional creation is a waste of resources
  • Abstraction (see Fix directory name handling in zip-iterator #430): the Path representation of a path-like name does not always match the exact semantics of the original name (it might have a trailing slash that has a purpose, etc)

Both problems are now addressed by relaxing the type of .name of a PathBasedItem. It can now be anything. Consumers that require an actual Path instance can use the .path attribute. An analog access is also implemented for .link_target (which now remains literal), that is accompanied by .link_target_path, where it was necessary.

This change in approach removed the need for the _ZipFileDirPath workaround that was used to re-establish compatibility of a Path-based .name with the conventions in a ZipFile collection/container. With the .name attribute retaining its native semantics, the workaround is no longer needed as was removed.

In order to document the (lack of) implemented homogeneous conventions for path-based items, the docstrings of all iterators have been amended with information on the nature of the .name attribute yielded by them. The corresponding data classes have received docstrings for the newly added properties for Path access.

These properties uniformly use the cached_property decorator. For the lifetime of an item, the nature of such a path should never change, and caching it automatically addresses the significant creation cost on accessing a path representation repeatedly.

Closes #554
Closes #581

Previously, the `name` attribute was some kind of `PurePath` instance.
The precise nature varied across item types and corresponding
iterators.

This led to two problems:

- Performance (see datalad#581): the creation of a Path instance is not
  particularly cheap. If a consumer does not need such an instance,
  and unconditional creation is a waste of resources
- Abstraction (see datalad#430): the Path representation of a path-like
  name does not always match the exact semantics of the original
  name (it might have a trailing slash that has a purpose, etc)

Both problems are now addressed by relaxing the type of `.name`
of a `PathBasedItem`. It can now be anything. Consumers that require
an actual Path instance can use the `.path` attribute. An analog
access is also implemented for `.link_target` (which now remains
literal), that is accompanied by `.link_target_path`, where it
was necessary.

This change in approach removed the need for the `_ZipFileDirPath`
workaround that was used to re-establish compatibility of a Path-based
`.name` with the conventions in a `ZipFile` collection/container.
With the `.name` attribute retaining its native semantics, the
workaround is no longer needed as was removed.

In order to document the (lack of) implemented homogeneous conventions
for path-based items, the docstrings of all iterators have been amended
with information on the nature of the `.name` attribute yielded by them.
The corresponding data classes have received docstrings for the newly
added properties for Path access.

These properties uniformly use the `cached_property` decorator.
For the lifetime of an item, the nature of such a path should never
change, and caching it automatically addresses the significant
creation cost on accessing a path representation repeatedly.

Closes datalad#554
Closes datalad#581
Copy link

codecov bot commented Jan 5, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (a27a4eb) 92.45% compared to head (4b9c4e2) 92.43%.

Files Patch % Lines
datalad_next/iter_collections/tarfile.py 77.77% 2 Missing ⚠️
datalad_next/iter_collections/utils.py 77.77% 2 Missing ⚠️
datalad_next/iter_collections/zipfile.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #583      +/-   ##
==========================================
- Coverage   92.45%   92.43%   -0.02%     
==========================================
  Files         149      149              
  Lines       10775    10779       +4     
  Branches     1618     1617       -1     
==========================================
+ Hits         9962     9964       +2     
- Misses        636      638       +2     
  Partials      177      177              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mih
Copy link
Member Author

mih commented Jan 8, 2024

Ok, let's go without review

@mih mih merged commit 97b373c into datalad:main Jan 8, 2024
5 of 8 checks passed
@mih mih deleted the bf-581 branch January 8, 2024 19:01
@mih mih added this to the 1.1 milestone Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant