iter_collections and ls-file-collection #343

Merged 19 commits on May 11, 2023

Commits on May 9, 2023

  1. Establish iter_collections module, import iterdir() from gooey

    This is a start for consolidating common functionality scattered
    around various extensions into a single implementation (pattern).
    
    This changeset imports `iterdir()` from `datalad-gooey`. In contrast to
    the original implementation, this new one takes a stricter approach
    to types, and overfits less to a dataset-aware use case.
    
    However, it is not meant to be the exclusive implementation, but merely
    a start and a place to migrate directory iterators into.
    
    Ping datalad#323
    mih committed May 9, 2023
    Commit ae185e3
  2. Extend type-annotation

    mih committed May 9, 2023
    Commit 521d751
  3. Commit f2502b8
  4. Commit 9e3064e
  5. Add support for TAR archive file collections

    This is trying to be structurally similar to the `directory` collection
    implementation.
    mih committed May 9, 2023
    Commit 6b070b1
  6. Commit 01c3d27
  7. Commit 09632d6
  8. Consolidate and deduplicate across collection iterators

    Only one path type enum, only one item dataclass (for now).
    mih committed May 9, 2023
    Commit 3ae477b
  9. Commit 6568fc2
  10. Commit 09a7513
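
The May 9 commits describe a directory iterator, a structurally similar TAR archive iterator, and a consolidation to one path type enum and one item dataclass. A minimal sketch of that pattern, using only the Python standard library, could look as follows. The names `PathType`, `CollectionItem`, `iter_directory`, and `iter_tar_members` are hypothetical and chosen for illustration; they are not taken from this PR, and the actual `iter_collections` API may differ.

```python
from __future__ import annotations

import os
import tarfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path, PurePath, PurePosixPath
from typing import Iterator


class PathType(Enum):
    """Single path type enum shared by all iterators (hypothetical)."""
    file = "file"
    directory = "directory"
    symlink = "symlink"
    other = "other"


@dataclass(frozen=True)
class CollectionItem:
    """Single item type shared by all iterators (hypothetical)."""
    name: PurePath
    type: PathType
    size: int
    link_target: PurePath | None = None


def iter_directory(path: Path) -> Iterator[CollectionItem]:
    """Yield one item per direct child of a directory, without recursion."""
    for child in sorted(path.iterdir()):
        # check for symlinks first, because is_dir()/is_file() follow links
        if child.is_symlink():
            ptype = PathType.symlink
        elif child.is_dir():
            ptype = PathType.directory
        elif child.is_file():
            ptype = PathType.file
        else:
            ptype = PathType.other
        yield CollectionItem(
            name=PurePath(child.name),
            type=ptype,
            size=child.lstat().st_size,
            link_target=(
                PurePath(os.readlink(child))
                if ptype is PathType.symlink else None
            ),
        )


def iter_tar_members(path: Path) -> Iterator[CollectionItem]:
    """Yield one item per archive member, mirroring iter_directory()."""
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if member.issym() or member.islnk():
                ptype = PathType.symlink
            elif member.isdir():
                ptype = PathType.directory
            elif member.isfile():
                ptype = PathType.file
            else:
                ptype = PathType.other
            yield CollectionItem(
                # TAR member names use POSIX separators; convert them
                # to the platform's PurePath flavor
                name=PurePath(PurePosixPath(member.name)),
                type=ptype,
                size=member.size,
                link_target=(
                    PurePath(PurePosixPath(member.linkname))
                    if ptype is PathType.symlink else None
                ),
            )
```

Because both generators yield the same item type, downstream code can treat a directory and an archive uniformly. This sketch maps hard links in an archive to `PathType.symlink` for brevity, which is a simplification.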

Commits on May 10, 2023

  1. Test hashing in iterdir()

    mih committed May 10, 2023
    Commit 059070a
  2. Standardize itertar() to always report (platform) PurePath

    Now also for link targets.
    mih committed May 10, 2023
    Commit 3fa4db0
  3. Commit 8832b9b
  4. Add tests for itertar()

    Using a pre-crafted (300 byte) tarball that is hosted on GitHub and
    downloaded once per session.
    mih committed May 10, 2023
    Commit 40c4f55
  5. Commit 6683ddb
  6. Clarify how joint validators must report violations

    Also include a check for these requirements that is executed in
    the error case (not performance-critical), to inform developers
    about obvious implementation issues.
    
    Closes datalad#348
    mih committed May 10, 2023
    Commit b1655cc
  7. Commit fcd0d0c
  8. New ls-file-collection command

    This is (also) an alternative approach to `add-archive-content`.
    
    In comparison to `add-archive-content`, this is largely metadata-driven
    and works without (local) extraction of a tarball. This avoids storage
    overhead and makes it possible to run some parts of the ingestion
    pipeline on a remote system.
    
    Closes datalad#183
    mih committed May 10, 2023
    Commit 5ed2248
  9. Fix typo

    mih committed May 10, 2023
    Commit 4adba2c
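
The May 10 commits mention hashing, reporting (platform) `PurePath` objects also for link targets, and listing an archive without local extraction. One way to picture how these fit together is the sketch below, which streams each regular file member of a TAR archive through `hashlib` without writing anything to disk. The function name `iter_tar_hashes`, its parameters, and its return shape are assumptions made for illustration; this is not the implementation of `itertar()` or `ls-file-collection`.

```python
import hashlib
import tarfile
from pathlib import PurePath, PurePosixPath
from typing import BinaryIO, Dict, Iterator, Optional, Sequence, Tuple


def iter_tar_hashes(
    fileobj: BinaryIO,
    algorithms: Sequence[str] = ("sha1", "sha256"),
    chunk_size: int = 65536,
) -> Iterator[Tuple[PurePath, Optional[PurePath], Dict[str, str]]]:
    """Yield (name, link_target, hashes) for each member of a TAR archive.

    Hypothetical helper: members are read from the archive stream in
    chunks, so nothing is extracted to the file system.
    """
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            # report platform PurePath for names and for link targets
            name = PurePath(PurePosixPath(member.name))
            target = (
                PurePath(PurePosixPath(member.linkname))
                if member.issym() or member.islnk()
                else None
            )
            hashes: Dict[str, str] = {}
            if member.isfile():
                hashers = {a: hashlib.new(a) for a in algorithms}
                stream = tar.extractfile(member)
                # feed the member content to all hashers chunk by chunk
                for chunk in iter(lambda: stream.read(chunk_size), b""):
                    for h in hashers.values():
                        h.update(chunk)
                hashes = {a: h.hexdigest() for a, h in hashers.items()}
            yield name, target, hashes
```

Only regular files receive hashes in this sketch; directories and links are reported with an empty mapping.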