Consolidate "traverser" from gooey and metalad into a common implementation #323

Closed
mih opened this issue Apr 19, 2023 · 5 comments · Fixed by #592

Comments

@mih
Member

mih commented Apr 19, 2023

This implementation should be layered, to be useful for listing, but also status/diff reporting.

It should also be benchmarked relative to git-lsdir and git-ls-tree.

Ping @christian-monch

@christian-monch

This comment was marked as outdated.

mih added a commit to mih/datalad-next that referenced this issue May 8, 2023
This is a start for consolidating common functionality scattered
around various extensions into a single implementation (pattern).

This changeset imports `iterdir()` from `datalad-gooey`. In contrast to
the original implementation, this new one is using a stricter approach
to types, and overfits less to a dataset-aware use case.

However, it is not meant to be the exclusive implementation, but merely
a start and a place to migrate directory iterators into.

Ping datalad#323
@mih
Member Author

mih commented May 17, 2023

Looking into this further, it seems that it might be more meaningful to replace the implementation of change detection in gooey's status-light with an iterator around git diff-index HEAD (instead of the present diff-files).

Worth taking a look at #91 in that context.
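To make the `git diff-index HEAD` idea concrete, here is a minimal sketch of such an iterator. It is not the datalad-next implementation: `DiffRecord`, `parse_raw_diff()`, and `iter_diff_index()` are hypothetical names chosen for illustration. It only relies on the documented raw output format of `git diff-index -z`, where each record is a colon-prefixed header followed by one NUL-terminated path (two paths for renames and copies).

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class DiffRecord:
    """Hypothetical record type, for illustration only."""
    status: str            # single letter: M, A, D, T, R, C, ...
    path: str              # path in the worktree/index state
    prev_path: str | None  # source path for renames/copies, else None
    prev_mode: str         # file mode on the tree-ish side
    mode: str              # file mode on the worktree/index side


def parse_raw_diff(raw: bytes) -> Iterator[DiffRecord]:
    """Parse NUL-separated raw diff output (`--raw -z`) into records."""
    fields = raw.split(b"\0")
    i = 0
    while i < len(fields) and fields[i]:
        # header looks like ":100644 100644 <sha> <sha> M"
        header = fields[i].decode()
        prev_mode, mode, _prev_sha, _sha, status = header[1:].split(" ")
        path = fields[i + 1].decode()
        i += 2
        prev_path = None
        if status[0] in "RC":
            # renames/copies report the source path, then the destination
            prev_path = path
            path = fields[i].decode()
            i += 1
        # drop the similarity score that follows R and C (e.g. "R90")
        yield DiffRecord(status[0], path, prev_path, prev_mode, mode)


def iter_diff_index(repo: Path, treeish: str = "HEAD") -> Iterator[DiffRecord]:
    """Yield change records for `repo` relative to `treeish`.

    Assumption: output is collected in one go for simplicity; a real
    implementation would likely stream it from the subprocess.
    """
    out = subprocess.run(
        ["git", "-C", str(repo), "diff-index", "--raw", "-z", treeish],
        check=True,
        capture_output=True,
    ).stdout
    yield from parse_raw_diff(out)
```

Keeping the parser separate from the subprocess call means the same function could be reused for `git diff-files` or `git diff-tree`, which emit the same raw format.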

mih added a commit to mih/datalad-next that referenced this issue May 17, 2023
The iterator is also integrated with `ls-file-collection` as collection
type `gitworktree`.

Closes datalad#350
Ping datalad#323
mih added a commit to mih/datalad-next that referenced this issue Dec 5, 2023
This adds support for a standard query that Gooey would be making.
Therefore, this is a significant step towards a resolution of datalad#323.

When interfaced with datalad#539, it replaces `gooey-lsdir` and half of
`gooey-status-light`, with a much more efficient/convenient
implementation.
mih added a commit to mih/datalad-next that referenced this issue Dec 6, 2023
Minimal change, because we just pass it on to `iter_gitworktree()`.
Still added a smoke test.

This is now ready for use in Gooey.

Ping datalad#323
@mih
Member Author

mih commented Dec 8, 2023

With iter_gitworktree() and iter_annexworktree() done, we only need to add one more essential iterator, and possibly a few more convenience helpers.

The essential addition would be iter_gitdiff(), which does (type) change reporting based on git-diff-files|tree.

Such change reports could be merged into a true iter_gitstatus and, importantly, iter_annexstatus. The difference from datalad status would be that, instead of incurring the full cost of an exhaustive repository listing, the reporting is driven by an initial change report.

Even if iter_gitworktree() turns out to be the fastest way to list untracked content, a diff-index-based filter applied before any annex query commands would allow for a substantial speed-up.

Sidenote: git ls-files --others --exclude-standard seems to be pretty fast.
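For reference, the `git ls-files --others --exclude-standard` call from the sidenote can be wrapped in a few lines. This is only a sketch: `split_nul()` and `iter_untracked()` are hypothetical helper names, not existing datalad-next functions. The `-z` flag is added so that paths with unusual characters are reported verbatim and NUL-terminated.

```python
import subprocess
from pathlib import Path, PurePosixPath
from typing import Iterator


def split_nul(raw: bytes) -> Iterator[PurePosixPath]:
    """Split NUL-terminated path output into repository-relative paths."""
    for item in raw.split(b"\0"):
        if item:  # skip the empty chunk after the final NUL
            yield PurePosixPath(item.decode())


def iter_untracked(repo: Path) -> Iterator[PurePosixPath]:
    """Yield untracked, non-ignored paths in the worktree of `repo`.

    Assumption: output is read in one go; streaming would be preferable
    for very large worktrees.
    """
    out = subprocess.run(
        ["git", "-C", str(repo), "ls-files",
         "--others", "--exclude-standard", "-z"],
        check=True,
        capture_output=True,
    ).stdout
    yield from split_nul(out)
```

A caller could time `list(iter_untracked(Path(".")))` against a full worktree listing to check whether the "pretty fast" impression holds for a given repository.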

@mih
Member Author

mih commented Jan 5, 2024

#580 now also adds iter_gittree() to the list of available iterators.

@mih
Member Author

mih commented Jan 6, 2024

I have implemented a draft of iter_gitdiff() now.
