-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
iter_gitworktree()
for processing work tree content
#361
Conversation
Great work. I played with it a bit, and it was nice and smooth. A couple of comments and questions:
datalad ls-file-collection gitworktree .
|
re limited options: Right now it is too early IMHO to define a sophisticated interface for this command. Both necessary and desirable functionality are undetermined still. I believe we need more iterators and use cases to get a better idea first. re result renderer: Absolutely possible, but no priority. Also, run the same command with Long term, we could have some decision making on any incoming result to select the most appropriate renderer. However, any real-world use of this command that I can see would use the json renderer anyways. So this is more about eye candy than functionality IMHO. |
Thanks, that all seems sensible. |
Originally introduced in datalad#361, but a need for this has also been expressed elsewhere, hence now a standalone changeset.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #361 +/- ##
==========================================
- Coverage 92.51% 92.45% -0.07%
==========================================
Files 117 119 +2
Lines 8468 8572 +104
==========================================
+ Hits 7834 7925 +91
- Misses 634 647 +13
☔ View full report in Codecov by Sentry. |
32c5810
to
0b16d98
Compare
The iterator is also integrated with `ls-file-collection` as collection type `gitworktree`. Closes datalad#350 Ping datalad#323
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a confirmation that it works for me than a review - here's my attempt on a worktree glob by file name extension:
pprint(
[
x
for x in iter_gitworktree(ds.pathobj, untracked=None, link_target=True)
if x.name.suffix == ".dat"
]
)
gives:
[GitWorktreeFileSystemItem(type=<FileSystemItemType.symlink: 'symlink'>, name=PurePosixPath('bar/file2.dat'), size=119, mtime=1685962565.9671404, mode=41471, uid=1001, gid=1001, link_target=PurePosixPath('../.git/annex/objects/FZ/8w/MD5E-s6--6f41758e57d7958213d129349ac20c8e.dat/MD5E-s6--6f41758e57d7958213d129349ac20c8e.dat'), fp=None, gitsha='ad2f7c351c5afd14cc29a6150752fa5e898d85f1', gittype=<GitTreeItemType.symlink: 'symlink'>),
GitWorktreeFileSystemItem(type=<FileSystemItemType.symlink: 'symlink'>, name=PurePosixPath('file1.dat'), size=116, mtime=1685962553.4789724, mode=41471, uid=1001, gid=1001, link_target=PurePosixPath('.git/annex/objects/F1/gz/MD5E-s6--f2b33fb7b3d0eb95090a16060e6a24f9.dat/MD5E-s6--f2b33fb7b3d0eb95090a16060e6a24f9.dat'), fp=None, gitsha='455a9f4d07fe04aa39c2f4b87a15435a0e050a66', gittype=<GitTreeItemType.symlink: 'symlink'>)]
and can replace a low-level call to ls-files
(that I see is the base for iter_gitworktree
), while including more info (size, mtime) than the names alone.
pprint([PurePath(x) for x in ds.repo.call_git_items_(["ls-files", "*.dat"])])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I only have a few smaller remarks. [Edited]: The windows error seems genuine though
for content in runner.run(): | ||
# for each zerobyte-delimited "line" in the output | ||
for line in line_splitter.process(content.decode('utf-8')): | ||
yield line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chould run line_splitter.finish_processing()
, after the content-loop, to verify that no unterminated content was sent by git ls-files
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you suggest a code change. I am not much familiar with this functionality.
It seems like the windows error is genuine |
Did you look into it? From the log I cannot tell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are failing on Windows because of comparison between PurePosixPath
- and PureWindowsPath
-instances.
I suggested a few patches to fix the tests. Alternative to the patches below we might think about changing the type of GitWorktreeItem.name
to PurePath
.
OK, thanks! So this really is #402 You patches seem to prefer the direction that leaves it to the particular iterator which path type to use. However, this would make I am leaning towards either making everything |
Generally, | [
|
I think it is worth a try and we might have to see where it leads us... |
I implemented that change. Resolving related conversations now. |
Test failure is all-familiar httpbin. Otherwise all splendid! Thanks for the reviews, this is going in. |
The iterator is also integrated with
ls-file-collection
as collection typegitworktree
.The integration in
ls-file-collection
does not expose all features. For some additional insight run something likeCloses #350
Ping #323
TODO
FileSystemItem
anditer_tar|dir()
to provide file IO #367