Doesn't deal well with large repos or lots of changes, even with -e #120

Open
zenspider opened this issue Jul 28, 2023 · 10 comments
Labels
bug, help wanted, need info (Blocked awaiting information)

Comments

@zenspider

I suspect my speed issues with git-explode (aspiers/git-explode#6) come from git-deps. See that issue for stats on the size of the project. I can also provide things like commit depth on some of the files in question. I don't know if git-explode passes the HEAD argument to git-deps -e, but even when I run git-deps -e main ... directly, it takes a VERY long time and seems to analyze a ton of hunks that are far beyond the -e commit.

@aspiers
Owner

aspiers commented Jul 31, 2023

That sounds like a bug - it shouldn't touch ancestors of the -e commit. Did you try investigating with --debug?

aspiers added the bug, help wanted, and need info (Blocked awaiting information) labels on Jul 31, 2023
@aspiers
Owner

aspiers commented Jul 31, 2023

It would be very helpful if you could provide a minimal test case.

@zenspider
Author

I watched the output from --debug and it was definitely hitting old hunks, but I don't know the output well enough to tell good output from bad.

For a minimal repro... is there something I can use to generate a large repo? Maybe something that takes a random seed so we wind up with the exact same thing?

@zenspider
Author

Eww... all the "random commit generators" I've found are for scamming your github history in order to look productive and get interviews.
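
One way to get a seeded, repeatable synthetic repo is a small script along these lines. It is only a sketch: `gen_repo`, `commit_all`, `next_rand`, the file layout, and the identity `gen@example.invalid` are all made up for illustration, and nothing here is part of git-deps. It pins the author, committer, and dates so that the same seed should produce the same commit hashes, and it rewrites single lines in a handful of files so that later commits overlap earlier ones.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is hypothetical, not part of git-deps.
set -euo pipefail

# Small linear congruential generator, so the sequence does not depend on
# the shell's built-in $RANDOM implementation.
next_rand() {
    rand_state=$(( (rand_state * 1103515245 + 12345) % 2147483648 ))
}

# commit_all DIR INDEX MESSAGE
# Commits everything with a fixed identity and a date derived from INDEX.
commit_all() {
    local dir=$1 stamp="$(( 1000000000 + $2 )) +0000"
    git -C "$dir" add -A
    GIT_AUTHOR_NAME=gen GIT_AUTHOR_EMAIL=gen@example.invalid \
    GIT_COMMITTER_NAME=gen GIT_COMMITTER_EMAIL=gen@example.invalid \
    GIT_AUTHOR_DATE=$stamp GIT_COMMITTER_DATE=$stamp \
        git -C "$dir" -c commit.gpgsign=false commit -q -m "$3"
}

# gen_repo DIR SEED COMMITS [FILES] [LINES]
gen_repo() {
    local dir=$1 commits=$3 files=${4:-5} lines=${5:-20}
    local i f n
    rand_state=$2

    git init -q "$dir"
    for ((f = 0; f < files; f++)); do
        seq 1 "$lines" > "$dir/file$f.txt"
    done
    commit_all "$dir" 0 "initial files"

    for ((i = 1; i <= commits; i++)); do
        # Use the higher bits of the state; the low bits of this LCG repeat quickly.
        next_rand; f=$(( (rand_state / 65536) % files ))
        next_rand; n=$(( (rand_state / 65536) % lines + 1 ))
        # Replace line n of the chosen file with text unique to this commit.
        awk -v n="$n" -v text="changed by commit $i" \
            'NR == n { $0 = text } { print }' "$dir/file$f.txt" > "$dir/tmp"
        mv "$dir/tmp" "$dir/file$f.txt"
        commit_all "$dir" "$i" "commit $i: file$f.txt line $n"
    done
}
```

For example, `gen_repo /tmp/synthetic 42 5000` would create 5000 single-line commits on top of an initial commit. Whether a repo like this is large or tangled enough to trigger the slowdown is untested.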

@zenspider
Author

zenspider commented Aug 1, 2023

OK. I think this might be a deterministic test case... It's a start at least.

But it is certainly not minimal. I'm not sure how one would minimize it and still keep the performance problems obvious, but try this:

% git clone --branch=v2.6.33 --depth=100000 https://github.com/torvalds/linux.git
...
% git log --all -M -C --name-only --format='format:' "$@" | sort | grep -v '^$' | uniq -c | sort -n | tail -3
 620 include/linux/sched.h
1064 kernel/sched.c
1128 MAINTAINERS
% git log --oneline -n 5 -- kernel/sched.c
fabf318e5e4 sched: Fix fork vs hotplug vs cpuset namespaces
6d558c3ac9b sched: Reassign prev and switch_count when reacquire_kernel_lock() fail
0c69774e6ce sched: Revert 738d2be, simplify set_task_cpu()
70f11205277 sched: Fix hotplug hang
3df0fc5b2e9 sched: Restore printk sanity
% time git deps -e 3df0fc5b2e9 fabf318e5e4 -d
...

This turns out to only be 990M (560M .git) and 180k commits. For reference the repo I'm in is 5.6G (3.1G .git) and 185k commits.

...still running after 10m

@aspiers
Copy link
Owner

aspiers commented Aug 6, 2023

Thanks. I wouldn't leave it running for more than a few seconds. The trick is to enable --debug, pipe the output through a pager (I highly recommend lnav, but less would do just fine), and then spot the first moment the debug output starts looking at commits you don't expect it to (i.e. ancestors of the -e value). The debug output leading up to that moment will probably explain what's going wrong.
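
A rough sketch of that workflow, using the commits from the test case above. Both function names are hypothetical. Merging stderr into the pipe is an assumption, since the thread does not say which stream the debug output goes to. The ancestor check relies on plain `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of (or the same as) the second.

```shell
# Hypothetical wrapper: run git deps with --debug and page the output.
# Assumption: debug output may go to stderr, so both streams are merged.
debug_deps() {
    git deps "$@" --debug 2>&1 | "${PAGER:-less}"
}

# Hypothetical helper: succeeds if COMMIT is an ancestor of the -e value,
# i.e. a commit the debug output should not be visiting.
is_ancestor_of_exclusion() {
    local commit=$1 exclude=$2
    git merge-base --is-ancestor "$commit" "$exclude"
}
```

Usage might look like `debug_deps -e 3df0fc5b2e9 fabf318e5e4`, followed by `is_ancestor_of_exclusion <sha-from-debug-output> 3df0fc5b2e9 && echo "unexpected: behind -e"` for any commit that looks suspicious.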

@zenspider
Author

Reproducible error is listed above and is nicely constrained. Setup takes <1m even on a spinning disk. (--depth=5000 also works for the above subset)

But debugging your code and its internal modeling is going to have to be up to you.

@aspiers
Owner

aspiers commented Oct 31, 2023

Can you do a rough binary search (bisection) on the --depth parameter to figure out at what depth the problem occurs? Starting at the low end for efficiency, e.g. 10, 100, 500, ...
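
The depth search could be scripted roughly as follows. This is a sketch under several assumptions: the function names are made up, a 60-second limit stands in for "too slow", and `timeout` is the GNU coreutils command (exit status 124 on timeout), which may not be installed everywhere. Any other failure, such as the commits not being present in a very shallow clone, is deliberately not counted as slow.

```shell
# first_failing_depth CHECK DEPTH...
# Runs CHECK with each depth in order and prints the first depth where it fails.
first_failing_depth() {
    local check=$1 depth
    shift
    for depth in "$@"; do
        if ! "$check" "$depth"; then
            echo "$depth"
            return 0
        fi
    done
    return 1
}

# Hypothetical check: shallow-clone at the given depth and fail only if
# git deps is still running after 60 seconds.
deps_finishes_quickly() {
    local depth=$1 dir status=0
    dir=$(mktemp -d)
    git clone -q --branch=v2.6.33 --depth="$depth" \
        https://github.com/torvalds/linux.git "$dir"
    (cd "$dir" && timeout 60 git deps -e 3df0fc5b2e9 fabf318e5e4 >/dev/null 2>&1) || status=$?
    rm -rf "$dir"
    [ "$status" -ne 124 ]
}
```

Something like `first_failing_depth deps_finishes_quickly 10 100 500 1000 5000` would then report the smallest listed depth at which the run times out. It re-clones for every depth, which keeps the sketch simple at the cost of bandwidth.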

@zenspider
Author

I gave you everything you need.

@aspiers
Owner

aspiers commented Nov 2, 2023

Not quite, I also need an extra hour in every day but I guess it would be unfair to expect you to give me that ;-)
