list: improve performance with many topics
On a test repo with 100 topic branches:

  $ git init test
  $ cd test
  $ git commit --allow-empty -m root
  $ git topics setup
  $ for i in $(seq 1 100); do git topics start $i; done

Before:

  $ time git topics > /dev/null
  real	0m2.344s
  user	0m0.547s
  sys	0m0.457s

After:

  $ time git topics > /dev/null
  real	0m0.285s
  user	0m0.084s
  sys	0m0.071s

There are many subtle things to talk about here...

This has long been a pain point for me on certain problematic repos, in
particular because `git topics list --all --porcelain` is used by tab
completion. While I can easily imagine more "specialized" ways of
improving performance for tab completion, the first thing to attack was
the `list` command itself.

The core problem was that the previous algorithm would loop over every
topic branch and invoke `git merge-base` (multiple times!) to determine
where that branch was merged. Essentially, this amounted to O(n) calls
in the number of topics, and each call is a relatively expensive
operation.

The fix introduced by this commit instead makes cleverer use of `git
for-each-ref` so that the heavy lifting is done by its --merged &
--no-merged flags using O(1) calls in the number of topics. While there
are certainly fewer overall calls to git subcommands, I'm not 100% clear
on why for-each-ref is more efficient than merge-base. Looking at the
implementations as of this writing (circa git v2.20.0-rc1), merge-base
uses commit-reach.c whereas for-each-ref uses ref-filter.c. The
algorithms are hairier than I can be bothered to pick apart right now,
but I suspect they're probably quite different. At any rate, the proof
is in the pudding, as seen in the above benchmark.
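
The difference in shape between the two approaches can be sketched in a throwaway repo. This is only an illustration, not the code from this commit: the branch names (`merged-N`, `open-N`), the `commit` helper, and the demo identity are all made up for the example.

```shell
# Scratch repo; nothing here touches an existing checkout.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the name regardless of init.defaultBranch

# Hypothetical helper: empty commits with a throwaway identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

commit root
for i in 1 2 3; do git branch "merged-$i"; done   # already reachable from master
for i in 1 2; do
  git checkout -q -b "open-$i" master
  commit "work $i"                                # one commit ahead of master
done
git checkout -q master

# Old shape: one `git merge-base` process per branch, so O(n) subprocesses.
slow=$(
  git for-each-ref --format='%(refname:short)' refs/heads |
  while read -r branch; do
    if git merge-base --is-ancestor "$branch" master; then
      echo "$branch"
    fi
  done | xargs
)

# New shape: a single for-each-ref call does the filtering itself.
fast=$(git for-each-ref --format='%(refname:short)' --merged master refs/heads | xargs)

echo "per-branch loop: $slow"
echo "single call:     $fast"
```

Both variables should end up holding the same list of branches; only the number of git processes spawned to get there differs.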

Changing this implementation meant reevaluating certain details of the
original approach. In particular, the logic formerly contained in the
`not_a_topic` function changes in two significant ways:

1. Now that we aren't calling merge-base on each individual branch, we
   won't be able to recognize the interesting edge case of orphan
   branches. Using `git checkout --orphan`, it's possible to create an
   entirely separate root commit such that a branch will have no
   ancestors in common with 'master' - so it's not *really* a topic
   branch. This is generally uncommon, though of course git.git
   exercises such strange cases, as in its 'todo' branch.

   All this change means is that branches like 'todo' will show up as
   unmerged topics. I figure that's not too terrible, given the
   (perceived?) infrequency of this situation. Plus, I'm in good
   company: even `git branch --no-merged` would have the same output,
   since it uses for-each-ref & ref-filter.c underneath.

   One possible workaround is to use the --contains flag to
   for-each-ref. If you could identify the root commit of the 'master'
   branch, you could make sure to only list refs that contained that
   commit. However, the ways I could think to find a suitable commit all
   seem hackish:

   * `git rev-list --max-parents=0` gives you *all* the root commits, so
     you'd still have to figure out which one belongs to 'master'.

   * `git rev-list --reverse --max-count=1` just gives you the HEAD
     commit, since the max count is applied *before* reversing the list.
     I guess you could just pipe the full `git rev-list` output to
     `tail -1`, but that sort of makes me wrinkle my nose (strong
     argument, I know).

   * Might be able to just call --contains with the latest version tag
     from 'master', thus listing topics forked since the last release.
     However, this still breaks on the base case, when you haven't done
     a `git topics release` on the repo yet. Come to think of it, it
     even breaks if you just started a topic then released without
     finishing that topic yet. Never mind the fragility introduced by
     manual tagging and such.

   So I'll hold off on filtering orphans on the YAGNI assumption.

2. The old implementation had a special `case` clause to filter out
   refs/*/HEAD. I basically did this because I didn't realize where
   refs/remotes/origin/HEAD was coming from in my own repos. GitHub
   sets this when you switch its "default branch", and several of my
   projects had it set to 'develop'. What I think we really want to
   avoid is any symref in general. We can accomplish this easily using
   the builtin %(symref) atom in for-each-ref --format. So really, this
   is an algorithmic improvement: instead of hard-coding HEAD, we avoid
   listing any symbolic refs (on the assumption that the concrete ref
   will be listed regardless).
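
Both points can be reproduced in a scratch repo. The sketch below is an assumption-laden illustration rather than anything from the commit: the 'topic' and 'todo' branches, the fake `origin` refs, and the `commit` helper are invented for the demo.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the name regardless of init.defaultBranch

# Hypothetical helper: empty commits with a throwaway identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

commit root
git checkout -q -b topic
commit "topic work"
git checkout -q --orphan todo             # separate root, like git.git's 'todo'
commit "unrelated root"
git checkout -q master

# Item 1: the orphan has no merge base with master...
orphan_base=$(git merge-base master todo || true)
# ...but --no-merged cannot tell it apart from a real topic.
unmerged=$(git for-each-ref --format='%(refname:short)' --no-merged master refs/heads | xargs)
# --max-parents=0 reports every root, not just master's.
root_count=$(git rev-list --count --max-parents=0 --all)
# --max-count is applied before --reverse, so this is the tip, not the root.
first=$(git rev-list --reverse --max-count=1 topic)

# Item 2: fake a remote-tracking branch plus the symbolic origin/HEAD.
git update-ref refs/remotes/origin/master master
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master
# %(symref) is empty for ordinary refs, so only the symref yields two fields.
symbolic=$(git for-each-ref --format='%(refname) %(symref)' refs/remotes |
  awk 'NF == 2 { print $1 }')

echo "merge base with orphan: '${orphan_base}'"
echo "unmerged branches:      $unmerged"
echo "root commits:           $root_count"
echo "symbolic refs:          $symbolic"
```

The empty merge base is what the old `not_a_topic` check relied on to skip orphans. The `%(symref)` column is what lets the new filter skip `origin/HEAD` without naming it.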

Finally, there was an interesting performance *regression* when I idly
tried this change out on a clone of the git.git repo.

Before:

  $ time git topics list -s
  - pu

  real	0m1.059s
  user	0m0.748s
  sys	0m0.110s

After:

  $ time git topics list -s
  - pu
  - todo

  real	0m2.822s
  user	0m2.430s
  sys	0m0.205s

I have yet to pin down what exactly is causing this. There
are few refs to loop through, so it seems to be an interesting case. My
first thought was that the commit history is very long on 'master', so
checking --merged was maybe slower in that case. However, I have not
been able to duplicate this in a vacuum.

The moral of the story, as it ever is with performance issues, is that
I'll have to keep my eyes peeled for cases that are palpably slow. In
the meantime, this commit seems to give a substantial improvement to my
current real-world examples.
ajvondrak committed Nov 29, 2018
1 parent 414898f commit 5b65b8f
Showing 1 changed file with 66 additions and 60 deletions.
126 changes: 66 additions & 60 deletions libexec/lib/list
@@ -27,40 +27,40 @@ done

 require_setup

-refname() {
-  git rev-parse --symbolic-full-name "$1" 2>/dev/null
+push() {
+  git rev-parse --symbolic-full-name "$1@{push}" 2>/dev/null
 }

-master_ref="$(refname "$MASTER")"
-master_pushref="$(refname "$MASTER@{push}")"
-
-develop_ref="$(refname "$DEVELOP")"
-develop_pushref="$(refname "$DEVELOP@{push}")"
+topics="
+  if test %(refname:short) != '$MASTER' &&
+     test %(refname:short) != '$DEVELOP' &&
+     test %(refname) != '$(push "$MASTER")' &&
+     test %(refname) != '$(push "$DEVELOP")' &&
+     test -z %(symref); then
+    echo %(refname:short)
+  fi
+"

-not_a_topic() {
-  case "$1" in
-    refs/*/HEAD) return 0 ;;
-    "$master_ref"|"$master_pushref") return 0 ;;
-    "$develop_ref"|"$develop_pushref") return 0 ;;
-    *) test -z "$(git merge-base "$MASTER" "$1")" ;;
-  esac
+topics() {
+  eval "$(xargs git for-each-ref --shell --format="$topics")"
 }

-on_master=()
-on_develop=()
-on_topic=()
-
-while read branch; do
-  if not_a_topic "$branch"; then
-    continue
-  elif git merge-base --is-ancestor "$branch" "$MASTER"; then
-    on_master+=("$branch")
-  elif git merge-base --is-ancestor "$branch" "$DEVELOP"; then
-    on_develop+=("$branch")
-  else
-    on_topic+=("$branch")
-  fi
-done < <(echo "${patterns[@]}" | xargs git for-each-ref --format="%(refname)")
+finished=($(
+  git for-each-ref --format="%(refname)" --merged "$MASTER" $patterns |
+  topics
+))
+
+integrated=($(
+  git for-each-ref --format="%(refname)" --merged "$DEVELOP" $patterns |
+  xargs git for-each-ref --format="%(refname)" --no-merged "$MASTER" |
+  topics
+))
+
+started=($(
+  git for-each-ref --format="%(refname)" --no-merged "$DEVELOP" $patterns |
+  xargs git for-each-ref --format="%(refname)" --no-merged "$MASTER" |
+  topics
+))

 case "$format" in
   long|short)
@@ -79,71 +79,77 @@ case "$format" in
 esac

 if test "$colorize" = "true"; then
-  header="$(git config --get-color "color.topics.header" "normal")"
-  finished="$(git config --get-color "color.topics.finished" "green")"
-  integrated="$(git config --get-color "color.topics.integrated" "yellow")"
-  started="$(git config --get-color "color.topics.started" "red")"
-  reset="$(git config --get-color "" "reset")"
+  header_color="$(git config --get-color color.topics.header normal)"
+  finished_color="$(git config --get-color color.topics.finished green)"
+  integrated_color="$(git config --get-color color.topics.integrated yellow)"
+  started_color="$(git config --get-color color.topics.started red)"
+  reset_color="$(git config --get-color "" reset)"
 fi

-if test "${#on_master[@]}" -ne 0; then
+if test "${#finished[@]}" -ne 0; then
   case "$format" in
     long)
-      echo "${header}Topics merged to $MASTER:"
-      echo " (use 'git topics release' to tag a new version)$reset"
+      echo "${header_color}Topics merged to $MASTER:"
+      echo " (use 'git topics release' to tag a new version)$reset_color"
       echo
-      echo "${on_master[@]}" |
-        xargs git for-each-ref --format=" ${finished}%(refname:short)$reset"
+      for topic in "${finished[@]}"; do
+        echo " $finished_color$topic$reset_color"
+      done
       echo
       ;;
     short|porcelain)
-      echo "${on_master[@]}" |
-        xargs git for-each-ref --format="* ${finished}%(refname:short)$reset"
+      for topic in "${finished[@]}"; do
+        echo "* $finished_color$topic$reset_color"
+      done
       ;;
   esac
 fi

-if test "${#on_develop[@]}" -ne 0; then
+if test "${#integrated[@]}" -ne 0; then
   case "$format" in
     long)
-      echo "${header}Topics merged to $DEVELOP:"
-      echo " (use 'git topics finish' to promote to $MASTER)$reset"
+      echo "${header_color}Topics merged to $DEVELOP:"
+      echo " (use 'git topics finish' to promote to $MASTER)$reset_color"
       echo
-      echo "${on_develop[@]}" |
-        xargs git for-each-ref --format=" ${integrated}%(refname:short)$reset"
+      for topic in "${integrated[@]}"; do
+        echo " $integrated_color$topic$reset_color"
+      done
       echo
       ;;
     short|porcelain)
-      echo "${on_develop[@]}" |
-        xargs git for-each-ref --format="+ ${integrated}%(refname:short)$reset"
+      for topic in "${integrated[@]}"; do
+        echo "+ $integrated_color$topic$reset_color"
+      done
       ;;
   esac
 fi

-if test "${#on_topic[@]}" -ne 0; then
+if test "${#started[@]}" -ne 0; then
   case "$format" in
     long)
-      echo "${header}Topics not yet merged:"
-      echo " (use 'git topics integrate' to promote to $DEVELOP)$reset"
+      echo "${header_color}Topics not yet merged:"
+      echo " (use 'git topics integrate' to promote to $DEVELOP)$reset_color"
       echo
-      echo "${on_topic[@]}" |
-        xargs git for-each-ref --format=" ${started}%(refname:short)$reset"
+      for topic in "${started[@]}"; do
+        echo " $started_color$topic$reset_color"
+      done
       echo
       ;;
     short|porcelain)
-      echo "${on_topic[@]}" |
-        xargs git for-each-ref --format="- ${started}%(refname:short)$reset"
+      for topic in "${started[@]}"; do
+        echo "- $started_color$topic$reset_color"
+      done
       ;;
   esac
 fi

-if test "${#on_topic[@]}" -eq 0 &&
-   test "${#on_master[@]}" -eq 0 &&
-   test "${#on_develop[@]}" -eq 0; then
+if test "${#finished[@]}" -eq 0 &&
+   test "${#integrated[@]}" -eq 0 &&
+   test "${#started[@]}" -eq 0; then
   case "$format" in
     long)
-      echo "${header}No topics found."
-      echo "Use 'git topics start' to create a new branch.$reset"
+      echo "${header_color}No topics found."
+      echo "Use 'git topics start' to create a new branch.$reset_color"
       ;;
     short|porcelain)
       ;;
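
The classification introduced in the diff above (finished, integrated, started) can be tried out in isolation. The following is a simplified sketch, assuming a tiny linear history: the topic names (`fixed`, `staged`, `wip`), the `commit` helper, and the `filter` variable are hypothetical, and the `@{push}` checks from the real script are left out.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the name regardless of init.defaultBranch

# Hypothetical helper: empty commits with a throwaway identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

MASTER=master
DEVELOP=develop

commit root
git branch fixed                        # reachable from master
git checkout -q -b "$DEVELOP"
commit "integration"
git branch staged                       # reachable from develop only
git checkout -q -b wip
commit "work in progress"               # reachable from neither
git checkout -q "$MASTER"

# With --shell, each %(atom) is emitted as a quoted shell word, so the
# format string becomes a small script per ref that we can eval.
filter="
if test %(refname:short) != '$MASTER' &&
   test %(refname:short) != '$DEVELOP' &&
   test -z %(symref); then
  echo %(refname:short)
fi
"

# Reads full refnames on stdin, prints the short names that pass the filter.
topics() {
  eval "$(xargs git for-each-ref --shell --format="$filter")"
}

finished=$(
  git for-each-ref --format='%(refname)' --merged "$MASTER" refs/heads |
  topics
)
integrated=$(
  git for-each-ref --format='%(refname)' --merged "$DEVELOP" refs/heads |
  xargs git for-each-ref --format='%(refname)' --no-merged "$MASTER" |
  topics
)
started=$(
  git for-each-ref --format='%(refname)' --no-merged "$DEVELOP" refs/heads |
  xargs git for-each-ref --format='%(refname)' --no-merged "$MASTER" |
  topics
)

echo "finished:   $finished"
echo "integrated: $integrated"
echo "started:    $started"
```

One caveat worth checking in real use: GNU xargs still runs its command once when stdin is empty, and `git for-each-ref` with no pattern lists every ref, so an empty first stage could widen the result instead of narrowing it. The sketch sidesteps this by making sure every stage has at least one ref to pass along.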