list: improve performance with many topics

On a test repo with 100 topic branches: $ git init test $ cd test $ git commit --allow-empty -m root $ git topics setup $ for i in $(seq 1 100); do git topics start $i; done Before: $ time git topics > /dev/null real 0m2.344s user 0m0.547s sys 0m0.457s After: $ time git topics > /dev/null real 0m0.285s user 0m0.084s sys 0m0.071s There are many subtle things to talk about here... This has long been a pain point for me on certain problematic repos, in particular because `git topics list --all --porcelain` is used by tab completion. While I can easily imagine more "specialized" ways of improving performance for tab completion, the first thing to attack was the `list` command itself. The core problem was that the previous algorithm would loop over every topic branch and invoke `git merge-base` (multiple times!) to determine where that branch was merged. Essentially, this amounted to O(n) calls in the number of topics, and each call is a relatively expensive operation. The fix introduced by this commit instead makes cleverer use of `git for-each-ref` so that the heavy lifting is done by its --merged & --no-merged flags using O(1) calls in the number of topics. While there are certainly fewer overall calls to git subcommands, I'm not 100% clear on why for-each-ref is more efficient than merge-base. Looking at the implementations as of this writing (circa git v2.20.0-rc1), merge-base uses commit-reach.c whereas for-each-ref uses ref-filter.c. The algorithms are hairier than I can be bothered to pick apart right now, but I suspect they're probably quite different. At any rate, the proof is in the pudding, as seen in the above benchmark. Changing this implementation meant reevaluating certain details of the original approach. In particular, the logic formerly contained in the `not_a_topic` function changes in two significant ways: 1. Now that we aren't calling merge-base on each individual branch, we won't be able to recognize the interesting edge case of orphan branches. Using `git checkout --orphan`, it's possible to create an entirely separate root commit such that a branch will have no ancestors in common with 'master' - so it's not *really* a topic branch. This is generally uncommon, though of course git.git exercises such strange cases, as in its 'todo' branch. All this change means is that branches like 'todo' will show up as unmerged topics. I figure that's not too terrible, given the (perceived?) infrequency of this situation. Plus, I'm in good company: even `git branch --no-merged` would have the same output, since it uses for-each-ref & ref-filter.c underneath. One possible workaround is to use the --contains flag to for-each-ref. If you could identify the root commit of the 'master' branch, you could make sure to only list refs that contained that commit. However, the ways I could think to find a suitable commit all seem hackish: * `git rev-list --max-parents=0` gives you *all* the root commits, so you'd still have to figure out which one belongs to 'master'. * `git rev-list --reverse --max-count=1` just gives you the HEAD commit, since the max count is applied *before* reversing the list. I guess you could just pipe it to `tail -1`, but that sort of makes me wrinkle my nose (strong argument, I know). * Might be able to just call --contains with the latest version tag from 'master', thus listing topics forked since the last release. However, this still breaks on the base case, when you haven't done a `git topics release` on the repo yet. Come to think of it, it even breaks if you just started a topic then released without finishing that topic yet. Never mind the fragility introduced by manual tagging and such. So I'll hold off on filtering orphans on the YAGNI assumption. 2. The old implementation had a special `case` clause to filter out refs/*/HEAD. I basically did this because I didn't realize where refs/origin/HEAD was coming from before in my own repos. GitHub sets this when you switch its "default branch", and several of my projects had it set to 'develop'. The real *underlying* thing I think we want to avoid is just any symref in general. We can accomplish this easily using one of the builtin atoms in for-each-ref --format. So really, this is an algorithmic improvement: instead of hard-coding HEAD, we avoid listing any symbolic refs (on the assumption that the concrete ref will be listed regardless). Finally, there has been an interesting performance *regression* when I tried this change out idly on a clone of the git.git repo. Before: $ time git topics list -s - pu real 0m1.059s user 0m0.748s sys 0m0.110s After: $ time git topics list -s - pu - todo real 0m2.822s user 0m2.430s sys 0m0.205s I have yet to be able to reproduce what exactly is causing this. There are few refs to loop through, so it seems to be an interesting case. My first thought was that the commit history is very long on 'master', so checking --merged was maybe slower in that case. However, I have not been able to duplicate this in a vacuum. The moral of the story, as it ever is with performance issues, is that I'll have to keep my eyes peeled for cases that are palpably slow. In the meantime, this commit seems to give a substantial improvement to my current real-world examples.
ajvondrak · Nov 29, 2018 · 5b65b8f · 5b65b8f
1 parent 414898f
commit 5b65b8f
Showing 1 changed file with 66 additions and 60 deletions.
diff --git a/libexec/lib/list b/libexec/lib/list
@@ -27,40 +27,40 @@ done
 
 require_setup
 
-refname() {
-  git rev-parse --symbolic-full-name "$1" 2>/dev/null
+push() {
+  git rev-parse --symbolic-full-name "$1@{push}" 2>/dev/null
 }
 
-master_ref="$(refname "$MASTER")"
-master_pushref="$(refname "$MASTER@{push}")"
-
-develop_ref="$(refname "$DEVELOP")"
-develop_pushref="$(refname "$DEVELOP@{push}")"
+topics="
+if test %(refname:short) != '$MASTER' &&
+   test %(refname:short) != '$DEVELOP' &&
+   test %(refname) != '$(push "$MASTER")' &&
+   test %(refname) != '$(push "$DEVELOP")' &&
+   test -z %(symref); then
+  echo %(refname:short)
+fi
+"
 
-not_a_topic() {
-  case "$1" in
-    refs/*/HEAD) return 0 ;;
-    "$master_ref"|"$master_pushref") return 0 ;;
-    "$develop_ref"|"$develop_pushref") return 0 ;;
-    *) test -z "$(git merge-base "$MASTER" "$1")" ;;
-  esac
+topics() {
+  eval "$(xargs git for-each-ref --shell --format="$topics")"
 }
 
-on_master=()
-on_develop=()
-on_topic=()
-
-while read branch; do
-  if not_a_topic "$branch"; then
-    continue
-  elif git merge-base --is-ancestor "$branch" "$MASTER"; then
-    on_master+=("$branch")
-  elif git merge-base --is-ancestor "$branch" "$DEVELOP"; then
-    on_develop+=("$branch")
-  else
-    on_topic+=("$branch")
-  fi
-done < <(echo "${patterns[@]}" | xargs git for-each-ref --format="%(refname)")
+finished=($(
+  git for-each-ref --format="%(refname)" --merged "$MASTER" $patterns |
+  topics
+))
+
+integrated=($(
+  git for-each-ref --format="%(refname)" --merged "$DEVELOP" $patterns |
+  xargs git for-each-ref --format="%(refname)" --no-merged "$MASTER" |
+  topics
+))
+
+started=($(
+  git for-each-ref --format="%(refname)" --no-merged "$DEVELOP" $patterns |
+  xargs git for-each-ref --format="%(refname)" --no-merged "$MASTER" |
+  topics
+))
 
 case "$format" in
   long|short)
@@ -79,71 +79,77 @@ case "$format" in
 esac
 
 if test "$colorize" = "true"; then
-  header="$(git config --get-color "color.topics.header" "normal")"
-  finished="$(git config --get-color "color.topics.finished" "green")"
-  integrated="$(git config --get-color "color.topics.integrated" "yellow")"
-  started="$(git config --get-color "color.topics.started" "red")"
-  reset="$(git config --get-color "" "reset")"
+  header_color="$(git config --get-color color.topics.header normal)"
+  finished_color="$(git config --get-color color.topics.finished green)"
+  integrated_color="$(git config --get-color color.topics.integrated yellow)"
+  started_color="$(git config --get-color color.topics.started red)"
+  reset_color="$(git config --get-color "" reset)"
 fi
 
-if test "${#on_master[@]}" -ne 0; then
+if test "${#finished[@]}" -ne 0; then
   case "$format" in
     long)
-      echo "${header}Topics merged to $MASTER:"
-      echo "  (use 'git topics release' to tag a new version)$reset"
+      echo "${header_color}Topics merged to $MASTER:"
+      echo "  (use 'git topics release' to tag a new version)$reset_color"
       echo
-      echo "${on_master[@]}" |
-      xargs git for-each-ref --format="    ${finished}%(refname:short)$reset"
+      for topic in "${finished[@]}"; do
+        echo "    $finished_color$topic$reset_color"
+      done
       echo
       ;;
     short|porcelain)
-      echo "${on_master[@]}" |
-      xargs git for-each-ref --format="* ${finished}%(refname:short)$reset"
+      for topic in  "${finished[@]}"; do
+        echo "* $finished_color$topic$reset_color"
+      done
       ;;
   esac
 fi
 
-if test "${#on_develop[@]}" -ne 0; then
+if test "${#integrated[@]}" -ne 0; then
   case "$format" in
     long)
-      echo "${header}Topics merged to $DEVELOP:"
-      echo "  (use 'git topics finish' to promote to $MASTER)$reset"
+      echo "${header_color}Topics merged to $DEVELOP:"
+      echo "  (use 'git topics finish' to promote to $MASTER)$reset_color"
       echo
-      echo "${on_develop[@]}" |
-      xargs git for-each-ref --format="    ${integrated}%(refname:short)$reset"
+      for topic in "${integrated[@]}"; do
+        echo "    $integrated_color$topic$reset_color"
+      done
       echo
       ;;
     short|porcelain)
-      echo "${on_develop[@]}" |
-      xargs git for-each-ref --format="+ ${integrated}%(refname:short)$reset"
+      for topic in "${integrated[@]}"; do
+        echo "+ $integrated_color$topic$reset_color"
+      done
       ;;
   esac
 fi
 
-if test "${#on_topic[@]}" -ne 0; then
+if test "${#started[@]}" -ne 0; then
   case "$format" in
     long)
-      echo "${header}Topics not yet merged:"
-      echo "  (use 'git topics integrate' to promote to $DEVELOP)$reset"
+      echo "${header_color}Topics not yet merged:"
+      echo "  (use 'git topics integrate' to promote to $DEVELOP)$reset_color"
       echo
-      echo "${on_topic[@]}" |
-      xargs git for-each-ref --format="    ${started}%(refname:short)$reset"
+      for topic in "${started[@]}"; do
+        echo "    $started_color$topic$reset_color"
+      done
       echo
       ;;
     short|porcelain)
-      echo "${on_topic[@]}" |
-      xargs git for-each-ref --format="- ${started}%(refname:short)$reset"
+      for topic in "${started[@]}"; do
+        echo "- $started_color$topic$reset_color"
+      done
       ;;
   esac
 fi
 
-if test "${#on_topic[@]}" -eq 0 &&
-   test "${#on_master[@]}" -eq 0 &&
-   test "${#on_develop[@]}" -eq 0; then
+if test "${#finished[@]}" -eq 0 &&
+   test "${#integrated[@]}" -eq 0 &&
+   test "${#started[@]}" -eq 0; then
   case "$format" in
     long)
-      echo "${header}No topics found."
-      echo "Use 'git topics start' to create a new branch.$reset"
+      echo "${header_color}No topics found."
+      echo "Use 'git topics start' to create a new branch.$reset_color"
       ;;
     short|porcelain)
       ;;