Skip to content

Latest commit

 

History

History
380 lines (239 loc) · 16.8 KB

gitsync-nncp.org

File metadata and controls

380 lines (239 loc) · 16.8 KB

gitsync-nncp Manual

Introduction

Why it’s here

Sometimes you have some directories of files you’d like to keep synced between multiple. But not just that: you want a nice history kept, intelligent conflict resolution, and so forth. git is an obvious solution for this. Perhaps also some of your data is sensitive.

But then, how do you handle the transport, especially if, say, both machines are behind a firewall, one is a laptop that travels, etc?

  • Maybe you try setting up a git server on a VPS somewhere. But:
    • That costs money.
    • And, your sensitive data may be exposed. (You could use things like git-remote-gcrypt or git-crypt, but they are finicky, a hassle, and hard to get right.)
  • Maybe you could try sharing your repo with something like Syncthing or Dropbox.
    • This is a recipe for all sorts of git errors due to race conditions and conflicts when two machines try to modify the file at once.
  • OK, so maybe you could work up some sort of complicated scheme where there is a second repo on Syncthing, and only one machine modifies it, and the others send their changes to the master machine, and…
    • Yeah, I did that. It’s even more complicated and finicky than it sounds.
    • Also that has issues if the master machine is down.

What we really want is something that:

  • Does the basic syncing and history
  • Is resilient in the face of conflicts and races
  • Doesn’t require any particular machine to be in charge
  • Doesn’t require two machines to be online simultaneously (asynchronous data transfer)

NNCP (see my blog series about it) is like ssh, but for asynchronous communication. Thanks to its new multicast areas support, it can actually handle this situation quite efficiently.

gitsync-nncp does all of that.

Feature List

Besides the above, gitsync-nncp:

  • Is designed to be easily scriptable and cronnable
  • Is written in easy-to-understand shell code
  • Is designed to be as safe as possible
  • Doesn’t actually require NNCP; the transmission parts are entirely separate and you can substitute your own transport. It’s all just pipes. It works particularly well with my tool Filespooler as well.
  • Logs to syslog

gitsync-nncp is NOT:

  • Suitable for use with untrusted peers. It’s designed to be used with only your own machines. Making it suitable for untrusted peers would probably require rewriting in something other than shell and doing a lot more input validation.

Operation Overview

With gitsync-nncp, the idea is there is no master/main branch. Every machine works on a branch with a unique name (often the hostname of the machine). Each repo also contains branches named gitsync-nncp/*, one for each machine in the cluster (including the local one). These are roughly analogous to git’s remote-tracking branches.

After you make a local change, gitsync-nncp compares your local branch (say, alice) to the previous saved state of it (gitsync-nncp/alice). Then, it generates a git bundle that represents all the commits between the prior state and the current one. These are sent to NNCP (or some other program of your choice) for distribution to all the other nodes.

When your system receives an NNCP packet for the repo from a remote - which can happen at any time - all it does is update the gitsync-nncp branch for the remote.

At some point, you want to update your working tree with changes made on the remotes. This is the one command that is potentially interactive, because in the event of a conflict, you may need to resolve it manually. This process simply merges the changes from all the remote branches into your local branch. Your local branch’s new state will be propagated to all the others on the next push.

Setup

Installing NNCP

Of course, you’ll need to install NNCP before you can use this. The NNCP homepage or https mirror document this for you.

Adding an area (optional)

This approach works best when there is an NNCP “area” available to use. NNCP’s areas documentation introduce you to them. You’ll need to add something to your configuration file. Please refer to NNCP’s areas documentation for details.

You can begin by running nncp-cfgnew -area localgit. This will emit a configuration snippet such as this:

areas: {
  localgit: {
    id: ...

    # KEEP AWAY keypair from the nodes you want only participate in multicast
    pub: ...
    prv: ...

    # List of subscribers you should multicast area messages to
    # subs: ["alice"]

    # Allow incoming files (from the area) saving in that directory
    # incoming: /home/areas/localgit/incoming

    # Allow incoming area commands execution
    # exec: {sendmail: ["/usr/sbin/sendmail"]}

    # Allow unknown sender's message tossing (relaying will be made anyway)
    # allow-unknown: true
  }
}

Add this to your nncp.hjson configuration file before the final }, and let’s start editing.

First, you’ll need to list all the machines that should receive the updates in the subs line. (Footnote: some complicated topologies may not require you to, but those are beyond the scope of this document.) Uncomment that line and add them.

Next, we’ll need to define an exec line, one for each git repo you want this group of machines to be able to participate in. For instance:

exec: {
  repo1: ["/path/tp/gitsync-nncp", "receive", "/home/git/repo1"]
  repo2: ["/path/to/gitsync-nncp", "receive", "/home/git/repo2"]
}

Save this, then copy the entire area section to the nncp.hjson on each machine. OK, you’re ready to go!

Making a new git repo

You can use git init like usual, but then rename the main/master branch to something unique for your machine; for instance, its hostname; for instance:

git branch -m alice

Now, we also need to create another branch that gitsync-nncp uses to track progress. These branches begin with gitsync-nncp/.

git branch gitsync-nncp/alice

OK, that’s it for the setup process for the first repo. Now, on to setting up subsequent repos.

Setting up the second and subsequent repos

In this example, we’ll assume “alice” is an existing machine with a repo, and “bob” is the new one.

First, you want to make sure your existing repos are all synced (HEAD is the same).

Now, on EVERY existing machine (such as alice in this example), you need to make a branch for tracking the new machine:

git branch gitsync-nncp/bob

Next, on the new machine, you’ll first want to clone the original. For instance, let’s say we’re on machine bob and are copying from alice:

git clone alice:repo

Now, we need to rename the branch to the local machine as before:

git branch -m bob

Next, we need to create the gitsync-nncp branches for both the local and ALL other remotes. For instance:

git branch gitsync-nncp/bob
git branch gitsync-nncp/alice
git branch gitsync-nncp/claire

Using an existing git repo

This setup doesn’t use a master/main branch; every host has a branch named after it. You can, of course, use one outside of gitsync-nncp, but then you have the question of “which machine should update the main branch, and once updated, what is its purpose anyhow?”

After making sure all clones of the repo are at the same revision, you would want to simply rename the master/main branch on every machine to its local hostname; for instance:

git branch -m alice

Then you’ll want to create the gitsync-nncp branches for each participating machine (including the one you’re working on); for instance:

git branch gitsync-nncp/bob
git branch gitsync-nncp/alice
git branch gitsync-nncp/claire

Usage

Now, let’s start working! You start hacking on some things, and then you can run:

gitsync-nncp sync nncp-exec area:localgit repo1

This will do several things:

  1. Automatically generate a git commit, if needed, reflecting local changes
  2. Merge in any changes that were made on remotes
  3. Push the new local branch state to the other machines.

If the local branch state was different than before the command ran, then the git bundle will be piped to the command given as a parameter to gitsync-nncp sync, adding on the name of the local branch at the end. So here, it runs nncp-exec area:localgit repo1 alice - but only if there are changes to send.

On each remote, because of your configuration, it will automatically run gitsync-nncp receive /home/git/repo1 alice. This receives the updates and applies them to the gitsync-nncp/alice branch on each remote – but it doesn’t touch their working directories.

The next time you run gitsync-nncp sync on a remote, you’ll merge in any changes that are pending and go from there. Easy!

You can, of course, set up a shell alias or script so you don’t have to type that whole command all the time.

Next, I’ll go into a command summary and then cover some more advanced topics.

Commands

All commands will:

  • Verify that they are operating on a git repository
  • Verify that there is no merge in progress
  • Verify that the current branch isn’t named master or main, as an attempt at something of a sanity check that you’re on a branch with a unique name (it is, of course, not a guarantee of this)

If either of these checks fail, the commands will abort immediately.

gitsync-nncp autocommit

If there are any uncommitted changes, commit them with git commit -am. The generated commit message will name the local hostname and branch for future reference.

This command will not call git add or git rm, though if you have used those commands but not committed the changes yet, those changes will be committed along with the other changes. Also, since -a is used, and deleted local files will be reflected in the commit.

If you use autocommit, it is often useful to wrap it in a script that makes appropriate calls to git add.

gitsync-nncp merge

This command merges all remote branches into the local branch by calling git merge on each. It is possible that you may be prompted for a git commit message if changes are non-fast-forward. It is also possible for this command to fail if there were merge conflicts. If so, resolve them in the usual way (edit files, git add, git commit) and then rerun this command.

gitsync-nncp push command [args…]

This command checks the state of the local branch and compares it to the state at last push. If there are differences, it prepares a git bundle for transmission, and hands it off to the given command. After the given arguments, the name of the local branch is added.

The given command should often be something like nncp-exec area:localgit reponame.

gitsync-nncp push_full command [args…]

Performs an autocommit, then push. The meaning of command/args are as defined in push.

gitsync-nncp receive REPODIR BRANCHNAME

Expects a git bundle on stdin. It applies it to the branch gitsync-nncp/BRANCHNAME on the git repo located at REPODIR. It does not modify the working directory or checked out branch of the repo. This is designed to be the target of nncp-exec.

gitsync-nncp sync command [args…]

Performs these commands in order:

  • autocommit
  • merge
  • push

The passed command/args are passed to push as above. Please see the relevant documentation fot ehse other commands.

This command has the potential to be interactive or fail due to the merge.

The idea of it is: commit our local changes, merge in remote ones, and then send the new local state to all the remotes.

Use in cron

It is safe to call gitsync-nncp push_full from cron. This is one way to make sure that you periodically update the remote with your local changes. (Remember to do git adds also!)

Use in git hooks

It is also safe to call gitsync-nncp push from a git post-commit hook. This will cause every local commit to be immediately propagated to the remote. It is probably unwise to do push_full from a post-commit hook, since that could lead to a potentially confusing situation where one git commit could introduce two commits.

Example: non-NNCP commands: gpg encryption of packets

Perhaps you don’t want to trust NNCP or the user nncp runs as - maybe it’s a systemwide installation someone else did, etc. You can always use gpg to encrypt (and perhaps sign) your packets first. To do this, you will first need to generate a GPG keypair with no passphrase, make it available on every machine, and set gpg to trust it ultimately everywhere. Then you may have a couple of scripts. The first is gitsync-send:

#!/usr/bin/env bash

set -euo pipefail

gpg -e -r 0x[fingerprint] -u 0x[fingerprint] --sign - | \
    nncp-exec area:localgit repo1 "$@"

You’ll use this with commands like push; for instance: gitsync-nncp push ~/bin/gitsync-send.

Now, how about the receiving side? Here’s gitsync-recv:

#!/usr/bin/env bash

set -euo pipefail

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}

TMPDIR="$(mktemp -d)"
trap 'ECODE=$?; rm -r "$TMPDIR"; exit $ECODE' EXIT INT TERM
chmod 0700 "$TMPDIR"

STATUSFILE="$TMPDIR/status"
DATAFILE="$TMPDIR/data"

# decrypt stdin

runcommand gpg --status-file "$STATUSFILE" -d > "$DATAFILE"

# This will fail if we don't find what we are looking for there

runcommand grep -q 'VALIDSIG [fingerprint]' "$STATUSFILE"
runcommand /home/jgoerzen/bin/gitsync-nncp receive "$@" < "$DATAFILE"

This decrypts and verifies the data, then passes it on to gitsync-nncp. Now, your nncp-hjson will point to this script instead of gitsync-nncp receive.

Of course, you’ll need to use your local fingerprint in all the relevant spots above.

Use when NNCP runs as a different user

My using NNCP with sudo documentation contains an example of using gitsync-nncp under sudo.

Example: syncing org-mode and org-roam files

I am a huge fan of org-mode and org-roam. gitsync-nncp was written initially to meet a need I had with these.

In my ~/org, I have a Makefile that looks like this. (It could as easily have been a shell script, too):

SHELL := /bin/bash
SENDCMD := /home/jgoerzen/bin/gitsync-send
GS := /home/jgoerzen/bin/gitsync-nncp

ADD := git add data roam/data roam/*/data *.org */*.org roam/*/*.org roam/*/*/*.org Makefile .gitignore *.org_archive

.PHONY: all
all: sync

.PHONY: push
push:
	-$(ADD)
	$(GS) push_full $(SENDCMD)

.PHONY: sync
sync:
	-$(ADD)
	$(GS) sync $(SENDCMD)

All this does is make a little convenient wrapper around the sync and push commands. In my user’s crontab, I call:

make -C $HOME/org push 2>&1 | logger --tag 'orgsync'

I can also make sync whenever I want.

Out-of-order packet delivery

NNCP doesn’t guarantee packets are delivered in order. That’s fine, because when packets are delivered out of order, gitsync-nncp receive will fail with an error code. That will cause nncp-toss to retry the delivery at the next toss. This may require a few toss cycles to get everything delivered, but it will eventually.

Warnings

It’s important that:

  • Every machine have a uniquely-named branch that is checked out locally
  • The gitsync-nncp commands are never run when a different branch is the current one on a machine

Integrations

There is a git-annex plugin for gitsync-nncp here: https://git.sr.ht/~ehmry/git-annex-remote-nncp

COPYRIGHT

gitsync-nncp Copyright (C) 2021-2022 John Goerzen <jgoerzen@complete.org>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.