gilbertchen edited this page Jul 21, 2020 · 6 revisions

The prune command deletes old or unwanted revisions and unused chunks from a storage.


Quick overview

NAME:
   duplicacy prune - Prune revisions by number, tag, or retention policy

USAGE:
   duplicacy prune [command options]

OPTIONS:
   -id <snapshot id>            delete revisions with the specified snapshot ID instead of the default one
   -all, -a                     match against all snapshot IDs
   -r <revision> [+]            delete the specified revisions
   -t <tag> [+]                 delete revisions with the specified tags
   -keep <n:m> [+]              keep 1 revision every n days for revisions older than m days
   -exhaustive                  remove all unreferenced chunks (not just those referenced by deleted snapshots)
   -exclusive                   assume exclusive access to the storage (disable two-step fossil collection)
   -dry-run, -d                 show what would have been deleted
   -delete-only                 delete fossils previously collected (if deletable) and don't collect fossils
   -collect-only                identify and collect fossils, but don't delete fossils previously collected
   -ignore <id> [+]             ignore revisions with the specified snapshot ID when deciding if fossils can be deleted
   -storage <storage name>      prune revisions from the specified storage
   -threads <n>                 number of threads used to prune unreferenced chunks

Usage

duplicacy prune [command options]

Options

Options marked with [+] can be passed more than once.

-id <snapshot id>

Delete revisions with the specified snapshot ID instead of the default one.

Example:
duplicacy prune -id computer-2

-all, -a

Run the prune command against all snapshot IDs in the selected storage.

Example:
duplicacy prune -all

-r <revision> [+]

Delete the specified revisions.

Examples:
duplicacy prune -r 6              # delete revision 6
duplicacy prune -r 344-350        # delete revisions 344 through 350 (inclusive)
duplicacy prune -r 310 -r 1322    # delete only the revisions 310 and 1322

-t <tag> [+]

Delete revisions with the specified tags.

-keep <n:m> [+]

Keep 1 revision every n days for revisions older than m days.

Retention policies are specified with the -keep option, which accepts an argument of the form n:m, where n is the number of days between two consecutive revisions to keep, and m means that the policy only applies to revisions at least m days old. If n is zero, all revisions older than m days will be removed.

Examples:
duplicacy prune -keep 1:7       # Keep one revision per day for revisions older than 7 days
duplicacy prune -keep 7:30      # Keep a revision every 7 days for revisions older than 30 days
duplicacy prune -keep 30:180    # Keep a revision every 30 days for revisions older than 180 days
duplicacy prune -keep 0:360     # Keep no revisions older than 360 days
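
The effect of a set of policies can be illustrated with a small Python simulation. This is only a sketch of the rule as described above, not Duplicacy's actual implementation: the function name `revisions_to_keep`, the representation of revisions as ages in days, and the choice to walk from the oldest revision to the newest are all assumptions made for the example.

```python
def revisions_to_keep(ages_in_days, policies):
    """Return the ages (in days) of the revisions that would survive.

    ages_in_days: age of each existing revision, in days.
    policies: list of (n, m) pairs, one per -keep n:m option.
    """
    # Try the policy with the largest m first, so each revision is
    # governed by the oldest age bracket it falls into.
    ordered = sorted(policies, key=lambda p: p[1], reverse=True)
    kept = []
    last_kept_age = None
    # Walk from the oldest revision to the newest (assumption).
    for age in sorted(ages_in_days, reverse=True):
        policy = next((p for p in ordered if age >= p[1]), None)
        if policy is None:
            kept.append(age)  # younger than every m: no policy applies
            continue
        n = policy[0]
        if n == 0:
            continue  # n == 0 removes every revision in this bracket
        if last_kept_age is not None and last_kept_age - age < n:
            continue  # too close to the previously kept revision
        kept.append(age)
        last_kept_age = age
    return kept


print(revisions_to_keep([400, 370, 200, 190, 100, 10, 3], [(0, 360), (30, 180)]))
# [200, 100, 10, 3]
```

In this run the two revisions older than 360 days are dropped, the 190-day-old revision is dropped because it is within 30 days of the kept 200-day-old one, and everything younger than 180 days is untouched.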

Multiple -keep options must be sorted by their m values in decreasing order.

For example, to combine the above policies into one line, it would become:

duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
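
If the policies are generated by a script, the ordering requirement is easy to satisfy by sorting before building the command line. The following Python sketch shows one way to do it; the helper name `keep_arguments` is hypothetical and not part of Duplicacy.

```python
def keep_arguments(policies):
    """Return -keep arguments ordered by m, largest m first.

    policies: list of (n, m) pairs in any order.
    """
    ordered = sorted(policies, key=lambda p: p[1], reverse=True)
    args = []
    for n, m in ordered:
        args += ["-keep", f"{n}:{m}"]
    return args


command = ["duplicacy", "prune"] + keep_arguments([(1, 7), (7, 30), (30, 180), (0, 360)])
print(" ".join(command))
# duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
```

The resulting list can be passed to a process launcher such as Python's `subprocess.run` without any shell quoting.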

-exhaustive

Remove all unreferenced chunks (not just those referenced by deleted revisions).

The -exhaustive option will scan the list of all chunks in the storage, so it will find not only unreferenced chunks from deleted revisions, but also chunks that have become unreferenced for other reasons, such as those from an incomplete backup.

It will also find any file that does not look like a chunk file.

In contrast, a normal prune command will only identify chunks that are referenced by deleted revisions and by no other revision.

Example:
duplicacy prune -exhaustive

-exclusive

Assume exclusive access to the storage (disable two-step fossil collection).

The -exclusive option will assume that no other clients are accessing the storage, effectively disabling the two-step fossil collection algorithm.

With this option, the prune command will immediately remove unreferenced chunks.

WARNING: Only run -exclusive when you are sure that no other backup is running, on any other device or repository.

Example:
duplicacy prune -exclusive

-dry-run, -d

This option shows what changes the prune command would have made. It is guaranteed not to make any changes to the storage, and does not even create the local fossil collection file.

Example:

After running this nothing will be modified in the storage, but duplicacy will show all output just like a normal run:

duplicacy prune -dry-run -all -exhaustive -exclusive

-delete-only

Delete fossils previously collected (if deletable) and don't collect fossils.

Example:
duplicacy prune -delete-only

-collect-only

Identify and collect fossils, but don't delete fossils previously collected.

Example:
duplicacy prune -collect-only

The -delete-only option will skip the fossil collection step, while the -collect-only option will skip the fossil deletion step.

-ignore <id> [+]

Ignore revisions with the specified snapshot ID when deciding if fossils can be deleted.

-storage <storage name>

Prune revisions from the specified storage instead of the default one.

Example:
duplicacy prune -storage google-drive

-threads <n>

Use this option to prune chunks with more than one thread, which generally increases pruning speed.

💡 You should test which number of threads works best for your connection and storage provider, but using more than 30 threads is not advised, as it will not improve speeds significantly.

Example

duplicacy prune -keep 1:7 -threads 10 # use 10 threads for the pruning process

Notes

💡 Revisions to be deleted can be specified by numbers, by a tag, by retention policies, or by any combination of these categories.

💡 Only one repository should run prune

Since Duplicacy encourages multiple repositories backing up to the same storage (so that deduplication will be efficient), users might want to run prune from each different repository.

The design of Duplicacy, however, was based on the assumption that only one instance would run the prune command (using -all). This greatly simplifies the implementation.

It is also somewhat wasteful to have a prune command work on a single repository ID, since it still needs to download all backups for all other repository IDs in order to decide which chunks are to be deleted.

Finally, race conditions can in theory happen when two instances try to operate on the same chunk at the same time. In practice this may never happen, especially if the prune command runs after the backup, so that the instances start at random times.

💡 Pruning is logged

All prune actions are logged by default locally, on the machine where the prune command is executed, under .duplicacy/logs. The prune logs are named similarly to prune-log-20171230-142510.

Some log files in that folder may be empty. This is no cause for concern: an empty log means that nothing was pruned from the storage during that particular prune operation.
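
To review only the prune runs that actually removed something, the empty logs can be filtered out. The Python sketch below is a hypothetical helper, not part of Duplicacy. It assumes that the digits in a name like prune-log-20171230-142510 encode the date and time as YYYYMMDD-HHMMSS, which is inferred from the example name rather than documented here.

```python
from datetime import datetime
from pathlib import Path


def non_empty_prune_logs(log_dir):
    """Yield (timestamp, path) for each prune log that is not empty."""
    for path in sorted(Path(log_dir).glob("prune-log-*")):
        if path.stat().st_size == 0:
            continue  # empty log: nothing was pruned in that run
        try:
            # Assumed name layout: prune-log-YYYYMMDD-HHMMSS
            stamp = datetime.strptime(path.name, "prune-log-%Y%m%d-%H%M%S")
        except ValueError:
            continue  # name does not match the assumed layout
        yield stamp, path
```

Calling `non_empty_prune_logs(".duplicacy/logs")` from the repository root would then list the runs worth inspecting, oldest first.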

💡 -exhaustive should be used sparingly

The -exhaustive option is only needed when there are known unreferenced chunks in the storage, for example, when a backup is interrupted by the user or terminated due to an error and the files in the repository change afterwards.

It is not recommended to run the prune command regularly with this option without a recent incomplete backup, mainly because if there is an ongoing backup from a different computer, the prune command will mark as fossils all new chunks uploaded by that backup.

Although in the fossil deletion step the prune command can correctly identify that these chunks are actually referenced and thus turn them back into chunks, the cost of extra API calls can be excessive.

💡 The last revision can only be deleted in -exclusive mode

The latest revision from each repository cannot be deleted in non-exclusive mode. In theory, a backup for that repository may be in progress and using the latest revision as its base, so removing that revision would cause some chunks to be removed even though the backup in progress still needs them.

⚠️ Corner cases when prune may delete too much

There are two corner cases in which a fossil that is still needed may be mistakenly deleted. The first occurs when a backup that takes more than 7 days started before the chunk was marked as a fossil: the prune command will then consider the repository inactive and exclude it from the criteria for determining which fossils are safe to delete.

The other case occurs when an initial backup from a newly recreated repository also started before the chunk was marked as a fossil. Since the prune command does not know that such a repository exists at fossil deletion time, it may conclude that the fossil is no longer needed by any backup and delete it permanently.

Therefore, a check command must be used if a backup is an initial backup or takes more than 7 days. Once a backup passes the check command, it is guaranteed that it won't be affected by any future prune operations.

💡 Individual files cannot be pruned

Note that Duplicacy always prunes entire revisions of entire snapshots, not individual files. In other words, it is not possible to remove backups of specific files from the storage. For example, if you realize after a couple of months that you have accidentally been backing up some huge useless files, the only way to remove them from the storage and free up space is to prune each and every revision that includes them.

Two-step fossil collection algorithm

The prune command implements the two-step fossil collection algorithm. It will first find fossil collection files from previous runs and check whether the fossils they contain are eligible for permanent deletion (the fossil deletion step). Then it will search for snapshots to be deleted, mark unreferenced chunks as fossils (by renaming them) and save them in a new fossil collection file stored locally (the fossil collection step).

For fossils collected in the fossil collection step to be eligible for safe deletion in the fossil deletion step, at least one new snapshot from each snapshot ID must be created between two runs of the prune command. However, some repositories may not be set up to back up on a regular schedule, and would thus block other repositories from deleting any fossils. By default, Duplicacy ignores repositories that have had no new backup in the past 7 days, and you can also use the -ignore option to skip certain repositories when deciding the deletion criteria.
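
The deletion criterion can be sketched in a few lines of Python. This is a simplified illustration of the rule as stated above, not Duplicacy's actual code: the function `fossils_deletable`, its parameters, and the use of a single "latest backup" timestamp per snapshot ID are assumptions made for the example.

```python
from datetime import datetime, timedelta

INACTIVE_AFTER = timedelta(days=7)  # default inactivity window described above


def fossils_deletable(collected_at, now, latest_backup, ignored=()):
    """Decide whether a fossil collection may be deleted.

    collected_at: when the fossils were collected (previous prune run).
    now: time of the current prune run.
    latest_backup: dict mapping snapshot ID -> time of its latest snapshot.
    ignored: snapshot IDs excluded with -ignore.
    """
    for snapshot_id, latest in latest_backup.items():
        if snapshot_id in ignored:
            continue  # explicitly excluded from the criteria
        if now - latest > INACTIVE_AFTER:
            continue  # no new backup in the past 7 days: treated as inactive
        if latest <= collected_at:
            return False  # this ID has no new snapshot since collection
    return True


collected = datetime(2020, 7, 1)
now = datetime(2020, 7, 3)
latest = {"computer-1": datetime(2020, 7, 2), "computer-2": datetime(2020, 6, 30)}
print(fossils_deletable(collected, now, latest))                          # False
print(fossils_deletable(collected, now, latest, ignored=["computer-2"]))  # True
```

In the first call, computer-2 is still considered active but has not backed up since the fossils were collected, so deletion is postponed. Ignoring that ID removes the block.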
