[feature] zreplicate setting to specify the number of remote snapshots #16

Open
sophana opened this issue Apr 21, 2016 · 4 comments

@sophana

sophana commented Apr 21, 2016

Hi

Today zreplicate synchronizes all snapshots from source to destination.
There are cases where the space constraints differ between the source and the destination.
Sometimes it is the source that is space-constrained, sometimes the destination.

It would be nice if we could specify the number of source and remote snapshots separately, perhaps in the zbackup settings.

Looking at zbackup's documentation, I couldn't tell whether such a feature already exists via the snapshot-limit setting.

Thanks

@Rudd-O
Owner

Rudd-O commented Apr 23, 2016

I am happy to merge patches that would add such a feature in a sensible
way. I don't know how to do it myself, though.

@jefft

jefft commented Jul 13, 2016

Put another way: say I want to snapshot ZFS every hour (hourly-snapshots=24), but only 'zbackup' the final state of the filesystem at the end of the day ('zbackup daily'). Even though I asked for the 'daily' tier to be backed up, zbackup will transfer the 23 hourly snapshots between yesterday and today. In effect it's doing zfs send -I @yesterday tank@today rather than zfs send -i @yesterday tank@today (unlike -i, -I gives every intermediate snapshot). Very wasteful of both space and bandwidth.

Just guessing: perhaps the original zfs-tools author didn't know about send -I, and chose to emulate it by iterating over snapshots. If that iterating-over-snapshots code were ripped out and replaced with send -I, then changing it to send -i would be trivial.
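To make the `-i` versus `-I` difference concrete, here is a minimal sketch. The helper names `build_send_command` and `snapshots_transferred` are hypothetical and are not part of zfs-tools; only the `zfs send -i` and `zfs send -I` flags themselves come from the discussion above. The snapshot names are made up for illustration.

```python
def build_send_command(dataset, from_snap, to_snap, include_intermediates):
    """Build a zfs send argument list (hypothetical helper, not zfs-tools code).

    -I sends every snapshot between from_snap and to_snap;
    -i sends only the difference between the two endpoints.
    """
    flag = "-I" if include_intermediates else "-i"
    return ["zfs", "send", flag, "@" + from_snap, "%s@%s" % (dataset, to_snap)]


def snapshots_transferred(snapshots, from_snap, to_snap, include_intermediates):
    """Return the snapshots that end up on the receiving side.

    `snapshots` is the dataset's snapshot list, oldest first.
    """
    start = snapshots.index(from_snap)
    end = snapshots.index(to_snap)
    if include_intermediates:
        # -I: everything after the base, up to and including the target.
        return snapshots[start + 1:end + 1]
    # -i: only the target snapshot is created on the destination.
    return [to_snap]


# Illustrative snapshot names only.
history = ["yesterday", "hourly-01", "hourly-02", "today"]
print(build_send_command("tank", "yesterday", "today", False))
print(snapshots_transferred(history, "yesterday", "today", True))
print(snapshots_transferred(history, "yesterday", "today", False))
```

With `-I` the hourly snapshots travel along with `today`; with `-i` only `today` arrives, which is the behavior being asked for when backing up a single tier.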

@Rudd-O
Owner

Rudd-O commented Aug 12, 2016

The behavior of zreplicate demanded the use of -I because the assumption at the time was that you want an actual replica, not a pruned or incomplete replica. Bear in mind that this was originally merely a frontend for zfs send -RI.

I'm super happy to include features that allow for filtering certain datasets out of the replication process based on admin-specified policy (whether that policy is supplied by command-line parameters or as ZFS properties on the source dataset tree). This would allow us to eliminate a lot of code from zbackup by absorbing some of its policy configuration options into zreplicate. But we would first need to agree on what that policy would look like, and how we would go about implementing it. This requires that we understand both the possible use cases and how we'd express those use cases in configuration. And we must get this right, because quite a few people use these programs.

It should not be very hard to do such a thing. Perhaps all the policy needs to do is "hide" (filter) some source datasets prior to the diffing algorithm used to compute the replication plan, or perhaps the change needs to be made in the diffing algorithm proper (I can't really tell you off the top of my head). The only difficult thing I foresee is modifying the plan optimizer, which substantially increases the performance of the replication process, so that it won't optimize away the instructions to filter certain snapshots from being transferred.
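The "hide before diffing" idea could be sketched roughly as follows. This is not the algorithm in sync.py; `filter_snapshots` and `missing_on_destination` are hypothetical names, the diff is deliberately simplified to a set difference, and the `daily-` / `hourly-` naming scheme is an assumption made purely for the example.

```python
def filter_snapshots(snapshots, policy):
    """Hide the snapshots that the admin-specified policy rejects.

    `policy` is any callable taking a snapshot name and returning True
    if that snapshot should be visible to the replication planner.
    """
    return [snap for snap in snapshots if policy(snap)]


def missing_on_destination(source_snapshots, destination_snapshots):
    """Simplified stand-in for the diffing step: list what must be sent."""
    present = set(destination_snapshots)
    return [snap for snap in source_snapshots if snap not in present]


def daily_only(snapshot_name):
    # Hypothetical policy: replicate only the daily tier.
    return snapshot_name.startswith("daily-")


source = ["daily-1", "hourly-1", "hourly-2", "daily-2"]
destination = ["daily-1"]

visible = filter_snapshots(source, daily_only)
to_send = missing_on_destination(visible, destination)
print(to_send)
```

Because the hourly snapshots never reach the diffing step, the plan contains only `daily-2`. Whether the real planner can be fed a filtered list this way without further changes is exactly the open question raised above.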

@Rudd-O
Owner

Rudd-O commented Aug 12, 2016

Here is the code that computes the replication plan:

https://github.com/Rudd-O/zfs-tools/blob/master/src/zfstools/sync.py#L12

Note that the optimizer later on makes the necessary assumption (at least necessary given the current codebase) that consecutive snapshots 1, 2, 3, 4 of the same dataset will be optimized to a -I transfer of 1->4.

This assumption, believe it or not, provides an enormous reduction in the number of commands that need to be executed, sometimes from hundreds of sends to a single send command (in the most common case of sending a set of recursive zfs diffs of an entire tree of datasets, all of which are snapshotted recursively on the sending side).

So we would have to figure out a way to maintain the assumption, but if policy demands it, break the assumption and do -i when appropriate.
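One possible shape for "keep the assumption, but break it when policy demands" is sketched below. It is an illustration under stated assumptions, not the optimizer in zfs-tools: `plan_sends` is a hypothetical function, `wanted` is assumed to include the base snapshot already present on the destination, and the output is a list of `(flag, from, to)` tuples rather than real commands.

```python
def plan_sends(all_snapshots, wanted):
    """Coalesce contiguous wanted snapshots into -I, use -i across gaps.

    all_snapshots: every snapshot of one dataset, oldest first.
    wanted: snapshots the policy keeps, including the common base.
    """
    position = {name: i for i, name in enumerate(all_snapshots)}
    ordered = sorted(wanted, key=position.__getitem__)
    if len(ordered) < 2:
        return []  # nothing to send beyond the base

    steps = []
    run_start = prev = ordered[0]
    for snap in ordered[1:]:
        if position[snap] - position[prev] == 1:
            # No snapshot was skipped: extend the current contiguous run.
            prev = snap
            continue
        # The policy skipped something between prev and snap.
        if prev != run_start:
            steps.append(("-I", run_start, prev))  # flush the run so far
        steps.append(("-i", prev, snap))  # jump over the hidden snapshots
        run_start = prev = snap
    if prev != run_start:
        steps.append(("-I", run_start, prev))
    return steps


snaps = ["1", "2", "3", "4", "5", "6"]
print(plan_sends(snaps, ["1", "2", "3", "4", "5", "6"]))
print(plan_sends(snaps, ["1", "2", "3", "6"]))
```

When nothing is filtered, the whole range still collapses into a single `-I` step, so the common case keeps its performance benefit. A gap in the wanted list flushes the current run and emits one `-i` step across the hidden snapshots.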

3 participants