zreplicate performance enhancements #5

Open
Rudd-O opened this issue Aug 31, 2013 · 1 comment

Comments

@Rudd-O
Owner

Rudd-O commented Aug 31, 2013

On large datasets, the zfs get creation and zfs list calls that run when zreplicate starts take ages. These should query only the dataset subtree they are going to manipulate and, if possible, run in parallel.
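A rough sketch of what that could look like, not how zreplicate currently does it. The function names (build_queries, run_query, query_subtree) are made up for illustration, and the exact zfs flags are an assumption that should be checked against the local zfs version. Both queries are limited to one dataset and run concurrently in a small thread pool:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_queries(dataset):
    """Return the two read-only queries, both restricted to one subtree."""
    return [
        ["zfs", "list", "-Hpr", "-t", "all", "-o", "name", dataset],
        ["zfs", "get", "-Hpr", "-t", "all", "-o", "name,value",
         "creation", dataset],
    ]


def run_query(argv):
    """Run one command and return its stdout as text."""
    return subprocess.check_output(argv, text=True)


def query_subtree(dataset, runner=run_query):
    """Run both queries in parallel; results come back in query order."""
    queries = build_queries(dataset)
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(runner, queries))
```

The runner argument is only there so the commands can be swapped for an ssh wrapper on remote hosts, or for a fake in tests.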

@JavaScriptDude
Contributor

I think that maybe this can be implemented in the new zfslib library. I was thinking that the implementation would be as follows:

  1. On initial load, load only the zfs pools and their first-level datasets with zfs list -Hpr -d 1 -t all.
  2. Start a new thread to sequentially load level 2 and deeper for each dataset from step 1 with zfs list -Hpr -t all <pool>/<dataset>.
  3. If a query hits data that has not been loaded yet, hold the requesting process and move that dataset's query to the front of the queue so the request is satisfied next.
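The three steps could be sketched roughly like this. It is a minimal illustration, not zfslib code: LazyDatasetLoader and the fetch callable are hypothetical names, and fetch stands in for whatever actually runs zfs list -Hpr -t all <pool>/<dataset> and parses the output.

```python
import threading
from collections import deque


class LazyDatasetLoader:
    """Load dataset subtrees in the background; callers can jump the queue."""

    def __init__(self, top_level, fetch):
        # fetch is a hypothetical callable: dataset name -> loaded subtree.
        self._fetch = fetch
        self._known = set(top_level)
        self._pending = deque(top_level)   # step 1 output, in load order
        self._loaded = {}
        self._cond = threading.Condition()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Step 2: one worker loads the pending subtrees sequentially.
        while True:
            with self._cond:
                if not self._pending:
                    return
                name = self._pending.popleft()
            try:
                result = self._fetch(name)  # slow call, made without the lock
            except Exception as exc:        # hand the failure to the waiter
                result = exc
            with self._cond:
                self._loaded[name] = result
                self._cond.notify_all()

    def get(self, name):
        # Step 3: block the caller and move its dataset to the front.
        with self._cond:
            if name not in self._known:
                raise KeyError(name)
            if name in self._pending:
                self._pending.remove(name)
                self._pending.appendleft(name)
            self._cond.wait_for(lambda: name in self._loaded)
            result = self._loaded[name]
        if isinstance(result, Exception):
            raise result
        return result
```

A fetch that is already in flight is not interrupted, so a prioritized request waits for at most one other subtree to finish loading.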

This should be pretty fast on a local system, but for remote systems I'm not sure how the communication overhead of each call compares with the load times.

Secondly, a new parameter could be added to zfslib.Connection.load_poolset() to specify the starting dataset, which would prioritize data loading for just that subtree.
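As a sketch of the ordering such a parameter might imply (this is not the existing load_poolset() signature; in_subtree and order_for_loading are hypothetical helpers), the requested subtree is simply moved to the front of the load order. The match is done on name boundaries so that tank/data does not also pull in tank/database:

```python
def in_subtree(name, root):
    """True if name is root, a descendant of it, or a snapshot/bookmark of it."""
    return name == root or name.startswith((root + "/", root + "@", root + "#"))


def order_for_loading(names, start=None):
    """Return names with the start subtree first; relative order is kept."""
    if start is None:
        return list(names)
    # sorted() is stable, so entries keep their order within each group.
    return sorted(names, key=lambda n: not in_subtree(n, start))
```

For example, order_for_loading(["tank", "tank/database", "tank/data"], "tank/data") puts tank/data first and leaves the other two in their original order.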
