zreplicate performance enhancements #5

Rudd-O · 2013-08-31T06:47:21Z

On large datasets, the zfs get creation and zfs list calls that run when zreplicate starts take ages. They need to query specifically and only the dataset subtree they are going to manipulate and, if possible, these queries should run in parallel.
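A rough sketch of what that could look like, not what zreplicate actually does: both queries are limited to one dataset and its descendants with -r, and run concurrently in a small thread pool. The function names (subtree_commands, query_subtree) and the chosen -o columns are assumptions made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def subtree_commands(dataset):
    """Build both queries, restricted to `dataset` and its descendants."""
    return {
        "list": ["zfs", "list", "-Hpr", "-t", "all", "-o", "name", dataset],
        "creation": ["zfs", "get", "-Hpr", "-o", "name,value", "creation", dataset],
    }


def run(cmd):
    """Run one command locally and return its stdout as text."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def query_subtree(dataset, runner=run):
    """Run the two queries in parallel and return their raw output by name."""
    cmds = subtree_commands(dataset)
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        futures = {name: pool.submit(runner, cmd) for name, cmd in cmds.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

The runner argument is only there so the command can be wrapped (for example, prefixed with ssh for a remote host) or replaced in tests.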

JavaScriptDude · 2021-01-02T20:02:13Z

I think that maybe this can be implemented in the new zfslib library. I was thinking that the implementation would be as follows:

1. On initial load, only load the zfs pools and their first level, with zfs list -Hpr -d 1 -t all.
2. Start a new thread to sequentially load all L2+ data for each dataset from step 1, with zfs list -Hpr -t all <pool>/<dataset>
3. On any query that hits data that is not loaded yet, hold the requesting process and move that dataset's query to the front of the queue so the request is satisfied next.
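The steps above could be sketched roughly as follows. This is not zfslib code: LazyPoolLoader, prioritize and get are hypothetical names, and the two callables stand in for whatever actually runs the zfs list commands. Error handling is left out, so a failing query would leave waiters blocked.

```python
import threading
from collections import deque


class LazyPoolLoader:
    """Hypothetical lazy loader: a worker fills in subtrees one at a time."""

    def __init__(self, list_top_level, list_subtree):
        # list_top_level() returns first-level dataset names (step 1).
        # list_subtree(name) returns everything below one dataset (step 2).
        self._list_subtree = list_subtree
        self._cond = threading.Condition()
        names = list(list_top_level())
        self._known = set(names)
        self._pending = deque(names)
        self._loaded = {}
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Background worker: load pending subtrees sequentially.
        while True:
            with self._cond:
                if not self._pending:
                    return
                name = self._pending.popleft()
            data = self._list_subtree(name)  # slow call, done outside the lock
            with self._cond:
                self._loaded[name] = data
                self._cond.notify_all()

    def prioritize(self, name):
        """Move a not-yet-started dataset to the front of the queue."""
        with self._cond:
            if name in self._pending:
                self._pending.remove(name)
                self._pending.appendleft(name)

    def get(self, name):
        """Return the subtree for `name`, blocking until it is loaded (step 3)."""
        if name not in self._known:
            raise KeyError(name)
        self.prioritize(name)
        with self._cond:
            self._cond.wait_for(lambda: name in self._loaded)
            return self._loaded[name]
```

A caller would only ever use get(); the reordering happens behind it, so queries against already-loaded datasets return immediately.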

This should be pretty fast on a local system, but for remote systems I'm not sure how the communication overhead of each call compares with the load times.

Secondly, a new parameter could be added to zfslib.Connection.load_poolset() to specify the starting dataset, which would prioritize the data loading to just that sub-tree.
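As a very rough illustration of the ordering such a parameter might imply, here is a helper that puts a starting dataset and everything under it ahead of the rest of the load queue. The helper names are made up, and nothing here reflects the actual load_poolset() signature.

```python
def in_subtree(name, start):
    """True if `name` is `start` itself or a child, snapshot or bookmark of it."""
    return name == start or name.startswith((start + "/", start + "@", start + "#"))


def prioritize_subtree(names, start):
    """Order `names` so the `start` subtree is loaded first, the rest after."""
    first = [n for n in names if in_subtree(n, start)]
    rest = [n for n in names if not in_subtree(n, start)]
    return first + rest
```

Matching on the separator rather than on a bare prefix keeps a sibling such as tank/bb out of the tank/b subtree.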
