On large datasets, the `zfs get creation` and `zfs list` calls that run when zreplicate starts take ages. These need to query only the dataset subtree they are going to manipulate, and, where possible, the queries should be parallelized.
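A minimal sketch of what restricting and parallelizing those startup queries could look like. The helper names (`subtree_queries`, `run_queries`) are hypothetical, not part of zreplicate or zfslib; the injectable `run` argument just stands in for actually shelling out:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def subtree_queries(dataset):
    # The two startup queries, restricted to the subtree being replicated
    # instead of the whole pool (-H: no headers, -p: parsable values,
    # -r: recurse into the subtree).
    return [
        ["zfs", "list", "-Hpr", "-t", "all", dataset],
        ["zfs", "get", "-Hpr", "creation", dataset],
    ]

def run_queries(datasets, run=subprocess.check_output):
    # Fan the per-subtree queries out over a thread pool so they run in
    # parallel; `run` is injectable so this can be exercised without zfs.
    cmds = [cmd for ds in datasets for cmd in subtree_queries(ds)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run, cmds))
```

Whether the thread pool pays off over SSH depends on how many concurrent sessions the remote side allows, so treat the fan-out width as tunable.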
I think that maybe this can be implemented in the new zfslib library. I was thinking that the implementation would be as follows:
1. On initial load, load only the zfs pools and their first-level datasets with `zfs list -Hpr -d 1 -t all`.
2. Start a new thread to sequentially load all level-2+ descendants of each dataset from step 1 with `zfs list -Hpr -t all <pool>/<dataset>`.
3. On any query that hits still-unloaded data, block the requesting process and prioritize that dataset's query so it is satisfied next.
This should be pretty fast on a local system, but for remote systems I'm not sure how the per-call communication overhead compares with the load times.
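The three steps above could be sketched roughly as below. This is not zfslib code; `LazyDatasetTree` and `fetch_children` are hypothetical names, and `fetch_children` stands in for a real `zfs list -Hpr -t all <pool>/<dataset>` call:

```python
import itertools
import queue
import threading

class LazyDatasetTree:
    def __init__(self, top_level, fetch_children):
        # top_level: datasets found by the shallow `zfs list -d 1` pass.
        self._fetch = fetch_children
        self._children = {}
        self._loaded = {ds: threading.Event() for ds in top_level}
        self._work = queue.PriorityQueue()
        self._seq = itertools.count()  # tie-breaker keeps FIFO order
        for ds in top_level:
            self._work.put((1, next(self._seq), ds))  # background priority
        threading.Thread(target=self._loader, daemon=True).start()

    def _loader(self):
        # Step 2: a single background thread sequentially loads the
        # level-2+ descendants of each top-level dataset.
        while True:
            _prio, _n, ds = self._work.get()
            if not self._loaded[ds].is_set():
                self._children[ds] = self._fetch(ds)
                self._loaded[ds].set()

    def children(self, ds):
        # Step 3: a query for unloaded data blocks the caller and jumps
        # the queue (priority 0 beats the background priority 1).
        if not self._loaded[ds].is_set():
            self._work.put((0, next(self._seq), ds))
        self._loaded[ds].wait()
        return self._children[ds]
```

A single loader thread keeps at most one `zfs list` in flight at a time, which matters for the remote case where each call pays connection overhead.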
Secondly, a new parameter could be added to `zfslib.Connection.load_poolset()` to specify a starting dataset, which would restrict and prioritize the data loading to just that sub-tree.
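One way the proposed parameter could shape the initial load, as a hedged sketch (the function name `initial_load_commands` and the parameter name `starting_dataset` are both hypothetical, not existing zfslib API):

```python
def initial_load_commands(pools, starting_dataset=None):
    # starting_dataset is the proposed parameter: when given, only that
    # sub-tree is queried up front; otherwise fall back to the shallow
    # per-pool pass from step 1 above.
    if starting_dataset is not None:
        return [["zfs", "list", "-Hpr", "-t", "all", starting_dataset]]
    return [["zfs", "list", "-Hpr", "-d", "1", "-t", "all", pool]
            for pool in pools]
```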