disable hardcoded rsync options #157

Closed
FrankyGT opened this issue Mar 25, 2021 · 1 comment · Fixed by #167

Comments

@FrankyGT

SUMMARY

Please implement an option to disable the hardcoded rsync options '--delay-updates' and '-F' in the synchronize module.

ISSUE TYPE
  • Feature Idea
COMPONENT NAME

synchronize

ADDITIONAL INFORMATION

When using the synchronize module in Ansible, I see that the options "--delay-updates" and "-F" are hardcoded in the module.
Both of these options severely slow down rsync when syncing NFS filesystems.
I hope you understand that we are talking about enterprise infrastructure where we try to sync several TB per day from Europe to multiple locations in the US. All locations use NetApp file servers which host filesystems > 100 TB. We can't use NetApp's SnapMirror, as we only need to sync a small subset of this data.
So our setup has a Linux client machine at the source location with an NFS mount of the source data. This is then synced to the destination, which is also a Linux machine with NFS mounts.
I already read an old ticket about the implications of the --delay-updates option, but not about -F. -F is quite an intrusive option to use on NFS filesystems: it makes rsync look for a .rsync-filter file in every directory it traverses. Since that file never exists in our situation, the client can't find it in any cache and must send a request to the NetApp NFS server, so every directory it encounters costs an additional network round trip. On the NetApp side something similar happens: since the file is looked up before the actual content of the directory, and the file does not exist, it can't be served from NetApp caches either and forces a disk read.

I've got an example trace:

104912 open("messenger/.rsync-filter", O_RDONLY) = -1 ENOENT (No such file or directory) <0.001064>
104912 openat(AT_FDCWD, "messenger", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000377>
104912 getdents(4, /* 6 entries */, 32768) = 184 <0.000509>
104912 lstat("messenger/icons", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0 <0.000030>
104912 lstat("messenger/addressbook", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0 <0.000028>
104912 lstat("messenger/messengercompose", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0 <0.000027>
104912 lstat("messenger/smime", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0 <0.000030>
104912 getdents(4, /* 0 entries */, 32768) = 0 <0.000016>
104912 close(4) = 0 <0.000016>
104912 open("messenger/icons/.rsync-filter", O_RDONLY) = -1 ENOENT (No such file or directory) <0.001100>
104912 openat(AT_FDCWD, "messenger/icons", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000298>
104912 getdents(4, /* 2 entries */, 32768) = 48 <0.000359>
104912 getdents(4, /* 0 entries */, 32768) = 0 <0.000016>
104912 close(4) = 0 <0.000016>

So there are 2 directories being scanned in this example: one with 4 filesystem objects and one that is empty.

In the first one we have:

0.001064 seconds for checking the .rsync-filter file
0.000377 + 0.000509 = 0.000886 seconds for opening and reading the directory
0.000030 + 0.000028 + 0.000027 + 0.000030 = 0.000115 seconds for reading attributes of the filesystem objects
0.000016 + 0.000016 = 0.000032 seconds for closing the directory

So it actually takes longer to check for this file than to read a directory and all attributes of a few files.

Therefore I would like to request an option to turn off these hardcoded options. From our point of view there is no reason for the extra consistency provided by --delay-updates, as we actively check for errors in the transfers, and we don't use .rsync-filter files.

Of course we have already modified the module file ourselves, but it is a pain to maintain custom patches against third-party code. Hence this request.

Also keep in mind that in enterprise environments every delay counts and is magnified by scale. For example, we install about 100 systems per hour using Ansible and create 5000 VMs per hour on premises. So every second counts...
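
For illustration only, such a switch could look roughly like the sketch below; the delay_updates parameter name is just a placeholder that does not exist in the module today, and the paths are placeholders as well:

- name: Sync an NFS tree without the hardcoded --delay-updates (hypothetical interface)
  synchronize:
    src: /mnt/nfs/source/
    dest: /mnt/nfs/destination/
    delay_updates: false        # hypothetical switch to drop the hardcoded --delay-updates
    # a similar switch would be needed to drop the hardcoded -F filter-rule lookup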

@quidame
Contributor

quidame commented Mar 29, 2021

Hi @FrankyGT

The module documentation comes with a note about that:

The C(synchronize) module forces --delay-updates to avoid leaving a destination in a broken in-between state
if the underlying rsync process encounters an error. Those synchronizing large numbers of files that are willing to
trade safety for performance should call rsync directly.

But it doesn't say how to call rsync directly with some bits of idempotency and check_mode support.

- name: rsync command example, with bits of idempotency and check_mode support
  command:
    cmd: >
      rsync {% if ansible_check_mode %}--dry-run{% endif %}
      --out-format="<<CHANGED>>%i %n%L"
      --archive
      --delete
      --one-file-system
      --exclude=/lost+found
      --delete-excluded
      {{ source }} {{ destination }}
  check_mode: false   # run the command even in check mode; the --dry-run above keeps it read-only
  register: result
  # the <<CHANGED>> marker added by --out-format only appears when rsync transfers something
  changed_when: result.stdout is search('<<CHANGED>>')
  failed_when: result.rc not in [0, 24]   # rc 24 = ignore vanished files

That said, this is about rsync, and in many cases leaving a destination in an in-between state doesn't mean it is in a broken state, since running the same task again should finish the job. So I agree that these forced options could be made optional.

Also, reading the module code and the rsync manpage, it appears that the module doesn't really force --delay-updates: because it comes as the first option on the module's rsync command line, it can be overridden with --no-delay-updates in the module's rsync_opts parameter. If this could help...
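
A minimal sketch of that workaround, assuming placeholder paths:

- name: Work around the hardcoded --delay-updates via rsync_opts
  synchronize:
    src: /mnt/nfs/source/
    dest: /mnt/nfs/destination/
    rsync_opts:
      - "--no-delay-updates"   # later option wins, cancelling the hardcoded --delay-updates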
