This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
The "expire URL cache" feature introduced in #2478 causes Synapse to become unresponsive while the cleanup operation happens.
I've been running a homeserver on an older Synapse version (pre-#2478) and naturally had a long list of entries (thousands) that were due for expiration.
As soon as I upgraded to a newer Synapse version (post-#2478) the cleanup feature kicked in.
It seems that while PreviewUrlResource._expire_url_cache_data is doing its job, Synapse is unable to do anything else (handling API requests, federation, etc.).
Since the media-store is on a remote filesystem for me (using s3fs-fuse), deleting a file takes a long time.
Doing that for thousands of files one after the other, naturally, takes a very long time.
Synapse cannot do anything else during that time.
I understand that normally such excessive delete operations would not happen. It only happened this time because cleanup had never run before (the feature wasn't implemented yet) and a lot of entries had queued up.
Still, people running large homeservers with a media store on a slow filesystem (NFS, etc.) would probably also be affected to some extent - even during "incremental" cleanup (although running it once every 10 seconds probably keeps the batches small).
In any case, it seems inappropriate for some maintenance operation to bog down the entire server and prevent it from doing its core responsibility - handling API requests and federation.
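To illustrate the problem described above, here is a minimal sketch of the problematic pattern: each file delete runs synchronously on the calling thread, so on a slow remote filesystem the whole process stalls for the duration of the loop. The function name and arguments are illustrative, not Synapse's actual code.

```python
import os
import tempfile

def expire_url_cache_data(media_ids, media_store_path):
    """Delete cached preview files one after another, synchronously.

    Each os.remove() blocks the calling thread; on s3fs-fuse or NFS a
    single call can take seconds, and thousands of queued entries can
    monopolize the event loop for minutes.
    """
    for media_id in media_ids:
        path = os.path.join(media_store_path, media_id)
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone; nothing to do

# Demo: create a few fake cache files, then expire them.
store = tempfile.mkdtemp()
ids = ["cache_a", "cache_b", "cache_c"]
for i in ids:
    open(os.path.join(store, i), "w").close()

expire_url_cache_data(ids, store)
print(os.listdir(store))  # → []
```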
yes, it looks like we're doing the FS operations in the main reactor thread rather than delegating them to a separate thread as we should. On a local FS that will be unnoticeable but for a remote FS it will stop synapse responding for significant periods :/
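The fix suggested here is to hand the blocking filesystem calls to a worker thread so the event loop stays free to serve requests. A minimal sketch of that pattern, using stdlib asyncio (`asyncio.to_thread`, Python 3.9+) rather than Twisted's `deferToThread` - the mechanism differs, but the idea is the same, and all names here are illustrative:

```python
import asyncio
import os
import tempfile

async def expire_url_cache_data(media_ids, media_store_path):
    """Delete cached preview files without blocking the event loop."""
    for media_id in media_ids:
        path = os.path.join(media_store_path, media_id)
        try:
            # os.remove runs in a thread-pool thread; the event loop keeps
            # running, so API requests and federation stay responsive even
            # if the remote filesystem is slow.
            await asyncio.to_thread(os.remove, path)
        except FileNotFoundError:
            pass  # already gone; nothing to do

# Demo: create fake cache files, then expire them asynchronously.
store = tempfile.mkdtemp()
ids = ["cache_a", "cache_b"]
for i in ids:
    open(os.path.join(store, i), "w").close()

asyncio.run(expire_url_cache_data(ids, store))
print(os.listdir(store))  # → []
```

In Twisted-based code the equivalent move is wrapping the blocking call with `twisted.internet.threads.deferToThread`, which returns a Deferred that fires when the thread-pool call completes.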
For us, Synapse stopped working because the NFS-mounted media_store_path stopped responding due to a problem on the NFS server side. Maybe it would have been better to use a soft mount for that path? Then it would at least have timed out.
I didn't create a separate issue because the underlying issue is that core functionality shouldn't be blocked by I/O operations.