
Resource-intensive URL cache expiration bogs down Synapse #2638

Open
spantaleev opened this issue Nov 4, 2017 · 4 comments
Labels
S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@spantaleev
Contributor

The "expire URL cache" feature introduced in #2478 causes Synapse to become unresponsive while the cleanup operation happens.

I've been running a homeserver on an older Synapse version (pre-#2478) and naturally had a long list of entries (thousands) that were due for expiration.

As soon as I upgraded to a newer Synapse version (post-#2478) the cleanup feature kicked in.

It seems that while PreviewUrlResource._expire_url_cache_data is doing its job, Synapse is unable to do anything else (handle API requests, federation, etc.).

Since the media store is on a remote filesystem for me (using s3fs-fuse), deleting a file takes a long time.
Doing that for thousands of files, one after the other, naturally takes a very long time.
Synapse cannot do anything else during that time.

I understand that normally such excessive delete operations would not happen. It only happened this time because cleanup had never run before (it wasn't implemented yet) and a large backlog had built up.

Still, people running large homeservers with a media store backed by a slow filesystem (e.g. NFS) would probably also be affected by this to some extent - even during "incremental" cleanup (although running it once every 10 seconds probably keeps the batches small).

In any case, it seems inappropriate for some maintenance operation to bog down the entire server and prevent it from doing its core responsibility - handling API requests and federation.
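
For illustration, the problematic pattern is roughly the following (a simplified sketch with made-up helper names such as get_expired_url_cache_media and url_cache_filepath, not Synapse's actual code): every delete is a blocking syscall executed directly on the Twisted reactor thread.

```python
# Simplified sketch of the blocking pattern (hypothetical helpers, not Synapse's real code).
import os

from twisted.internet import defer


@defer.inlineCallbacks
def expire_url_cache_data(store, filepaths):
    # 'store' and 'filepaths' stand in for Synapse's data store and path helpers.
    media_ids = yield store.get_expired_url_cache_media()  # hypothetical accessor
    for media_id in media_ids:
        path = filepaths.url_cache_filepath(media_id)  # hypothetical helper
        try:
            # Blocking syscall on the reactor thread: on s3fs/NFS this can take
            # seconds per file, during which Synapse handles nothing else.
            os.remove(path)
        except OSError:
            pass


# Scheduled as a periodic job, roughly like the real cleanup loop:
# twisted.internet.task.LoopingCall(expire_url_cache_data, store, filepaths).start(10)
```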

@richvdh
Member

richvdh commented Nov 4, 2017

yes, it looks like we're doing the FS operations in the main reactor thread rather than delegating them to a separate thread as we should. On a local FS that will be unnoticeable but for a remote FS it will stop synapse responding for significant periods :/
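
A minimal sketch of the kind of change this suggests, reusing the same hypothetical helpers as above: push each blocking delete onto Twisted's thread pool with deferToThread so the reactor stays free to handle requests.

```python
import os

from twisted.internet import defer
from twisted.internet.threads import deferToThread


@defer.inlineCallbacks
def expire_url_cache_data(store, filepaths):
    media_ids = yield store.get_expired_url_cache_media()  # hypothetical accessor
    for media_id in media_ids:
        path = filepaths.url_cache_filepath(media_id)  # hypothetical helper
        try:
            # os.remove() now runs on a worker thread; the reactor keeps
            # serving API requests and federation while the slow FS responds.
            yield deferToThread(os.remove, path)
        except OSError:
            pass
```

The real fix would presumably go through Synapse's own thread and logging-context helpers rather than raw deferToThread, but the principle is the same.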

@hex-m

hex-m commented May 26, 2021

For us, Synapse stopped working because the NFS-mounted media_store_path stopped responding due to a problem on the NFS server side. Maybe it would have been better to use a soft mount for that path? Then it would at least have timed out.

I didn't create a separate issue because the underlying problem is the same: core functionality shouldn't be blocked by I/O operations.

@erikjohnston erikjohnston added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jul 26, 2021