
Resource-intensive URL cache expiration bogs down Synapse #2638

Open
spantaleev opened this issue Nov 4, 2017 · 4 comments
Labels
S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@spantaleev
Contributor

The "expire URL cache" feature introduced in #2478 causes Synapse to become unresponsive while the cleanup operation happens.

I've been running a homeserver on an older Synapse version (pre-#2478) and naturally had a long list of entries (thousands) that were due for expiration.

As soon as I upgraded to a newer Synapse version (post-#2478) the cleanup feature kicked in.

It seems that while PreviewUrlResource._expire_url_cache_data is doing its job, Synapse is unable to do anything else (handle API requests, federation, etc.).

Since the media store is on a remote filesystem for me (using s3fs-fuse), deleting a file takes a long time.
Doing that for thousands of files, one after the other, naturally takes a very long time.
Synapse cannot do anything else during that time.

I understand that normally such excessive delete operations would not happen. It only happened this time because cleanup had never run before (it wasn't implemented yet) and a large backlog had built up.

Still, people running large homeservers with a media store backed by a slow filesystem (e.g. NFS) would probably also be affected by this to some extent - even during "incremental" cleanup (although running it once every 10 seconds probably keeps the batches small).

In any case, it seems inappropriate for some maintenance operation to bog down the entire server and prevent it from doing its core responsibility - handling API requests and federation.
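
For illustration, the problematic pattern is roughly the following (a simplified sketch with made-up helper names such as get_expired_url_cache_media and url_cache_filepath, not Synapse's actual code): every delete is a blocking syscall executed directly on the Twisted reactor thread.

```python
# Simplified sketch of the blocking pattern (hypothetical helpers, not Synapse's real code).
import os

from twisted.internet import defer


@defer.inlineCallbacks
def expire_url_cache_data(store, filepaths):
    # 'store' and 'filepaths' stand in for Synapse's data store and path helpers.
    media_ids = yield store.get_expired_url_cache_media()  # hypothetical accessor
    for media_id in media_ids:
        path = filepaths.url_cache_filepath(media_id)  # hypothetical helper
        try:
            # Blocking syscall on the reactor thread: on s3fs/NFS this can take
            # seconds per file, during which Synapse handles nothing else.
            os.remove(path)
        except OSError:
            pass


# Scheduled as a periodic job, roughly like the real cleanup loop:
# twisted.internet.task.LoopingCall(expire_url_cache_data, store, filepaths).start(10)
```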

@richvdh
Member

richvdh commented Nov 4, 2017

yes, it looks like we're doing the FS operations in the main reactor thread rather than delegating them to a separate thread as we should. On a local FS that will be unnoticeable but for a remote FS it will stop synapse responding for significant periods :/
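
A minimal sketch of the kind of change this suggests, reusing the same hypothetical helpers as above: push each blocking delete onto Twisted's thread pool with deferToThread so the reactor stays free to handle requests.

```python
import os

from twisted.internet import defer
from twisted.internet.threads import deferToThread


@defer.inlineCallbacks
def expire_url_cache_data(store, filepaths):
    media_ids = yield store.get_expired_url_cache_media()  # hypothetical accessor
    for media_id in media_ids:
        path = filepaths.url_cache_filepath(media_id)  # hypothetical helper
        try:
            # os.remove() now runs on a worker thread; the reactor keeps
            # serving API requests and federation while the slow FS responds.
            yield deferToThread(os.remove, path)
        except OSError:
            pass
```

The real fix would presumably go through Synapse's own thread and logging-context helpers rather than raw deferToThread, but the principle is the same.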

@hex-m

hex-m commented May 26, 2021

For us, Synapse stopped working because the NFS-mounted media_store_path stopped responding due to a problem on the NFS server side. Maybe it would have been better to use a soft mount for that path? Then it would at least have timed out.

I didn't create a separate issue because the underlying problem is the same: core functionality shouldn't be blocked by I/O operations.

@erikjohnston erikjohnston added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jul 26, 2021