Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

key query causes device list updates over /sync in a loop #9613

Closed
deepbluev7 opened this issue Mar 13, 2021 · 9 comments
Closed

key query causes device list updates over /sync in a loop #9613

deepbluev7 opened this issue Mar 13, 2021 · 9 comments
Labels
X-Needs-Info This issue is blocked awaiting information from the reporter

Comments

@deepbluev7
Copy link
Contributor

deepbluev7 commented Mar 13, 2021

Preface: This may not actually be a synapse bug, but a client bug or a construct bug, but I'm taking a guess, that this is an issue in synapse.

Description

So, I have been experiencing a high load of master key queries on my database:

grafik

This seems to have come from the following loop:

  1. Synapse was sending me a device list update for @jason:test.zemos.net
  2. Nheko calls /_matrix/client/r0/keys/query
  3. Synapse calls GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
  4. Synapse gets a 200 response to that query
  5. Synapse sends the result to the client
  6. Synapse sends a device list update to the client, causing the client to go to 2.

Steps to reproduce

I don't think there is a good way to reproduce it? I broke the loop now by suppressing the device key queries once, so maybe it can be restarted by sending another query.

I think synapse should not send a device list update to the client here, if the remote query doesn't return a different result.

Logs:

Note: these are just from the master process, but the federation reader shows no relevant incoming transactions.

2021-03-13 22:12:59,057 - synapse.http.matrixfederationclient - 209 - INFO - POST-4932 - {GET-O-2827} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason
%3Atest.zemos.net
2021-03-13 22:12:59,059 - synapse.http.matrixfederationclient - 209 - INFO - POST-4934 - {GET-O-2826} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason
%3Atest.zemos.net
2021-03-13 22:12:59,061 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2330 - Streaming: caches -> 1658402
2021-03-13 22:12:59,063 - synapse.access.http.8008 - 308 - INFO - POST-4932 - 79.219.49.18 - 8008 - {@deepbluev7:neko.dev} Processed request: 1.318sec/-0.000sec (0.002sec, 0.000sec) (0.001sec/1.144sec/7) 1514B 200 "POST /_matrix/cl
ient/r0/keys/query HTTP/1.0" "mtxclient v0.4.1" [0 dbevts]
2021-03-13 22:12:59,064 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2331 - Streaming: device_lists -> 10284031
2021-03-13 22:12:59,068 - synapse.access.http.8008 - 308 - INFO - POST-4934 - 79.219.49.18 - 8008 - {@deepbluev7:neko.dev} Processed request: 1.158sec/-0.000sec (0.004sec, 0.000sec) (0.001sec/0.006sec/6) 1514B 200 "POST /_matrix/cl
ient/r0/keys/query HTTP/1.0" "mtxclient v0.4.1" [0 dbevts]
2021-03-13 22:12:59,068 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2331 - Streaming: caches -> 1658403
2021-03-13 22:12:59,069 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2331 - Streaming: device_lists -> 10284034
2021-03-13 22:12:59,079 - synapse.access.http.9893 - 308 - INFO - POST-4941 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40echo%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,086 - synapse.access.http.9893 - 308 - INFO - POST-4942 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40echo%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,115 - synapse.replication.http.federation - 170 - INFO - POST-4943 - Got 'm.presence' edu from litli.eu
2021-03-13 22:12:59,117 - synapse.access.http.9893 - 308 - INFO - POST-4943 - 127.0.0.1 - 9893 - {None} Processed request: 0.003sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.001sec/1) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/gjbVFOYuHo HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,118 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2332 - Streaming: presence -> 101607579
2021-03-13 22:12:59,240 - synapse.access.http.9893 - 308 - INFO - POST-4945 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,324 - synapse.replication.http.federation - 170 - INFO - POST-4946 - Got 'm.presence' edu from kumi.lgbt
2021-03-13 22:12:59,328 - synapse.access.http.9893 - 308 - INFO - POST-4946 - 127.0.0.1 - 9893 - {None} Processed request: 0.005sec/-0.000sec (0.002sec, 0.000sec) (0.001sec/0.001sec/3) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/BNCrKakmoo HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,329 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2333 - Streaming: presence -> 101607580
2021-03-13 22:12:59,334 - synapse.replication.http.federation - 170 - INFO - POST-4947 - Got 'm.presence' edu from tvall43.ddns.net
2021-03-13 22:12:59,336 - synapse.access.http.9893 - 308 - INFO - POST-4947 - 127.0.0.1 - 9893 - {None} Processed request: 0.003sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.001sec/1) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/aibIyeXAdj HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,337 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2334 - Streaming: presence -> 101607581
2021-03-13 22:12:59,406 - synapse.access.http.9893 - 308 - INFO - POST-4949 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,435 - synapse.replication.http.federation - 170 - INFO - POST-4950 - Got 'm.presence' edu from matrix.teckids.org
2021-03-13 22:12:59,438 - synapse.access.http.9893 - 308 - INFO - POST-4950 - 127.0.0.1 - 9893 - {None} Processed request: 0.002sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/1) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/NnQdyBVktT HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,439 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2335 - Streaming: presence -> 101607582
2021-03-13 22:12:59,569 - synapse.access.http.9893 - 308 - INFO - POST-4951 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,688 - synapse.replication.http.federation - 170 - INFO - POST-4952 - Got 'm.presence' edu from b3web.de
2021-03-13 22:12:59,690 - synapse.access.http.9893 - 308 - INFO - POST-4952 - 127.0.0.1 - 9893 - {None} Processed request: 0.003sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.001sec/1) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/TxDHfRQBMk HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,691 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2336 - Streaming: presence -> 101607583
2021-03-13 22:12:59,722 - synapse.access.http.9893 - 308 - INFO - POST-4953 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:12:59,877 - synapse.access.http.9893 - 308 - INFO - POST-4954 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,520 - synapse.http.matrixfederationclient - 209 - INFO - POST-4944 - {GET-O-2829} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason
%3Atest.zemos.net
2021-03-13 22:13:00,523 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2337 - Streaming: caches -> 1658404
2021-03-13 22:13:00,525 - synapse.access.http.8008 - 308 - INFO - POST-4944 - 79.219.49.18 - 8008 - {@deepbluev7:neko.dev} Processed request: 1.298sec/-0.000sec (0.002sec, 0.001sec) (0.001sec/1.123sec/7) 1514B 200 "POST /_matrix/cl
ient/r0/keys/query HTTP/1.0" "mtxclient v0.4.1" [0 dbevts]
2021-03-13 22:13:00,526 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2338 - Streaming: device_lists -> 10284037
2021-03-13 22:13:00,528 - synapse.http.matrixfederationclient - 209 - INFO - POST-4948 - {GET-O-2828} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason
%3Atest.zemos.net
2021-03-13 22:13:00,532 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2339 - Streaming: caches -> 1658405
2021-03-13 22:13:00,534 - synapse.access.http.8008 - 308 - INFO - POST-4948 - 79.219.49.18 - 8008 - {@deepbluev7:neko.dev} Processed request: 1.136sec/-0.000sec (0.003sec, 0.000sec) (0.001sec/0.003sec/6) 1514B 200 "POST /_matrix/cl
ient/r0/keys/query HTTP/1.0" "mtxclient v0.4.1" [0 dbevts]
2021-03-13 22:13:00,535 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2340 - Streaming: device_lists -> 10284040
2021-03-13 22:13:00,540 - synapse.access.http.9893 - 308 - INFO - POST-4955 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40echo%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,546 - synapse.access.http.9893 - 308 - INFO - POST-4956 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40echo%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,597 - synapse.replication.http.federation - 170 - INFO - POST-4957 - Got 'm.presence' edu from ksol.io
2021-03-13 22:13:00,601 - synapse.access.http.9893 - 308 - INFO - POST-4957 - 127.0.0.1 - 9893 - {None} Processed request: 0.004sec/-0.000sec (0.002sec, 0.000sec) (0.001sec/0.001sec/3) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/qTSFTIdRwy HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,602 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2341 - Streaming: presence -> 101607584
2021-03-13 22:13:00,629 - synapse.replication.http.federation - 170 - INFO - POST-4958 - Got 'm.presence' edu from b3web.de
2021-03-13 22:13:00,630 - synapse.access.http.9893 - 308 - INFO - POST-4958 - 127.0.0.1 - 9893 - {None} Processed request: 0.001sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/aiRbTDLfpU HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,649 - synapse.replication.http.federation - 170 - INFO - POST-4959 - Got 'm.presence' edu from matrix.teckids.org
2021-03-13 22:13:00,650 - synapse.access.http.9893 - 308 - INFO - POST-4959 - 127.0.0.1 - 9893 - {None} Processed request: 0.001sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/ycpqSIuMHp HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,689 - synapse.access.http.9893 - 308 - INFO - POST-4961 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,814 - synapse.replication.http.federation - 170 - INFO - POST-4962 - Got 'm.presence' edu from itsg.host
2021-03-13 22:13:00,817 - synapse.access.http.9893 - 308 - INFO - POST-4962 - 127.0.0.1 - 9893 - {None} Processed request: 0.003sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.001sec/1) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/fkYlZyeZgb HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,818 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2342 - Streaming: presence -> 101607585
2021-03-13 22:13:00,855 - synapse.access.http.9893 - 308 - INFO - POST-4964 - 127.0.0.1 - 9893 - {None} Processed request: 0.000sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:00,875 - synapse.access.http.9893 - 308 - INFO - POST-4965 - 127.0.0.1 - 9893 - {None} Processed request: 0.001sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40abuse-management%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:01,021 - synapse.access.http.9893 - 308 - INFO - POST-4966 - 127.0.0.1 - 9893 - {None} Processed request: 0.001sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:01,339 - synapse.replication.http.federation - 170 - INFO - POST-4967 - Got 'm.presence' edu from cadair.com
2021-03-13 22:13:01,342 - synapse.access.http.9893 - 308 - INFO - POST-4967 - 127.0.0.1 - 9893 - {None} Processed request: 0.003sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.001sec/1) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/kOcfQIZNFX HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:01,343 - synapse.metrics - 577 - INFO - None - Collecting gc 1
2021-03-13 22:13:01,344 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2343 - Streaming: presence -> 101607586
2021-03-13 22:13:01,348 - synapse.access.http.8008 - 308 - INFO - GET-4968 - 79.219.49.18 - 8008 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 353B 200 "GET /_matrix/client/versions HTTP/
1.0" "mtxclient v0.4.1" [0 dbevts]
2021-03-13 22:13:01,438 - synapse.replication.http.federation - 170 - INFO - POST-4969 - Got 'm.presence' edu from cadair.com
2021-03-13 22:13:01,441 - synapse.access.http.9893 - 308 - INFO - POST-4969 - 127.0.0.1 - 9893 - {None} Processed request: 0.003sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/1) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/EPEyeNUlMy HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:01,442 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2344 - Streaming: presence -> 101607587
2021-03-13 22:13:01,498 - synapse.access.http.9893 - 308 - INFO - POST-4970 - 127.0.0.1 - 9893 - {None} Processed request: 0.001sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:01,656 - synapse.access.http.9893 - 308 - INFO - POST-4971 - 127.0.0.1 - 9893 - {None} Processed request: 0.001sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/presence_se
t_state/%40deepbluev7%3Aneko.dev HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:01,864 - synapse.replication.http.federation - 170 - INFO - POST-4972 - Got 'm.presence' edu from cheddar.cheeseandkrakens.com
2021-03-13 22:13:01,868 - synapse.access.http.9893 - 308 - INFO - POST-4972 - 127.0.0.1 - 9893 - {None} Processed request: 0.004sec/-0.000sec (0.002sec, 0.000sec) (0.000sec/0.001sec/3) 4B 200 "POST /_synapse/replication/fed_send_ed
u/m.presence/jHGQbuXhoH HTTP/1.1" "Synapse/1.29.0" [0 dbevts]
2021-03-13 22:13:01,869 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2345 - Streaming: presence -> 101607588
2021-03-13 22:13:01,987 - synapse.http.matrixfederationclient - 209 - INFO - POST-4960 - {GET-O-2831} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason
%3Atest.zemos.net
2021-03-13 22:13:01,990 - synapse.http.matrixfederationclient - 209 - INFO - POST-4963 - {GET-O-2830} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason
%3Atest.zemos.net
2021-03-13 22:13:01,992 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2346 - Streaming: caches -> 1658406
2021-03-13 22:13:01,994 - synapse.access.http.8008 - 308 - INFO - POST-4960 - 79.219.49.18 - 8008 - {@deepbluev7:neko.dev} Processed request: 1.317sec/-0.000sec (0.004sec, 0.001sec) (0.002sec/1.143sec/7) 1514B 200 "POST /_matrix/cl
ient/r0/keys/query HTTP/1.0" "mtxclient v0.4.1" [0 dbevts]
2021-03-13 22:13:01,995 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2347 - Streaming: device_lists -> 10284043
2021-03-13 22:13:01,999 - synapse.access.http.8008 - 308 - INFO - POST-4963 - 79.219.49.18 - 8008 - {@deepbluev7:neko.dev} Processed request: 1.148sec/-0.000sec (0.003sec, 0.000sec) (0.001sec/0.006sec/6) 1514B 200 "POST /_matrix/cl
ient/r0/keys/query HTTP/1.0" "mtxclient v0.4.1" [0 dbevts]
2021-03-13 22:13:01,999 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2347 - Streaming: caches -> 1658407
2021-03-13 22:13:02,001 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-2347 - Streaming: device_lists -> 10284046

filtered by zemos:

2021-03-13 22:12:39,409 - synapse.http.matrixfederationclient - 209 - INFO - POST-4749 - {GET-O-2690} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:40,901 - synapse.http.matrixfederationclient - 209 - INFO - POST-4763 - {GET-O-2691} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:40,911 - synapse.http.matrixfederationclient - 209 - INFO - POST-4766 - {GET-O-2692} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:42,456 - synapse.http.matrixfederationclient - 209 - INFO - POST-4784 - {GET-O-2693} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:43,963 - synapse.http.matrixfederationclient - 209 - INFO - POST-4795 - {GET-O-2809} [test.zemos.net] Completed request: 200 OK in 0.21 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:43,964 - synapse.http.matrixfederationclient - 209 - INFO - POST-4796 - {GET-O-2810} [test.zemos.net] Completed request: 200 OK in 0.21 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:45,435 - synapse.http.matrixfederationclient - 209 - INFO - POST-4805 - {GET-O-2812} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:45,437 - synapse.http.matrixfederationclient - 209 - INFO - POST-4808 - {GET-O-2811} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:46,895 - synapse.http.matrixfederationclient - 209 - INFO - POST-4817 - {GET-O-2814} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:46,903 - synapse.http.matrixfederationclient - 209 - INFO - POST-4814 - {GET-O-2813} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:48,455 - synapse.http.matrixfederationclient - 209 - INFO - POST-4829 - {GET-O-2815} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:50,063 - synapse.http.matrixfederationclient - 209 - INFO - POST-4843 - {GET-O-2816} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:51,684 - synapse.http.matrixfederationclient - 209 - INFO - POST-4855 - {GET-O-2817} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:53,131 - synapse.http.matrixfederationclient - 209 - INFO - POST-4866 - {GET-O-2819} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:53,133 - synapse.http.matrixfederationclient - 209 - INFO - POST-4870 - {GET-O-2818} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:54,586 - synapse.http.matrixfederationclient - 209 - INFO - POST-4884 - {GET-O-2821} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:54,594 - synapse.http.matrixfederationclient - 209 - INFO - POST-4880 - {GET-O-2820} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:56,107 - synapse.http.matrixfederationclient - 209 - INFO - POST-4905 - {GET-O-2823} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:56,109 - synapse.http.matrixfederationclient - 209 - INFO - POST-4902 - {GET-O-2822} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:57,576 - synapse.http.matrixfederationclient - 209 - INFO - POST-4919 - {GET-O-2825} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:57,585 - synapse.http.matrixfederationclient - 209 - INFO - POST-4921 - {GET-O-2824} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:59,057 - synapse.http.matrixfederationclient - 209 - INFO - POST-4932 - {GET-O-2827} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:12:59,059 - synapse.http.matrixfederationclient - 209 - INFO - POST-4934 - {GET-O-2826} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:13:00,520 - synapse.http.matrixfederationclient - 209 - INFO - POST-4944 - {GET-O-2829} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:13:00,528 - synapse.http.matrixfederationclient - 209 - INFO - POST-4948 - {GET-O-2828} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:13:01,987 - synapse.http.matrixfederationclient - 209 - INFO - POST-4960 - {GET-O-2831} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:13:01,990 - synapse.http.matrixfederationclient - 209 - INFO - POST-4963 - {GET-O-2830} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:13:03,497 - synapse.http.matrixfederationclient - 209 - INFO - POST-4976 - {GET-O-2832} [test.zemos.net] Completed request: 200 OK in 0.18 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net
2021-03-13 22:13:04,987 - synapse.http.matrixfederationclient - 209 - INFO - POST-4987 - {GET-O-2833} [test.zemos.net] Completed request: 200 OK in 0.17 secs - GET matrix://test.zemos.net/_matrix/federation/v1/user/devices/%40jason%3Atest.zemos.net

Version information

  • Homeserver:

If not matrix.org: neko.dev

  • Version: 1.29.0

  • Install method: custom ebuild

  • Platform: Gentoo
@anoadragon453
Copy link
Member

https://spec.matrix.org/unstable/client-server-api/#tracking-the-device-list-for-a-user gives some useful information as to how this should play out. Indeed I could see a loop occurring if Synapse keeps adding @jason:test.zemos.net to the device_lists.changed field of /sync.

Do you see multiple instances of the same device being updated in the device_lists_stream table of your Synapse DB while the loop plays out?

In user_device_resync, it doesn't look like Synapse checks to see if the newly queries devices actually differ from the current set. It will simply add them to the stream, which gives the devices a new stream ID, and thus /sync thinks a change has occurred and inform the client.

result = await self.federation.query_user_devices(origin, user_id)
except NotRetryingDestination:
if mark_failed_as_stale:
# Mark the remote user's device list as stale so we know we need to retry
# it later.
await self.store.mark_remote_user_device_cache_as_stale(user_id)
return None
except (RequestSendFailed, HttpResponseException) as e:
logger.warning(
"Failed to handle device list update for %s: %s",
user_id,
e,
)
if mark_failed_as_stale:
# Mark the remote user's device list as stale so we know we need to retry
# it later.
await self.store.mark_remote_user_device_cache_as_stale(user_id)
# We abort on exceptions rather than accepting the update
# as otherwise synapse will 'forget' that its device list
# is out of date. If we bail then we will retry the resync
# next time we get a device list update for this user_id.
# This makes it more likely that the device lists will
# eventually become consistent.
return None
except FederationDeniedError as e:
set_tag("error", True)
log_kv({"reason": "FederationDeniedError"})
logger.info(e)
return None
except Exception as e:
set_tag("error", True)
log_kv(
{"message": "Exception raised by federation request", "exception": e}
)
logger.exception("Failed to handle device list update for %s", user_id)
if mark_failed_as_stale:
# Mark the remote user's device list as stale so we know we need to retry
# it later.
await self.store.mark_remote_user_device_cache_as_stale(user_id)
return None
log_kv({"result": result})
stream_id = result["stream_id"]
devices = result["devices"]
# Get the master key and the self-signing key for this user if provided in the
# response (None if not in the response).
# The response will not contain the user signing key, as this key is only used by
# its owner, thus it doesn't make sense to send it over federation.
master_key = result.get("master_key")
self_signing_key = result.get("self_signing_key")
# If the remote server has more than ~1000 devices for this user
# we assume that something is going horribly wrong (e.g. a bot
# that logs in and creates a new device every time it tries to
# send a message). Maintaining lots of devices per user in the
# cache can cause serious performance issues as if this request
# takes more than 60s to complete, internal replication from the
# inbound federation worker to the synapse master may time out
# causing the inbound federation to fail and causing the remote
# server to retry, causing a DoS. So in this scenario we give
# up on storing the total list of devices and only handle the
# delta instead.
if len(devices) > 1000:
logger.warning(
"Ignoring device list snapshot for %s as it has >1K devs (%d)",
user_id,
len(devices),
)
devices = []
for device in devices:
logger.debug(
"Handling resync update %r/%r, ID: %r",
user_id,
device["device_id"],
stream_id,
)
await self.store.update_remote_device_list_cache(user_id, devices, stream_id)
device_ids = [device["device_id"] for device in devices]
# Handle cross-signing keys.
cross_signing_device_ids = await self.process_cross_signing_key_update(
user_id,
master_key,
self_signing_key,
)
device_ids = device_ids + cross_signing_device_ids
await self.device_handler.notify_device_update(user_id, device_ids)

I'm not sure how the Element clients are currently handling this without hitting the same failure mode.

@deepbluev7
Copy link
Contributor Author

deepbluev7 commented Mar 17, 2021

I'm not sure, how I would check, if multiple instances of the same device get updated, do you have some hints there?

I have the following output for that user:

synapse=# select * from device_lists_stream where user_id = '@jason:test.zemos.net';
 stream_id |        user_id        |                  device_id
-----------+-----------------------+---------------------------------------------
   7436449 | @jason:test.zemos.net | %0iA0QIgE6Z:test.zemos.net
   7436463 | @jason:test.zemos.net | %5ejUoTu4gQ:test.zemos.net
  11026862 | @jason:test.zemos.net | %E3xpvhR2sf:test.zemos.net
  11026863 | @jason:test.zemos.net | %lRBjNCVCVK:test.zemos.net
  11026864 | @jason:test.zemos.net | Rvp7iayKVdG/7yo6sFITYqLVTpH35yrmxEEUp0+XRNA
   7297624 | @jason:test.zemos.net | %HD8OlH6w0h:test.zemos.net
   7401251 | @jason:test.zemos.net | %dDeR5tTUkw:test.zemos.net
(7 rows)

The high stream ids seem to get updated at the same time and are increasing rapidly. Maybe you can give me some guidance on how to query the data you need?

@anoadragon453
Copy link
Member

Actually that tells me what I need to know, thanks 🙂 So the stream IDs are increasing with every update, which causes /sync to see that the device list stream ID in the sync token is out of date, and then informs the client that there are new updates.

I'm currently waiting to hear back from the client devs to see if they know something I don't. Out of curiosity though, do you experience the same looping behaviour if you request the keys of a remote Synapse user?

@anoadragon453 anoadragon453 added the X-Needs-Info This issue is blocked awaiting information from the reporter label Mar 17, 2021
@deepbluev7
Copy link
Contributor Author

No, it is actually just that construct user, so should I just file a construct issue in that case? Because this suggests the construct server is responding with invalid data to the query, right?

@anoadragon453
Copy link
Member

Possibly, however Synapse does indeed look to be updating devices unnecessarily. Funnily enough we just had a PR that fixed that: #9634 It focuses on cross-signing keys, but includes general device list updates as well.

If you're able to, you may want to attempt with that code merged and see if the problem persists.

@deepbluev7
Copy link
Contributor Author

Can I just apply that PR as a patch cleanly on the latest RC and not expect things to break, even if I remove it again? Because then it would be very easy to test for me.

@deepbluev7
Copy link
Contributor Author

Seems to be fixed by #9634. I'm guessing that will be in 1.31 then? Or is that picked into 1.30?

@anoadragon453
Copy link
Member

@deepbluev7 It will land in v1.31.0 yeah. I'll write here if that changes 🙂

Thanks for reporting regardless. Let us know if you get anything back from the Conduit side 🙂

@deepbluev7
Copy link
Contributor Author

Okay, so feedback from the construct side: It seems like this was caused by partially disabling e2ee on the construct side or some other temporary e2ee experiments, that then caused fallout after they were disabled. So there certainly was something happening on the construct side, but it is probably good, that synapse won't freak out over that anymore. Thank you both again for the fast response! I'm going to use the patch until it is included in a release for now. :3

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
X-Needs-Info This issue is blocked awaiting information from the reporter
Projects
None yet
Development

No branches or pull requests

2 participants