
Sliding sync: The Two Sync Loops #1928

Closed · 10 of 11 tasks · 0 comments · Fixed by #2204

poljar commented May 16, 2023

Push notifications

Push notifications are used to deliver information about messages relevant to the user in a timely manner, in the form of an OS notification.

The mobile device receives a room ID and event ID from the push service. The Client on the device then determines whether to convert this data into a visible OS notification.

Since the room ID and event ID are not enough to display a notification, more data needs to be fetched from the homeserver first; to display all the relevant data we need to sync with the server. On Apple devices, push notifications are handled by a separate, resource-limited process, from now on called the "Notification process". It cannot wake up the main process and must independently retrieve any additional data.

In cases where an encrypted event triggers the notification, room keys may be necessary to decrypt the event. Room keys are delivered as to-device events to each recipient device.

To fulfill this requirement, the Notification process must initiate a /sync request of its own. This request fetches the missing data, including the room display name calculated by the SS proxy, the member event of the sender, and the event itself.
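As a rough illustration of this data flow, here's a minimal sketch; the types and the `resolve_push` function below are made up for the example and are not part of the SDK's API.

```rust
/// What the push gateway delivers: just enough to identify the event. These
/// types and `resolve_push` are illustrative, not SDK API.
struct PushPayload {
    room_id: String,
    event_id: String,
}

/// What the OS notification actually needs, and which therefore has to be
/// fetched with a one-shot sync.
struct NotificationItem {
    /// Room display name, as computed by the SS proxy.
    room_display_name: String,
    /// Sender display name, taken from the sender's member event.
    sender_display_name: String,
    /// The (possibly decrypted) content of the event itself.
    body: String,
}

/// Resolve a push into something displayable: sync once (including to-device
/// events, in case a room key is needed for decryption), then look up the
/// event identified by the payload. Returning `None` drops the push.
async fn resolve_push(payload: PushPayload) -> Option<NotificationItem> {
    let _ = payload;
    None // stub: the real flow is modelled in the flowcharts below
}
```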

Client setup for multiple process sync support

At present, there is no support for multiple processes making changes to the Client database simultaneously. The following flowchart models modifications to our Client and its sync loop setup which aim to allow this.

```mermaid
flowchart TD
    homeserver[Homeserver]

    subgraph client[Main process]
     direction LR
     main[Main sync]
     to-device[Extension sync]
    end

    nse[Notification process]

    main <==> homeserver
    to-device <==> homeserver
    nse <==> homeserver
```

The main process will split its sync loop into two parts: the main part, which is responsible for syncing room data and events the client wants to display right away, and the extensions (or to-device-only) part, which mainly syncs to-device events.

This solves three problems:

  1. The main sync, which gets reset whenever user interaction changes the sync settings, no longer blocks to-device and room key delivery.
  2. If another process is doing the extensions sync, the main sync does not need to wait for that process to finish.
  3. It prevents the situation where two sync loops fight over to-device events, which are supposed to be delivered only once.

Main process client flow

The following flow chart models how the main process should behave when setting up a sync loop.

```mermaid
flowchart TD
    A[Main process] --> B(Start the sync streams)
    B --> C(Main Sync stream)
    C --> E(Sync once)
    E --> E
    B --> F(Extension sync stream)
    F --> G{Acquire the extensions sync lock}
    G -->|fa:fa-lock-open Lock acquired| H(Sync once - extensions)
    G -->|fa:fa-lock Can't acquire lock| I(Wait)
    I --> F
    H --> H
```

It would probably be nice, or even required, to stop the extensions sync, and thereby release the "Extensions sync lock", when the main process gets suspended by the OS.
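As a sketch of this setup, assuming a hypothetical `ExtensionsSyncLock` type and `sync_once_*` helpers (none of these names are the SDK's actual API):

```rust
use std::time::Duration;

struct ExtensionsSyncLock;

impl ExtensionsSyncLock {
    /// Try to take the cross-process "Extensions sync lock"; `true` on success.
    async fn try_acquire(&self) -> bool {
        true // stub: the real thing would go through the store
    }
}

async fn sync_once_main() { /* sync room data and events for the UI */ }
async fn sync_once_extensions() { /* sync to-device events and other extensions */ }

#[tokio::main]
async fn main() {
    // Main sync stream: room data and events the client wants to display right
    // away. Resetting it on user interaction no longer affects to-device delivery.
    let main_sync = tokio::spawn(async {
        loop {
            sync_once_main().await;
        }
    });

    // Extension sync stream: guarded by the cross-process lock so that the
    // Notification process can take over while this process isn't syncing.
    let extension_sync = tokio::spawn(async {
        let lock = ExtensionsSyncLock;
        loop {
            if lock.try_acquire().await {
                // Lock acquired: keep syncing extensions. On OS suspension we
                // would break out of this loop and release the lock.
                loop {
                    sync_once_extensions().await;
                }
            } else {
                // Another process holds the lock: wait, then try again.
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
        }
    });

    let _ = tokio::join!(main_sync, extension_sync);
}
```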

Notification client flow

The notification client, on the other hand, should behave a bit differently. Most importantly, it should not set up a long-running sync loop; it should sync only to fetch the data necessary to display the notification.

```mermaid
flowchart TD
    A[Notification process] --> B(Received a push)
    B --> C(Start up the Client)
    C --> D{Acquire the extensions sync lock}
    D -->|fa:fa-lock-open Lock acquired| E(Sync once with to-device enabled)
    D -->|fa:fa-lock Can't acquire lock| F(Sync once with to-device disabled)
```
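A corresponding sketch for the Notification process, again with hypothetical names (`ExtensionsSyncLock`, `sync_once`), not the SDK's actual API:

```rust
struct ExtensionsSyncLock;

impl ExtensionsSyncLock {
    async fn try_acquire(&self) -> bool { true } // stub
    async fn release(&self) {}                   // stub
}

/// Sync exactly once; `with_to_device` toggles the to-device extension.
async fn sync_once(with_to_device: bool) {
    let _ = with_to_device;
}

/// Invoked by the Notification process after it received a push and started
/// up the Client.
async fn handle_push(lock: &ExtensionsSyncLock) {
    if lock.try_acquire().await {
        // We own the to-device stream: fetch room keys alongside room data,
        // then give the lock back so the main process can resume.
        sync_once(true).await;
        lock.release().await;
    } else {
        // The main process is handling to-device events: only fetch room data
        // and rely on the store for any room keys we might need.
        sync_once(false).await;
    }
}
```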

Tasks

Related but not mandatory for this

@bnjbvr bnjbvr self-assigned this May 29, 2023
bnjbvr added a commit that referenced this issue Jun 16, 2023
This implements a value-based lock in the crypto stores. The intent is to use that for multiple processes to be able to make writes into the store concurrently, while still cooperating on who does them. In particular, we need this for #1928, since we may have up to two different processes trying to write into the crypto store at the same time.

## New methods in the `CryptoStore` trait

The idea is to introduce two new methods touching **custom values** in the crypto store:

- one to atomically insert a value, but only if it was missing (so, not following the `upsert` semantics used by `set_custom_value`)
- one to atomically remove a custom value

Those two operations match the semantics we want:

- take the lock only if it ain't taken already == insert an entry only if it was missing
- release the lock = remove the entry

By looking at the number of rows affected by the query, we can infer whether the insert/remove happened, that is, whether we managed to take (or release) the lock.
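As a rough illustration, the two methods could look something like the sketch below; the exact names, signatures, and error type are assumptions, not necessarily what the PR merged.

```rust
/// Placeholder error type for the sketch.
#[derive(Debug)]
struct StoreError;

#[async_trait::async_trait]
trait CryptoStoreLockingExt {
    /// Atomically insert `value` under `key`, but only if no entry exists yet
    /// (deliberately *not* the `upsert` semantics of `set_custom_value`).
    /// Returns `true` if the insert happened, i.e. the lock was taken.
    async fn insert_custom_value_if_missing(
        &self,
        key: &str,
        value: Vec<u8>,
    ) -> Result<bool, StoreError>;

    /// Atomically remove the entry under `key`, i.e. release the lock.
    async fn remove_custom_value(&self, key: &str) -> Result<(), StoreError>;
}
```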

## High-level APIs

I've also added a high-level API, `CryptoStoreLock`, that helps manage such a lock, and adds some niceties on top of that:

- exponential backoff to retry attempts at acquiring the lock, when it was already taken
- attempt to gracefully recover when the lock has been taken by an app that's been killed by the environment
- full configuration of the key / value / backoff parameters

While it'd be nice to have something like a `CryptoStoreLockGuard`, it's hard to implement without being racy, because of the async calls that would have to happen in the `Drop` implementation (and async drop isn't stable yet).
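To illustrate just the retry strategy (not the actual `CryptoStoreLock` implementation), here is a self-contained sketch of lock acquisition with exponential backoff; the constants and the `try_take_lock` stub are made up for the example.

```rust
use std::time::Duration;

/// Stand-in for the atomic "insert the lock entry only if missing" operation.
async fn try_take_lock() -> bool {
    false // stub: pretend someone else holds the lock
}

/// Retry taking the lock with exponentially growing waits, giving up once the
/// backoff exceeds a maximum.
async fn acquire_with_backoff() -> bool {
    let mut wait = Duration::from_millis(50);
    let max_wait = Duration::from_secs(2);

    loop {
        if try_take_lock().await {
            return true; // lock acquired
        }
        if wait > max_wait {
            return false; // give up; the caller decides how to recover
        }
        tokio::time::sleep(wait).await;
        wait *= 2; // exponential backoff
    }
}
```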

## Test program

There's also a test program in which I shamelessly show my rudimentary unix skills; I've put it in the `labs/` directory but this could as well be a large integration test. A parent program initially fills a custom crypto store, then creates a `pipe()` for one-way communication with a child created with `fork()`; the parent then sends commands to the child. These commands consist of reading from and writing into the crypto store, using a lock. While the child attempts to perform these operations, the parent tries hard to get the lock at the same time. This helped figure out a few issues and make sure that cross-process locking works as intended.
bnjbvr added a commit that referenced this issue Jun 29, 2023
This implements a new time-lease-based lock for the `CryptoStore` that doesn't require explicit unlocking, which makes it more robust in the context of #1928, where any process may die because the device is running out of battery, or unexpected flows may cause a lock to not be released properly in one process or the other.

```
//! This is a per-process lock that may be used only for very specific use
//! cases, where multiple processes might concurrently write to the same
//! database at the same time; this would invalidate crypto store caches, so
//! that should be done mindfully. Such a lock can be acquired multiple times by
//! the same process, and it remains active as long as there's at least one user
//! in a given process.
//!
//! The lock is implemented using time-based leases to values inserted in a
//! crypto store. The store maintains the lock identifier (key), who's the
//! current holder (value), and an expiration timestamp on the side; see also
//! `CryptoStore::try_take_leased_lock` for more details.
//!
//! The lock is initially acquired for a certain period of time (namely, the
//! duration of a lease, aka `LEASE_DURATION_MS`), and then a "heartbeat" task
//! renews the lease to extend its duration, every so often (namely, every
//! `EXTEND_LEASE_EVERY_MS`). Since the tokio scheduler might be busy, the
//! extension request should happen way more frequently than the duration of a
//! lease, in case a deadline is missed. The current values have been chosen to
//! reflect that, with a ratio of 1:10 as of 2023-06-23.
//!
//! Releasing the lock happens naturally, by not renewing a lease. It happens
//! automatically after the duration of the last lease, at most.
```
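As an illustration of the lease mechanism the doc comment describes, here is a self-contained sketch: acquire a lease, then renew it from a heartbeat task much more often than it expires. The constants and the `try_take_leased_lock` stub are illustrative only; see the actual `CryptoStore` implementation for the real behaviour.

```rust
use std::time::Duration;

const LEASE_DURATION_MS: u64 = 500;
const EXTEND_LEASE_EVERY_MS: u64 = 50; // much shorter than the lease (~1:10)

/// Stand-in for the store operation that takes or renews the lease if we are
/// already the holder, or if the previous lease has expired.
async fn try_take_leased_lock(lease_ms: u64, key: &str, holder: &str) -> bool {
    let _ = (lease_ms, key, holder);
    true
}

/// Acquire the lock and spawn the heartbeat that keeps renewing the lease.
/// "Releasing" the lock just means aborting the heartbeat and letting the last
/// lease expire on its own.
async fn acquire(key: &str, holder: &str) -> Option<tokio::task::JoinHandle<()>> {
    if !try_take_leased_lock(LEASE_DURATION_MS, key, holder).await {
        return None;
    }

    let (key, holder) = (key.to_owned(), holder.to_owned());
    let renew_task = tokio::spawn(async move {
        loop {
            tokio::time::sleep(Duration::from_millis(EXTEND_LEASE_EVERY_MS)).await;
            try_take_leased_lock(LEASE_DURATION_MS, &key, &holder).await;
        }
    });

    Some(renew_task)
}
```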

---

* feat: implement a time lease based lock for the crypto store
* feat: switch the crypto-store lock a time-leased based one
* chore: fix CI, don't use unixepoch in sqlite and do time math in rust
* chore: dummy implementation in indexeddb, don't run lease locks tests there
* feat: in NSE, wait the duration of a lease if first attempt to unlock failed
* feat: immediately release the lock when there are no more holders
* chore: clippy
* chore: add comment about atomic sanity
* chore: increase sleeps in timeline queue tests?
* feat: lower lease and renew durations
* feat: keep track of the extend-lease task
* fix: increment num_holders when acquiring the lock for the first time
* chore: reduce indent + abort prev renew task on non-wasm + add logs